spark git commit: [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages
Repository: spark
Updated Branches: refs/heads/branch-1.2 7f86c350c -> d6262fa05

[SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

This PR eliminates the network package's usage of the Java serializer and replaces it with Encodable, which is a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1). This protocol has the advantage over Java serialization that we can guarantee that messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavyweight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time.

Additionally, this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient.

Author: Aaron Davidson <aa...@databricks.com>

Closes #3146 from aarondav/free and squashes the following commits:

ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

(cherry picked from commit d4fa04e50d299e9cad349b3781772956453a696b)
Signed-off-by: Reynold Xin <r...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6262fa0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6262fa0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6262fa0

Branch: refs/heads/branch-1.2
Commit: d6262fa05b9b7ffde00e6659810a3436e53df6b8
Parents: 7f86c35
Author: Aaron Davidson <aa...@databricks.com>
Authored: Fri Nov 7 09:42:21 2014 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Nov 7 09:42:32 2014 -0800
--
 .../spark/network/BlockTransferService.scala    |   4 +-
 .../network/netty/NettyBlockRpcServer.scala     |  31 ++---
 .../netty/NettyBlockTransferService.scala       |  15 ++-
 .../network/nio/NioBlockTransferService.scala   |   1 +
 .../org/apache/spark/storage/BlockManager.scala |   5 +-
 .../netty/NettyBlockTransferSecuritySuite.scala |   4 +-
 .../network/protocol/ChunkFetchFailure.java     |  12 +-
 .../apache/spark/network/protocol/Encoders.java |  93 +++
 .../spark/network/protocol/RpcFailure.java      |  12 +-
 .../spark/network/protocol/RpcRequest.java      |   9 +-
 .../spark/network/protocol/RpcResponse.java     |   9 +-
 .../apache/spark/network/util/JavaUtils.java    |  27 -
 .../apache/spark/network/sasl/SaslMessage.java  |  24 ++--
 .../network/shuffle/ExecutorShuffleInfo.java    |  64 ---
 .../shuffle/ExternalShuffleBlockHandler.java    |  21 ++--
 .../shuffle/ExternalShuffleBlockManager.java    |   1 +
 .../network/shuffle/ExternalShuffleClient.java  |  12 +-
 .../shuffle/ExternalShuffleMessages.java        | 106 -
 .../network/shuffle/OneForOneBlockFetcher.java  |  17 ++-
 .../network/shuffle/ShuffleStreamHandle.java    |  60 --
 .../shuffle/protocol/BlockTransferMessage.java  |  76 +
 .../shuffle/protocol/ExecutorShuffleInfo.java   |  88 +++
 .../network/shuffle/protocol/OpenBlocks.java    |  87 ++
 .../shuffle/protocol/RegisterExecutor.java      |  91 +++
 .../network/shuffle/protocol/StreamHandle.java  |  80 +
 .../network/shuffle/protocol/UploadBlock.java   | 113 +++
 .../shuffle/BlockTransferMessagesSuite.java     |  44 
 .../ExternalShuffleBlockHandlerSuite.java       |  29 ++---
 .../ExternalShuffleIntegrationSuite.java        |   1 +
 .../shuffle/ExternalShuffleSecuritySuite.java   |   1 +
 .../shuffle/OneForOneBlockFetcherSuite.java     |  18 +--
 .../network/shuffle/ShuffleMessagesSuite.java   |  51 -
 .../network/shuffle/TestShuffleDataContext.java |   2 +
 33 files changed, 782 insertions(+), 426 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d6262fa0/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
--
diff --git a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
index 210a581..dcbda5a 100644
--- a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
+++ b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
@@ -73,6 +73,7 @@ abstract class BlockTransferService extends ShuffleClient with Closeable with Lo
   def uploadBlock(
       hostname: String,
       port: Int,
+      execId: String,
       blockId: BlockId,
       blockData:
spark git commit: [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages
Repository: spark
Updated Branches: refs/heads/master 3abdb1b24 -> d4fa04e50

[SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

This PR eliminates the network package's usage of the Java serializer and replaces it with Encodable, which is a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1). This protocol has the advantage over Java serialization that we can guarantee that messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavyweight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time.

Additionally, this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient.

Author: Aaron Davidson <aa...@databricks.com>

Closes #3146 from aarondav/free and squashes the following commits:

ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4fa04e5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4fa04e5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4fa04e5

Branch: refs/heads/master
Commit: d4fa04e50d299e9cad349b3781772956453a696b
Parents: 3abdb1b
Author: Aaron Davidson <aa...@databricks.com>
Authored: Fri Nov 7 09:42:21 2014 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Nov 7 09:42:21 2014 -0800
--
 .../spark/network/BlockTransferService.scala    |   4 +-
 .../network/netty/NettyBlockRpcServer.scala     |  31 ++---
 .../netty/NettyBlockTransferService.scala       |  15 ++-
 .../network/nio/NioBlockTransferService.scala   |   1 +
 .../org/apache/spark/storage/BlockManager.scala |   5 +-
 .../netty/NettyBlockTransferSecuritySuite.scala |   4 +-
 .../network/protocol/ChunkFetchFailure.java     |  12 +-
 .../apache/spark/network/protocol/Encoders.java |  93 +++
 .../spark/network/protocol/RpcFailure.java      |  12 +-
 .../spark/network/protocol/RpcRequest.java      |   9 +-
 .../spark/network/protocol/RpcResponse.java     |   9 +-
 .../apache/spark/network/util/JavaUtils.java    |  27 -
 .../apache/spark/network/sasl/SaslMessage.java  |  24 ++--
 .../network/shuffle/ExecutorShuffleInfo.java    |  64 ---
 .../shuffle/ExternalShuffleBlockHandler.java    |  21 ++--
 .../shuffle/ExternalShuffleBlockManager.java    |   1 +
 .../network/shuffle/ExternalShuffleClient.java  |  12 +-
 .../shuffle/ExternalShuffleMessages.java        | 106 -
 .../network/shuffle/OneForOneBlockFetcher.java  |  17 ++-
 .../network/shuffle/ShuffleStreamHandle.java    |  60 --
 .../shuffle/protocol/BlockTransferMessage.java  |  76 +
 .../shuffle/protocol/ExecutorShuffleInfo.java   |  88 +++
 .../network/shuffle/protocol/OpenBlocks.java    |  87 ++
 .../shuffle/protocol/RegisterExecutor.java      |  91 +++
 .../network/shuffle/protocol/StreamHandle.java  |  80 +
 .../network/shuffle/protocol/UploadBlock.java   | 113 +++
 .../shuffle/BlockTransferMessagesSuite.java     |  44 
 .../ExternalShuffleBlockHandlerSuite.java       |  29 ++---
 .../ExternalShuffleIntegrationSuite.java        |   1 +
 .../shuffle/ExternalShuffleSecuritySuite.java   |   1 +
 .../shuffle/OneForOneBlockFetcherSuite.java     |  18 +--
 .../network/shuffle/ShuffleMessagesSuite.java   |  51 -
 .../network/shuffle/TestShuffleDataContext.java |   2 +
 33 files changed, 782 insertions(+), 426 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d4fa04e5/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
--
diff --git a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
index 210a581..dcbda5a 100644
--- a/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
+++ b/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala
@@ -73,6 +73,7 @@ abstract class BlockTransferService extends ShuffleClient with Closeable with Lo
   def uploadBlock(
       hostname: String,
       port: Int,
+      execId: String,
       blockId: BlockId,
       blockData: ManagedBuffer,
       level: StorageLevel): Future[Unit]
@@ -110,9 +111,10 @@ abstract class BlockTransferService extends
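To make the protocol description above concrete, here is a minimal, self-contained Scala sketch of a type-id-prefixed Encodable message. The names (Encodable, OpenBlocksMsg, Protocol, encodeWithTypeId) are illustrative assumptions, not the actual classes added under network/shuffle/protocol; the sketch only shows the idea the commit message describes: a one-byte type id written before each encoded message lets the receiver dispatch on message type and lets new types be added later without breaking existing readers.

// Illustrative sketch only: assumed names, not Spark's actual protocol classes.
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

trait Encodable {
  def encodedLength: Int
  def encode(buf: ByteBuffer): Unit
}

// One concrete message type, loosely modeled on OpenBlocks.
case class OpenBlocksMsg(appId: String, execId: String) extends Encodable {
  private def byteLen(s: String) = 4 + s.getBytes(UTF_8).length
  override def encodedLength: Int = byteLen(appId) + byteLen(execId)
  override def encode(buf: ByteBuffer): Unit = { putString(buf, appId); putString(buf, execId) }
  private def putString(buf: ByteBuffer, s: String): Unit = {
    val bytes = s.getBytes(UTF_8)
    buf.putInt(bytes.length)   // length-prefixed UTF-8 string
    buf.put(bytes)
  }
}

object Protocol {
  val OpenBlocksId: Byte = 0   // new message types get new ids; unknown ids are rejected

  def encodeWithTypeId(msg: OpenBlocksMsg): ByteBuffer = {
    val buf = ByteBuffer.allocate(1 + msg.encodedLength)
    buf.put(OpenBlocksId)      // the type id precedes the message body
    msg.encode(buf)
    buf.flip()
    buf
  }

  def decode(buf: ByteBuffer): OpenBlocksMsg = buf.get() match {
    case OpenBlocksId => OpenBlocksMsg(readString(buf), readString(buf))
    case other        => throw new IllegalArgumentException(s"Unknown message type id: $other")
  }

  private def readString(buf: ByteBuffer): String = {
    val bytes = new Array[Byte](buf.getInt())
    buf.get(bytes)
    new String(bytes, UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val roundTripped = decode(encodeWithTypeId(OpenBlocksMsg("app-1", "exec-7")))
    println(roundTripped)      // OpenBlocksMsg(app-1,exec-7)
  }
}

A real implementation also needs wire framing and partial-read handling; the point here is only the type-id dispatch that makes the format forward-extensible across compiled versions and JVMs.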
[06/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala new file mode 100644 index 000..f966f25 --- /dev/null +++ b/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -0,0 +1,326 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.repl + +import java.io._ +import java.net.URLClassLoader + +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.Await +import scala.concurrent.duration._ +import scala.tools.nsc.interpreter.SparkILoop + +import com.google.common.io.Files +import org.scalatest.FunSuite +import org.apache.commons.lang3.StringEscapeUtils +import org.apache.spark.SparkContext +import org.apache.spark.util.Utils + + + +class ReplSuite extends FunSuite { + + def runInterpreter(master: String, input: String): String = { +val CONF_EXECUTOR_CLASSPATH = spark.executor.extraClassPath + +val in = new BufferedReader(new StringReader(input + \n)) +val out = new StringWriter() +val cl = getClass.getClassLoader +var paths = new ArrayBuffer[String] +if (cl.isInstanceOf[URLClassLoader]) { + val urlLoader = cl.asInstanceOf[URLClassLoader] + for (url - urlLoader.getURLs) { +if (url.getProtocol == file) { + paths += url.getFile +} + } +} +val classpath = paths.mkString(File.pathSeparator) + +val oldExecutorClasspath = System.getProperty(CONF_EXECUTOR_CLASSPATH) +System.setProperty(CONF_EXECUTOR_CLASSPATH, classpath) + +System.setProperty(spark.master, master) +val interp = { + new SparkILoop(in, new PrintWriter(out)) +} +org.apache.spark.repl.Main.interp = interp +Main.s.processArguments(List(-classpath, classpath), true) +Main.main(Array()) // call main +org.apache.spark.repl.Main.interp = null + +if (oldExecutorClasspath != null) { + System.setProperty(CONF_EXECUTOR_CLASSPATH, oldExecutorClasspath) +} else { + System.clearProperty(CONF_EXECUTOR_CLASSPATH) +} +return out.toString + } + + def assertContains(message: String, output: String) { +val isContain = output.contains(message) +assert(isContain, + Interpreter output did not contain ' + message + ':\n + output) + } + + def assertDoesNotContain(message: String, output: String) { +val isContain = output.contains(message) +assert(!isContain, + Interpreter output contained ' + message + ':\n + output) + } + + test(propagation of local properties) { +// A mock ILoop that doesn't install the SIGINT handler. 
+class ILoop(out: PrintWriter) extends SparkILoop(None, out) { + settings = new scala.tools.nsc.Settings + settings.usejavacp.value = true + org.apache.spark.repl.Main.interp = this + override def createInterpreter() { +intp = new SparkILoopInterpreter +intp.setContextClassLoader() + } +} + +val out = new StringWriter() +Main.interp = new ILoop(new PrintWriter(out)) +Main.sparkContext = new SparkContext(local, repl-test) +Main.interp.createInterpreter() + +Main.sparkContext.setLocalProperty(someKey, someValue) + +// Make sure the value we set in the caller to interpret is propagated in the thread that +// interprets the command. + Main.interp.interpret(org.apache.spark.repl.Main.sparkContext.getLocalProperty(\someKey\)) +assert(out.toString.contains(someValue)) + +Main.sparkContext.stop() +System.clearProperty(spark.driver.port) + } + + test(simple foreach with accumulator) { +val output = runInterpreter(local, + +|val accum = sc.accumulator(0) +|sc.parallelize(1 to 10).foreach(x = accum += x) +|accum.value + .stripMargin) +assertDoesNotContain(error:, output) +assertDoesNotContain(Exception, output) +assertContains(res1: Int = 55, output) + } + + test(external vars) { +val output = runInterpreter(local, + +|var v = 7
[16/20] spark git commit: small correction
small correction Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/899fc3c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/899fc3c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/899fc3c0 Branch: refs/heads/scala-2.11-prashant Commit: 899fc3c0d295111eaeacbaad18aac413e9c2d7ff Parents: d7c35e2 Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 18:20:03 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- core/pom.xml | 1 - 1 file changed, 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/899fc3c0/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index 37970c9..492eddd 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -329,7 +329,6 @@ plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId -version1.0/version executions execution idtest/id - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark
Updated Branches: refs/heads/scala-2.11-prashant [deleted] fd849b0b4
[17/20] spark git commit: Run against scala 2.11 on jenkins.
Run against scala 2.11 on jenkins. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d7c35e2f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d7c35e2f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d7c35e2f Branch: refs/heads/scala-2.11-prashant Commit: d7c35e2ffc2379ec4ce61a48f7048d3c8cc7c29c Parents: ed4f646 Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 18:18:21 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- dev/run-tests | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d7c35e2f/dev/run-tests -- diff --git a/dev/run-tests b/dev/run-tests index de607e4..74b0ba5 100755 --- a/dev/run-tests +++ b/dev/run-tests @@ -53,7 +53,7 @@ function handle_error () { fi } -export SBT_MAVEN_PROFILES_ARGS=$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl +export SBT_MAVEN_PROFILES_ARGS=$SBT_MAVEN_PROFILES_ARGS -Pkinesis-asl -Pscala-2.11 # Determine Java path and version. { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[15/20] spark git commit: Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eeed1a68 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eeed1a68 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eeed1a68 Branch: refs/heads/scala-2.11-prashant Commit: eeed1a68f01538abfcc1007f9e25f8e34b76e554 Parents: c696f39 Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 15:57:30 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- core/pom.xml| 4 ++-- network/shuffle/pom.xml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/eeed1a68/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index afa8c8c..d71f265 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -74,12 +74,12 @@ /dependency dependency groupIdorg.apache.spark/groupId - artifactIdspark-network-common_2.10/artifactId + artifactIdspark-network-common_${scala.binary.version}/artifactId version${project.version}/version /dependency dependency groupIdorg.apache.spark/groupId - artifactIdspark-network-shuffle_2.10/artifactId + artifactIdspark-network-shuffle_${scala.binary.version}/artifactId version${project.version}/version /dependency dependency http://git-wip-us.apache.org/repos/asf/spark/blob/eeed1a68/network/shuffle/pom.xml -- diff --git a/network/shuffle/pom.xml b/network/shuffle/pom.xml index fe5681d..662212f 100644 --- a/network/shuffle/pom.xml +++ b/network/shuffle/pom.xml @@ -39,7 +39,7 @@ !-- Core dependencies -- dependency groupIdorg.apache.spark/groupId - artifactIdspark-network-common_2.10/artifactId + artifactIdspark-network-common_${scala.binary.version}/artifactId version${project.version}/version /dependency dependency @@ -58,7 +58,7 @@ !-- Test dependencies -- dependency groupIdorg.apache.spark/groupId - artifactIdspark-network-common_2.10/artifactId + artifactIdspark-network-common_${scala.binary.version}/artifactId version${project.version}/version typetest-jar/type scopetest/scope - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[10/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala new file mode 100644 index 000..e56b74e --- /dev/null +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala @@ -0,0 +1,1091 @@ +// scalastyle:off + +/* NSC -- new Scala compiler + * Copyright 2005-2013 LAMP/EPFL + * @author Alexander Spoon + */ + +package org.apache.spark.repl + + +import java.net.URL + +import scala.reflect.io.AbstractFile +import scala.tools.nsc._ +import scala.tools.nsc.backend.JavaPlatform +import scala.tools.nsc.interpreter._ + +import scala.tools.nsc.interpreter.{Results = IR} +import Predef.{println = _, _} +import java.io.{BufferedReader, FileReader} +import java.net.URI +import java.util.concurrent.locks.ReentrantLock +import scala.sys.process.Process +import scala.tools.nsc.interpreter.session._ +import scala.util.Properties.{jdkHome, javaVersion} +import scala.tools.util.{Javap} +import scala.annotation.tailrec +import scala.collection.mutable.ListBuffer +import scala.concurrent.ops +import scala.tools.nsc.util._ +import scala.tools.nsc.interpreter._ +import scala.tools.nsc.io.{File, Directory} +import scala.reflect.NameTransformer._ +import scala.tools.nsc.util.ScalaClassLoader._ +import scala.tools.util._ +import scala.language.{implicitConversions, existentials, postfixOps} +import scala.reflect.{ClassTag, classTag} +import scala.tools.reflect.StdRuntimeTags._ + +import java.lang.{Class = jClass} +import scala.reflect.api.{Mirror, TypeCreator, Universe = ApiUniverse} + +import org.apache.spark.Logging +import org.apache.spark.SparkConf +import org.apache.spark.SparkContext +import org.apache.spark.util.Utils + +/** The Scala interactive shell. It provides a read-eval-print loop + * around the Interpreter class. + * After instantiation, clients should call the main() method. + * + * If no in0 is specified, then input will come from the console, and + * the class will attempt to provide input editing feature such as + * input history. + * + * @author Moez A. Abdel-Gawad + * @author Lex Spoon + * @version 1.2 + */ +class SparkILoop(in0: Option[BufferedReader], protected val out: JPrintWriter, + val master: Option[String]) +extends AnyRef + with LoopCommands + with SparkILoopInit + with Logging +{ + def this(in0: BufferedReader, out: JPrintWriter, master: String) = this(Some(in0), out, Some(master)) + def this(in0: BufferedReader, out: JPrintWriter) = this(Some(in0), out, None) + def this() = this(None, new JPrintWriter(Console.out, true), None) + + var in: InteractiveReader = _ // the input stream from which commands come + var settings: Settings = _ + var intp: SparkIMain = _ + + @deprecated(Use `intp` instead., 2.9.0) def interpreter = intp + @deprecated(Use `intp` instead., 2.9.0) def interpreter_= (i: SparkIMain): Unit = intp = i + + /** Having inherited the difficult var-ness of the repl instance, + * I'm trying to work around it by moving operations into a class from + * which it will appear a stable prefix. + */ + private def onIntp[T](f: SparkIMain = T): T = f(intp) + + class IMainOps[T : SparkIMain](val intp: T) { +import intp._ +import global._ + +def printAfterTyper(msg: = String) = + intp.reporter printMessage afterTyper(msg) + +/** Strip NullaryMethodType artifacts. 
*/ +private def replInfo(sym: Symbol) = { + sym.info match { +case NullaryMethodType(restpe) if sym.isAccessor = restpe +case info = info + } +} +def echoTypeStructure(sym: Symbol) = + printAfterTyper( + deconstruct.show(replInfo(sym))) + +def echoTypeSignature(sym: Symbol, verbose: Boolean) = { + if (verbose) SparkILoop.this.echo(// Type signature) + printAfterTyper( + replInfo(sym)) + + if (verbose) { +SparkILoop.this.echo(\n// Internal Type structure) +echoTypeStructure(sym) + } +} + } + implicit def stabilizeIMain(intp: SparkIMain) = new IMainOps[intp.type](intp) + + /** TODO - + * -n normalize + * -l label with case class parameter names + * -c complete - leave nothing out + */ + private def typeCommandInternal(expr: String, verbose: Boolean): Result = { +onIntp { intp = + val sym = intp.symbolOfLine(expr) + if (sym.exists) intp.echoTypeSignature(sym, verbose) + else +} + } + + var sparkContext: SparkContext = _ + + override def echoCommandMessage(msg: String) { +intp.reporter printMessage msg + } + + // def isAsync = !settings.Yreplsync.value + def isAsync = false + // lazy val power = new Power(intp, new
[05/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala deleted file mode 100644 index 646c68e..000 --- a/repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala +++ /dev/null @@ -1,1445 +0,0 @@ -// scalastyle:off - -/* NSC -- new Scala compiler - * Copyright 2005-2013 LAMP/EPFL - * @author Martin Odersky - */ - -package org.apache.spark.repl - -import java.io.File - -import scala.tools.nsc._ -import scala.tools.nsc.backend.JavaPlatform -import scala.tools.nsc.interpreter._ - -import Predef.{ println = _, _ } -import scala.tools.nsc.util.{MergedClassPath, stringFromWriter, ScalaClassLoader, stackTraceString} -import scala.reflect.internal.util._ -import java.net.URL -import scala.sys.BooleanProp -import io.{AbstractFile, PlainFile, VirtualDirectory} - -import reporters._ -import symtab.Flags -import scala.reflect.internal.Names -import scala.tools.util.PathResolver -import ScalaClassLoader.URLClassLoader -import scala.tools.nsc.util.Exceptional.unwrap -import scala.collection.{ mutable, immutable } -import scala.util.control.Exception.{ ultimately } -import SparkIMain._ -import java.util.concurrent.Future -import typechecker.Analyzer -import scala.language.implicitConversions -import scala.reflect.runtime.{ universe = ru } -import scala.reflect.{ ClassTag, classTag } -import scala.tools.reflect.StdRuntimeTags._ -import scala.util.control.ControlThrowable - -import org.apache.spark.{Logging, HttpServer, SecurityManager, SparkConf} -import org.apache.spark.util.Utils - -// /** directory to save .class files to */ -// private class ReplVirtualDirectory(out: JPrintWriter) extends VirtualDirectory(((memory)), None) { -// private def pp(root: AbstractFile, indentLevel: Int) { -// val spaces = * indentLevel -// out.println(spaces + root.name) -// if (root.isDirectory) -// root.toList sortBy (_.name) foreach (x = pp(x, indentLevel + 1)) -// } -// // print the contents hierarchically -// def show() = pp(this, 0) -// } - - /** An interpreter for Scala code. - * - * The main public entry points are compile(), interpret(), and bind(). - * The compile() method loads a complete Scala file. The interpret() method - * executes one line of Scala code at the request of the user. The bind() - * method binds an object to a variable that can then be used by later - * interpreted code. - * - * The overall approach is based on compiling the requested code and then - * using a Java classloader and Java reflection to run the code - * and access its results. - * - * In more detail, a single compiler instance is used - * to accumulate all successfully compiled or interpreted Scala code. To - * interpret a line of code, the compiler generates a fresh object that - * includes the line of code and which has public member(s) to export - * all variables defined by that code. To extract the result of an - * interpreted line to show the user, a second result object is created - * which imports the variables exported by the above object and then - * exports members called $eval and $print. To accomodate user expressions - * that read from variables or methods defined in previous statements, import - * statements are used. - * - * This interpreter shares the strengths and weaknesses of using the - * full compiler-to-Java. 
The main strength is that interpreted code - * behaves exactly as does compiled code, including running at full speed. - * The main weakness is that redefining classes and methods is not handled - * properly, because rebinding at the Java level is technically difficult. - * - * @author Moez A. Abdel-Gawad - * @author Lex Spoon - */ - class SparkIMain( - initialSettings: Settings, - val out: JPrintWriter, - propagateExceptions: Boolean = false) -extends SparkImports with Logging { imain = - -val conf = new SparkConf() - -val SPARK_DEBUG_REPL: Boolean = (System.getenv(SPARK_DEBUG_REPL) == 1) -/** Local directory to save .class files too */ -lazy val outputDir = { - val tmp = System.getProperty(java.io.tmpdir) - val rootDir = conf.get(spark.repl.classdir, tmp) - Utils.createTempDir(rootDir) -} -if (SPARK_DEBUG_REPL) { - echo(Output directory: + outputDir) -} - -val virtualDirectory = new PlainFile(outputDir) // directory for classfiles -/** Jetty server that will serve our classes to worker nodes */ -val classServerPort = conf.getInt(spark.replClassServer.port, 0) -val classServer = new HttpServer(outputDir, new SecurityManager(conf), classServerPort,
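The class comment above describes the interpreter's wrapping strategy: each interpreted line is compiled into a fresh object whose public members export the line's definitions, and a second generated object imports those members (the real code uses names like $eval and $print) to produce the printed result. A deliberately simplified, assumed Scala sketch of that idea, not the code the interpreter actually generates:

// Simplified, assumed illustration of the wrapping described in the SparkIMain comment.
object ReplWrappingSketch {
  // User types:  val x = 21 * 2
  object read1 {
    val x = 21 * 2                 // user code becomes a public member of a fresh object
  }
  // A second generated object imports the first and exports a printer for the result.
  object eval1 {
    import read1._
    def printResult: String = "x: Int = " + x
  }
  // A later line such as  x + 1  would be wrapped the same way, importing read1._ again,
  // which is how expressions can refer to variables defined by earlier lines.

  def main(args: Array[String]): Unit = println(eval1.printResult)   // x: Int = 42
}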
[01/20] spark git commit: Changed scripts to ignore target.
Repository: spark Updated Branches: refs/heads/scala-2.11-prashant [created] fd849b0b4 Changed scripts to ignore target. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3065ecd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3065ecd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3065ecd Branch: refs/heads/scala-2.11-prashant Commit: d3065ecda919a2fc929d5cbd838b8ed3210c61d4 Parents: 899fc3c Author: Prashant Sharma prashan...@imaginea.com Authored: Thu Nov 6 08:30:51 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- dev/change-version-to-2.10.sh | 2 +- dev/change-version-to-2.11.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3065ecd/dev/change-version-to-2.10.sh -- diff --git a/dev/change-version-to-2.10.sh b/dev/change-version-to-2.10.sh index ca48e6f..152df60 100755 --- a/dev/change-version-to-2.10.sh +++ b/dev/change-version-to-2.10.sh @@ -17,4 +17,4 @@ # limitations under the License. # -find -name 'pom.xml' -exec sed -i 's|\(artifactId.*\)_2.11|\1_2.10|g' {} \; +find \( -name 'pom.xml' -a -not -path 'target' \) -exec sed -i 's|\(artifactId.*\)_2.11|\1_2.10|g' {} \; http://git-wip-us.apache.org/repos/asf/spark/blob/d3065ecd/dev/change-version-to-2.11.sh -- diff --git a/dev/change-version-to-2.11.sh b/dev/change-version-to-2.11.sh index 07056b1..b52d5f7 100755 --- a/dev/change-version-to-2.11.sh +++ b/dev/change-version-to-2.11.sh @@ -17,4 +17,4 @@ # limitations under the License. # -find -name 'pom.xml' -exec sed -i 's|\(artifactId.*\)_2.10|\1_2.11|g' {} \; +find \( -name 'pom.xml' -a -not -path 'target' \) -exec sed -i 's|\(artifactId.*\)_2.10|\1_2.11|g' {} \; - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[12/20] spark git commit: Code review
Code review Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b836d80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b836d80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b836d80 Branch: refs/heads/scala-2.11-prashant Commit: 1b836d80a0d429a1518da41c034f2a511a6545a2 Parents: 4af9de7 Author: Prashant Sharma prashan...@imaginea.com Authored: Fri Oct 24 09:47:48 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- bin/compute-classpath.sh | 14 -- conf/spark-env.sh.template | 3 --- core/pom.xml | 4 examples/pom.xml | 2 ++ project/SparkBuild.scala | 2 +- 5 files changed, 15 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/bin/compute-classpath.sh -- diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh index 993d260..86dd4b2 100755 --- a/bin/compute-classpath.sh +++ b/bin/compute-classpath.sh @@ -20,8 +20,6 @@ # This script computes Spark's classpath and prints it to stdout; it's used by both the run # script and the ExecutorRunner in standalone cluster mode. -SCALA_VERSION=${SCALA_VERSION:-2.10} - # Figure out where Spark is installed FWDIR=$(cd `dirname $0`/..; pwd) @@ -36,6 +34,18 @@ else CLASSPATH=$CLASSPATH:$FWDIR/conf fi +if [ -z $SCALA_VERSION ]; then + +ASSEMBLY_DIR2=$FWDIR/assembly/target/scala-2.11 +# if scala-2.11 directory for assembly exists, we use that. Otherwise we default to +# scala 2.10. +if [ -d $ASSEMBLY_DIR2 ]; then +SCALA_VERSION=2.11 +else +SCALA_VERSION=2.10 +fi +fi + ASSEMBLY_DIR=$FWDIR/assembly/target/scala-$SCALA_VERSION if [ -n $JAVA_HOME ]; then http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/conf/spark-env.sh.template -- diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template index 6a5622e..f8ffbf6 100755 --- a/conf/spark-env.sh.template +++ b/conf/spark-env.sh.template @@ -3,9 +3,6 @@ # This file is sourced when running various Spark programs. # Copy it as spark-env.sh and edit that to configure Spark for your site. -# Uncomment this if you plan to use scala 2.11 -# SCALA_VERSION=2.11 - # Options read when launching programs locally with # ./bin/run-example or ./bin/spark-submit # - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index 624aa96..afa8c8c 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -297,10 +297,6 @@ scopetest/scope /dependency dependency - groupIdcom.twitter/groupId - artifactIdchill-java/artifactId -/dependency -dependency groupIdasm/groupId artifactIdasm/artifactId scopetest/scope http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index e80c637..0cc15e5 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -282,6 +282,8 @@ /dependencies /profile profile + !-- We add source directories specific to Scala 2.10 and 2.11 since some examples + work only in one and not the other -- idscala-2.10/id activation activeByDefaulttrue/activeByDefault http://git-wip-us.apache.org/repos/asf/spark/blob/1b836d80/project/SparkBuild.scala -- diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 349cc27..0d8adcb 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -100,7 +100,7 @@ object SparkBuild extends PomBuild { conjunction with environment variable.) 
v.split((\\s+|,)).filterNot(_.isEmpty).map(_.trim.replaceAll(-P, )).toSeq } -if(profiles.exists(_.contains(scala))) { +if(profiles.exists(_.contains(scala-))) { profiles } else { println(Enabled default scala profile) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[02/20] spark git commit: Maven equivalent of setting spark.executor.extraClasspath during tests.
MAven equivalent of setting spark.executor.extraClasspath during tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed4f6463 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed4f6463 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed4f6463 Branch: refs/heads/scala-2.11-prashant Commit: ed4f646313b8f7224775de7072aaf4ee6c32d243 Parents: 4bcf66f Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 18:17:34 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- core/pom.xml | 17 ++--- examples/pom.xml | 10 ++ pom.xml | 46 +- 3 files changed, 65 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index d71f265..37970c9 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -329,14 +329,17 @@ plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId -configuration - environmentVariables -SPARK_HOME${basedir}/../SPARK_HOME -SPARK_TESTING1/SPARK_TESTING -SPARK_CLASSPATH${spark.classpath}/SPARK_CLASSPATH - /environmentVariables -/configuration +version1.0/version +executions + execution +idtest/id +goals + goaltest/goal +/goals + /execution +/executions /plugin + !-- Unzip py4j so we can include its files in the jar -- plugin groupIdorg.apache.maven.plugins/groupId http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index 5f9d0b5..027745a 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -185,6 +185,16 @@ testOutputDirectorytarget/scala-${scala.binary.version}/test-classes/testOutputDirectory plugins plugin +groupIdorg.codehaus.gmaven/groupId +artifactIdgmaven-plugin/artifactId +version1.4/version +executions + execution +phasenone/phase + /execution +/executions + /plugin + plugin groupIdorg.apache.maven.plugins/groupId artifactIdmaven-deploy-plugin/artifactId configuration http://git-wip-us.apache.org/repos/asf/spark/blob/ed4f6463/pom.xml -- diff --git a/pom.xml b/pom.xml index 1bd4970..b239356 100644 --- a/pom.xml +++ b/pom.xml @@ -142,9 +142,13 @@ aws.kinesis.client.version1.1.0/aws.kinesis.client.version commons.httpclient.version4.2.6/commons.httpclient.version commons.math3.version3.1.1/commons.math3.version - + test_classpath_file${project.build.directory}/spark-test-classpath.txt/test_classpath_file PermGen64m/PermGen MaxPermGen512m/MaxPermGen +scala.version2.10.4/scala.version +scala.binary.version2.10/scala.binary.version +jline.version${scala.version}/jline.version +jline.groupidorg.scala-lang/jline.groupid /properties repositories @@ -961,6 +965,7 @@ spark.test.home${session.executionRootDirectory}/spark.test.home spark.testing1/spark.testing spark.ui.enabledfalse/spark.ui.enabled + spark.executor.extraClassPath${test_classpath}/spark.executor.extraClassPath /systemProperties /configuration executions @@ -1022,6 +1027,45 @@ /pluginManagement plugins + !-- This plugin dumps the test classpath into a file -- + plugin +groupIdorg.apache.maven.plugins/groupId +artifactIdmaven-dependency-plugin/artifactId +version2.9/version +executions + execution +phasetest-compile/phase +goals + goalbuild-classpath/goal +/goals +configuration + includeScopetest/includeScope + outputFile${test_classpath_file}/outputFile +/configuration + /execution +/executions + /plugin + + !-- This plugin reads a file into maven property. And it lets us write groovy !! 
-- + plugin +groupIdorg.codehaus.gmaven/groupId +artifactIdgmaven-plugin/artifactId +version1.4/version +executions + execution +phaseprocess-test-classes/phase +goals + goalexecute/goal +/goals +configuration + source +
[08/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala new file mode 100644 index 000..13cd2b7 --- /dev/null +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkMemberHandlers.scala @@ -0,0 +1,232 @@ +// scalastyle:off + +/* NSC -- new Scala compiler + * Copyright 2005-2013 LAMP/EPFL + * @author Martin Odersky + */ + +package org.apache.spark.repl + +import scala.tools.nsc._ +import scala.tools.nsc.interpreter._ + +import scala.collection.{ mutable, immutable } +import scala.PartialFunction.cond +import scala.reflect.internal.Chars +import scala.reflect.internal.Flags._ +import scala.language.implicitConversions + +trait SparkMemberHandlers { + val intp: SparkIMain + + import intp.{ Request, global, naming } + import global._ + import naming._ + + private def codegenln(leadingPlus: Boolean, xs: String*): String = codegen(leadingPlus, (xs ++ Array(\n)): _*) + private def codegenln(xs: String*): String = codegenln(true, xs: _*) + + private def codegen(xs: String*): String = codegen(true, xs: _*) + private def codegen(leadingPlus: Boolean, xs: String*): String = { +val front = if (leadingPlus) + else +front + (xs map string2codeQuoted mkString + ) + } + private implicit def name2string(name: Name) = name.toString + + /** A traverser that finds all mentioned identifiers, i.e. things + * that need to be imported. It might return extra names. + */ + private class ImportVarsTraverser extends Traverser { +val importVars = new mutable.HashSet[Name]() + +override def traverse(ast: Tree) = ast match { + case Ident(name) = +// XXX this is obviously inadequate but it's going to require some effort +// to get right. +if (name.toString startsWith x$) () +else importVars += name + case _= super.traverse(ast) +} + } + private object ImportVarsTraverser { +def apply(member: Tree) = { + val ivt = new ImportVarsTraverser() + ivt traverse member + ivt.importVars.toList +} + } + + def chooseHandler(member: Tree): MemberHandler = member match { +case member: DefDef= new DefHandler(member) +case member: ValDef= new ValHandler(member) +case member: Assign= new AssignHandler(member) +case member: ModuleDef = new ModuleHandler(member) +case member: ClassDef = new ClassHandler(member) +case member: TypeDef = new TypeAliasHandler(member) +case member: Import= new ImportHandler(member) +case DocDef(_, documented) = chooseHandler(documented) +case member= new GenericHandler(member) + } + + sealed abstract class MemberDefHandler(override val member: MemberDef) extends MemberHandler(member) { +def symbol = if (member.symbol eq null) NoSymbol else member.symbol +def name: Name = member.name +def mods: Modifiers = member.mods +def keyword = member.keyword +def prettyName = name.decode + +override def definesImplicit = member.mods.isImplicit +override def definesTerm: Option[TermName] = Some(name.toTermName) filter (_ = name.isTermName) +override def definesType: Option[TypeName] = Some(name.toTypeName) filter (_ = name.isTypeName) +override def definedSymbols = if (symbol eq NoSymbol) Nil else List(symbol) + } + + /** Class to handle one member among all the members included + * in a single interpreter request. 
+ */ + sealed abstract class MemberHandler(val member: Tree) { +def definesImplicit = false +def definesValue= false +def isLegalTopLevel = false + +def definesTerm = Option.empty[TermName] +def definesType = Option.empty[TypeName] + +lazy val referencedNames = ImportVarsTraverser(member) +def importedNames= List[Name]() +def definedNames = definesTerm.toList ++ definesType.toList +def definedOrImported= definedNames ++ importedNames +def definedSymbols = List[Symbol]() + +def extraCodeToEvaluate(req: Request): String = +def resultExtractionCode(req: Request): String = + +private def shortName = this.getClass.toString split '.' last +override def toString = shortName + referencedNames.mkString( (refs: , , , )) + } + + class GenericHandler(member: Tree) extends MemberHandler(member) + + class ValHandler(member: ValDef) extends MemberDefHandler(member) { +val maxStringElements = 1000 // no need to mkString billions of elements +override def definesValue = true + +override def resultExtractionCode(req: Request): String = { + val isInternal = isUserVarName(name) req.lookupTypeOf(name) ==
[11/20] spark git commit: Scala 2.11 support with repl and all build changes.
Scala 2.11 support with repl and all build changes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4af9de7d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4af9de7d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4af9de7d Branch: refs/heads/scala-2.11-prashant Commit: 4af9de7dc72a809e10f4a287b509dec4ca12ae53 Parents: 48a19a6 Author: Prashant Sharma prashan...@imaginea.com Authored: Mon Oct 20 17:43:38 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- .rat-excludes |1 + assembly/pom.xml|8 +- bin/compute-classpath.sh|8 +- conf/spark-env.sh.template |3 + core/pom.xml| 41 +- dev/change-version-to-2.10.sh | 20 + dev/change-version-to-2.11.sh | 20 + examples/pom.xml| 149 +- .../examples/streaming/JavaKafkaWordCount.java | 113 ++ .../examples/streaming/KafkaWordCount.scala | 102 ++ .../examples/streaming/TwitterAlgebirdCMS.scala | 114 ++ .../examples/streaming/TwitterAlgebirdHLL.scala | 92 ++ .../examples/streaming/JavaKafkaWordCount.java | 113 -- .../examples/streaming/KafkaWordCount.scala | 102 -- .../examples/streaming/TwitterAlgebirdCMS.scala | 114 -- .../examples/streaming/TwitterAlgebirdHLL.scala | 92 -- external/mqtt/pom.xml |5 - pom.xml | 114 +- project/SparkBuild.scala| 12 +- project/project/SparkPluginBuild.scala |2 +- repl/pom.xml| 90 +- .../main/scala/org/apache/spark/repl/Main.scala | 33 + .../apache/spark/repl/SparkCommandLine.scala| 37 + .../org/apache/spark/repl/SparkExprTyper.scala | 114 ++ .../org/apache/spark/repl/SparkHelper.scala | 22 + .../org/apache/spark/repl/SparkILoop.scala | 1091 + .../org/apache/spark/repl/SparkILoopInit.scala | 147 ++ .../org/apache/spark/repl/SparkIMain.scala | 1445 ++ .../org/apache/spark/repl/SparkImports.scala| 238 +++ .../spark/repl/SparkJLineCompletion.scala | 377 + .../apache/spark/repl/SparkJLineReader.scala| 90 ++ .../apache/spark/repl/SparkMemberHandlers.scala | 232 +++ .../apache/spark/repl/SparkRunnerSettings.scala | 32 + .../scala/org/apache/spark/repl/ReplSuite.scala | 318 .../main/scala/org/apache/spark/repl/Main.scala | 85 ++ .../org/apache/spark/repl/SparkExprTyper.scala | 86 ++ .../org/apache/spark/repl/SparkILoop.scala | 966 .../org/apache/spark/repl/SparkIMain.scala | 1319 .../org/apache/spark/repl/SparkImports.scala| 201 +++ .../spark/repl/SparkJLineCompletion.scala | 350 + .../apache/spark/repl/SparkMemberHandlers.scala | 221 +++ .../apache/spark/repl/SparkReplReporter.scala | 53 + .../scala/org/apache/spark/repl/ReplSuite.scala | 326 .../main/scala/org/apache/spark/repl/Main.scala | 33 - .../apache/spark/repl/SparkCommandLine.scala| 37 - .../org/apache/spark/repl/SparkExprTyper.scala | 114 -- .../org/apache/spark/repl/SparkHelper.scala | 22 - .../org/apache/spark/repl/SparkILoop.scala | 1091 - .../org/apache/spark/repl/SparkILoopInit.scala | 147 -- .../org/apache/spark/repl/SparkIMain.scala | 1445 -- .../org/apache/spark/repl/SparkImports.scala| 238 --- .../spark/repl/SparkJLineCompletion.scala | 377 - .../apache/spark/repl/SparkJLineReader.scala| 90 -- .../apache/spark/repl/SparkMemberHandlers.scala | 232 --- .../apache/spark/repl/SparkRunnerSettings.scala | 32 - .../scala/org/apache/spark/repl/ReplSuite.scala | 318 sql/catalyst/pom.xml| 29 +- 57 files changed, 8602 insertions(+), 4701 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/.rat-excludes -- diff --git a/.rat-excludes b/.rat-excludes index 20e3372..d8bee1f 100644 --- a/.rat-excludes +++ b/.rat-excludes 
@@ -44,6 +44,7 @@ SparkImports.scala SparkJLineCompletion.scala SparkJLineReader.scala SparkMemberHandlers.scala +SparkReplReporter.scala sbt sbt-launch-lib.bash plugins.sbt http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 31a01e4..e592220 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -66,22 +66,22 @@ /dependency
[18/20] spark git commit: Fixed Python Runner suite. null check should be first case in scala 2.11.
Fixed Python Runner suite. null check should be first case in scala 2.11. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/29cc4d9f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/29cc4d9f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/29cc4d9f Branch: refs/heads/scala-2.11-prashant Commit: 29cc4d9f8d441a3fee48856433e5cbc131f1e3a1 Parents: eeed1a6 Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 15:57:48 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/29cc4d9f/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala index af94b05..039c871 100644 --- a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala @@ -87,8 +87,8 @@ object PythonRunner { // Strip the URI scheme from the path formattedPath = new URI(formattedPath).getScheme match { -case Utils.windowsDrive(d) if windows = formattedPath case null = formattedPath +case Utils.windowsDrive(d) if windows = formattedPath case _ = new URI(formattedPath).getPath } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
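The ordering issue behind this fix can be shown with a small, assumed Scala example (illustrative names, not PythonRunner itself): when a regex extractor pattern is tried before the null case, the extractor is applied to the null scrutinee and can throw under Scala 2.11, so the null case has to come first, exactly as in the diff above.

// Assumed, minimal illustration of why the null case must precede the extractor case.
object NullCaseOrdering {
  val windowsDrive = "([a-zA-Z])".r   // stand-in for Utils.windowsDrive

  def describeScheme(scheme: String): String = scheme match {
    case null            => "no scheme"         // must be matched before the regex extractor
    case windowsDrive(d) => s"windows drive $d" // safe: scheme is known non-null here
    case other           => other
  }

  def main(args: Array[String]): Unit = {
    println(describeScheme(null))    // no scheme
    println(describeScheme("C"))     // windows drive C
    println(describeScheme("hdfs"))  // hdfs
  }
}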
[14/20] spark git commit: Print an error if build for 2.10 and 2.11 is spotted.
Print an error if build for 2.10 and 2.11 is spotted. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c696f394 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c696f394 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c696f394 Branch: refs/heads/scala-2.11-prashant Commit: c696f394ce45d6899d5facabafcfa59488a728da Parents: 99a0df1 Author: Prashant Sharma prashan...@imaginea.com Authored: Mon Oct 27 13:59:13 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- bin/compute-classpath.sh | 10 -- examples/pom.xml | 1 - 2 files changed, 8 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c696f394/bin/compute-classpath.sh -- diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh index dea1592..108c9af 100755 --- a/bin/compute-classpath.sh +++ b/bin/compute-classpath.sh @@ -37,8 +37,14 @@ fi if [ -z $SPARK_SCALA_VERSION ]; then ASSEMBLY_DIR2=$FWDIR/assembly/target/scala-2.11 -# if scala-2.11 directory for assembly exists, we use that. Otherwise we default to -# scala 2.10. +ASSEMBLY_DIR1=$FWDIR/assembly/target/scala-2.10 + +if [[ -d $ASSEMBLY_DIR2 -d $ASSEMBLY_DIR1 ]]; then +echo -e Presence of build for both scala versions(SCALA 2.10 and SCALA 2.11) detected. 12 +echo -e 'Either clean one of them or, export SPARK_SCALA_VERSION=2.11 in spark-env.sh.' 12 +exit 1 +fi + if [ -d $ASSEMBLY_DIR2 ]; then SPARK_SCALA_VERSION=2.11 else http://git-wip-us.apache.org/repos/asf/spark/blob/c696f394/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index 0cc15e5..5f9d0b5 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -129,7 +129,6 @@ artifactIdjetty-server/artifactId /dependency dependency -dependency groupIdorg.apache.commons/groupId artifactIdcommons-math3/artifactId /dependency - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[09/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala new file mode 100644 index 000..646c68e --- /dev/null +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -0,0 +1,1445 @@ +// scalastyle:off + +/* NSC -- new Scala compiler + * Copyright 2005-2013 LAMP/EPFL + * @author Martin Odersky + */ + +package org.apache.spark.repl + +import java.io.File + +import scala.tools.nsc._ +import scala.tools.nsc.backend.JavaPlatform +import scala.tools.nsc.interpreter._ + +import Predef.{ println = _, _ } +import scala.tools.nsc.util.{MergedClassPath, stringFromWriter, ScalaClassLoader, stackTraceString} +import scala.reflect.internal.util._ +import java.net.URL +import scala.sys.BooleanProp +import io.{AbstractFile, PlainFile, VirtualDirectory} + +import reporters._ +import symtab.Flags +import scala.reflect.internal.Names +import scala.tools.util.PathResolver +import ScalaClassLoader.URLClassLoader +import scala.tools.nsc.util.Exceptional.unwrap +import scala.collection.{ mutable, immutable } +import scala.util.control.Exception.{ ultimately } +import SparkIMain._ +import java.util.concurrent.Future +import typechecker.Analyzer +import scala.language.implicitConversions +import scala.reflect.runtime.{ universe = ru } +import scala.reflect.{ ClassTag, classTag } +import scala.tools.reflect.StdRuntimeTags._ +import scala.util.control.ControlThrowable + +import org.apache.spark.{Logging, HttpServer, SecurityManager, SparkConf} +import org.apache.spark.util.Utils + +// /** directory to save .class files to */ +// private class ReplVirtualDirectory(out: JPrintWriter) extends VirtualDirectory(((memory)), None) { +// private def pp(root: AbstractFile, indentLevel: Int) { +// val spaces = * indentLevel +// out.println(spaces + root.name) +// if (root.isDirectory) +// root.toList sortBy (_.name) foreach (x = pp(x, indentLevel + 1)) +// } +// // print the contents hierarchically +// def show() = pp(this, 0) +// } + + /** An interpreter for Scala code. + * + * The main public entry points are compile(), interpret(), and bind(). + * The compile() method loads a complete Scala file. The interpret() method + * executes one line of Scala code at the request of the user. The bind() + * method binds an object to a variable that can then be used by later + * interpreted code. + * + * The overall approach is based on compiling the requested code and then + * using a Java classloader and Java reflection to run the code + * and access its results. + * + * In more detail, a single compiler instance is used + * to accumulate all successfully compiled or interpreted Scala code. To + * interpret a line of code, the compiler generates a fresh object that + * includes the line of code and which has public member(s) to export + * all variables defined by that code. To extract the result of an + * interpreted line to show the user, a second result object is created + * which imports the variables exported by the above object and then + * exports members called $eval and $print. To accomodate user expressions + * that read from variables or methods defined in previous statements, import + * statements are used. + * + * This interpreter shares the strengths and weaknesses of using the + * full compiler-to-Java. 
The main strength is that interpreted code + * behaves exactly as does compiled code, including running at full speed. + * The main weakness is that redefining classes and methods is not handled + * properly, because rebinding at the Java level is technically difficult. + * + * @author Moez A. Abdel-Gawad + * @author Lex Spoon + */ + class SparkIMain( + initialSettings: Settings, + val out: JPrintWriter, + propagateExceptions: Boolean = false) +extends SparkImports with Logging { imain = + +val conf = new SparkConf() + +val SPARK_DEBUG_REPL: Boolean = (System.getenv(SPARK_DEBUG_REPL) == 1) +/** Local directory to save .class files too */ +lazy val outputDir = { + val tmp = System.getProperty(java.io.tmpdir) + val rootDir = conf.get(spark.repl.classdir, tmp) + Utils.createTempDir(rootDir) +} +if (SPARK_DEBUG_REPL) { + echo(Output directory: + outputDir) +} + +val virtualDirectory = new PlainFile(outputDir) // directory for classfiles +/** Jetty server that will serve our classes to worker nodes */ +val classServerPort = conf.getInt(spark.replClassServer.port, 0) +val classServer = new HttpServer(outputDir, new
[07/20] spark git commit: Scala 2.11 support with repl and all build changes.
http://git-wip-us.apache.org/repos/asf/spark/blob/4af9de7d/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala new file mode 100644 index 000..1bb62c8 --- /dev/null +++ b/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -0,0 +1,1319 @@ +/* NSC -- new Scala compiler + * Copyright 2005-2013 LAMP/EPFL + * @author Martin Odersky + */ + +package scala +package tools.nsc +package interpreter + +import PartialFunction.cond +import scala.language.implicitConversions +import scala.beans.BeanProperty +import scala.collection.mutable +import scala.concurrent.{ Future, ExecutionContext } +import scala.reflect.runtime.{ universe = ru } +import scala.reflect.{ ClassTag, classTag } +import scala.reflect.internal.util.{ BatchSourceFile, SourceFile } +import scala.tools.util.PathResolver +import scala.tools.nsc.io.AbstractFile +import scala.tools.nsc.typechecker.{ TypeStrings, StructuredTypeStrings } +import scala.tools.nsc.util.{ ScalaClassLoader, stringFromReader, stringFromWriter, StackTraceOps } +import scala.tools.nsc.util.Exceptional.unwrap +import javax.script.{AbstractScriptEngine, Bindings, ScriptContext, ScriptEngine, ScriptEngineFactory, ScriptException, CompiledScript, Compilable} + +/** An interpreter for Scala code. + * + * The main public entry points are compile(), interpret(), and bind(). + * The compile() method loads a complete Scala file. The interpret() method + * executes one line of Scala code at the request of the user. The bind() + * method binds an object to a variable that can then be used by later + * interpreted code. + * + * The overall approach is based on compiling the requested code and then + * using a Java classloader and Java reflection to run the code + * and access its results. + * + * In more detail, a single compiler instance is used + * to accumulate all successfully compiled or interpreted Scala code. To + * interpret a line of code, the compiler generates a fresh object that + * includes the line of code and which has public member(s) to export + * all variables defined by that code. To extract the result of an + * interpreted line to show the user, a second result object is created + * which imports the variables exported by the above object and then + * exports members called $eval and $print. To accomodate user expressions + * that read from variables or methods defined in previous statements, import + * statements are used. + * + * This interpreter shares the strengths and weaknesses of using the + * full compiler-to-Java. The main strength is that interpreted code + * behaves exactly as does compiled code, including running at full speed. + * The main weakness is that redefining classes and methods is not handled + * properly, because rebinding at the Java level is technically difficult. + * + * @author Moez A. Abdel-Gawad + * @author Lex Spoon + */ +class SparkIMain(@BeanProperty val factory: ScriptEngineFactory, initialSettings: Settings, + protected val out: JPrintWriter) extends AbstractScriptEngine with Compilable with SparkImports { + imain = + + setBindings(createBindings, ScriptContext.ENGINE_SCOPE) + object replOutput extends ReplOutput(settings.Yreploutdir) { } + + @deprecated(Use replOutput.dir instead, 2.11.0) + def virtualDirectory = replOutput.dir + // Used in a test case. 
+ def showDirectory() = replOutput.show(out) + + private[nsc] var printResults = true // whether to print result lines + private[nsc] var totalSilence = false // whether to print anything + private var _initializeComplete = false // compiler is initialized + private var _isInitialized: Future[Boolean] = null // set up initialization future + private var bindExceptions = true // whether to bind the lastException variable + private var _executionWrapper = // code to be wrapped around all lines + + /** We're going to go to some trouble to initialize the compiler asynchronously. +* It's critical that nothing call into it until it's been initialized or we will +* run into unrecoverable issues, but the perceived repl startup time goes +* through the roof if we wait for it. So we initialize it with a future and +* use a lazy val to ensure that any attempt to use the compiler object waits +* on the future. +*/ + private var _classLoader: util.AbstractFileClassLoader = null // active classloader + private val _compiler: ReplGlobal = newCompiler(settings, reporter) // our private compiler + + def compilerClasspath: Seq[java.net.URL] = ( +if
[03/20] spark git commit: Setting test jars on executor classpath during tests from sbt.
Setting test jars on executor classpath during tests from sbt. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4bcf66f2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4bcf66f2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4bcf66f2 Branch: refs/heads/scala-2.11-prashant Commit: 4bcf66f2c9421f0c50e5a4578a80812aea4b8b10 Parents: 29cc4d9 Author: Prashant Sharma prashan...@imaginea.com Authored: Wed Nov 5 16:51:09 2014 +0530 Committer: Prashant Sharma prashan...@imaginea.com Committed: Fri Nov 7 11:22:47 2014 +0530 -- bin/compute-classpath.sh | 6 -- project/SparkBuild.scala | 8 +--- 2 files changed, 5 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4bcf66f2/bin/compute-classpath.sh -- diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh index 108c9af..14e972f 100755 --- a/bin/compute-classpath.sh +++ b/bin/compute-classpath.sh @@ -137,9 +137,6 @@ if [ -n $datanucleus_jars ]; then fi fi -test_jars=$(find $FWDIR/lib_managed/test \( -name '*jar' -a -type f \) 2/dev/null | \ -tr \n : | sed s/:$//g) - # Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1 if [[ $SPARK_TESTING == 1 ]]; then CLASSPATH=$CLASSPATH:$FWDIR/core/target/scala-$SPARK_SCALA_VERSION/test-classes @@ -151,9 +148,6 @@ if [[ $SPARK_TESTING == 1 ]]; then CLASSPATH=$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SPARK_SCALA_VERSION/test-classes CLASSPATH=$CLASSPATH:$FWDIR/sql/core/target/scala-$SPARK_SCALA_VERSION/test-classes CLASSPATH=$CLASSPATH:$FWDIR/sql/hive/target/scala-$SPARK_SCALA_VERSION/test-classes - if [[ $SPARK_SCALA_VERSION == 2.11 ]]; then - CLASSPATH=$CLASSPATH:$test_jars - fi fi # Add hadoop conf dir if given -- otherwise FileSystem.*, etc fail ! http://git-wip-us.apache.org/repos/asf/spark/blob/4bcf66f2/project/SparkBuild.scala -- diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 0d8adcb..9e46a11 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -31,8 +31,8 @@ object BuildCommons { private val buildLocation = file(.).getAbsoluteFile.getParentFile val allProjects@Seq(bagel, catalyst, core, graphx, hive, hiveThriftServer, mllib, repl, - sql, networkCommon, networkShuffle, streaming, streamingFlumeSink, streamingFlume, streamingKafka, - streamingMqtt, streamingTwitter, streamingZeromq) = +sql, networkCommon, networkShuffle, streaming, streamingFlumeSink, streamingFlume, streamingKafka, +streamingMqtt, streamingTwitter, streamingZeromq) = Seq(bagel, catalyst, core, graphx, hive, hive-thriftserver, mllib, repl, sql, network-common, network-shuffle, streaming, streaming-flume-sink, streaming-flume, streaming-kafka, streaming-mqtt, streaming-twitter, @@ -361,8 +361,10 @@ object TestSettings { .map { case (k,v) = s-D$k=$v }.toSeq, javaOptions in Test ++= -Xmx3g -XX:PermSize=128M -XX:MaxNewSize=256m -XX:MaxPermSize=1g .split( ).toSeq, +javaOptions in Test += + -Dspark.executor.extraClassPath= + (fullClasspath in Test).value.files. + map(_.getAbsolutePath).mkString(:).stripSuffix(:), javaOptions += -Xmx3g, -retrievePattern := [conf]/[artifact](-[revision]).[ext], // Show full stack trace and duration in test cases. testOptions in Test += Tests.Argument(-oDF), testOptions += Tests.Argument(TestFrameworks.JUnit, -v, -a), - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
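Read in isolation, the sbt setting added above amounts to the following sketch for an sbt 0.13 build; it mirrors the diff but is illustrative rather than the project's actual SparkBuild settings.

```
// Sketch only: export the full test classpath to executors via a system
// property, so test jars resolve when tests launch local-cluster executors.
javaOptions in Test += {
  val testClasspath = (fullClasspath in Test).value.files
    .map(_.getAbsolutePath)
    .mkString(":")
  "-Dspark.executor.extraClassPath=" + testClasspath
}
```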
[19/20] spark git commit: Testing new Hive version with shaded jline
Testing new Hive version with shaded jline Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37d972c3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37d972c3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37d972c3 Branch: refs/heads/scala-2.11-prashant Commit: 37d972c35b4752f76be473ee82a0f1ac995bd798 Parents: d3065ec Author: Patrick Wendell pwend...@gmail.com Authored: Fri Nov 7 11:09:52 2014 -0800 Committer: Patrick Wendell pwend...@gmail.com Committed: Fri Nov 7 11:13:34 2014 -0800 -- pom.xml | 20 1 file changed, 4 insertions(+), 16 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/37d972c3/pom.xml -- diff --git a/pom.xml b/pom.xml index b239356..3053e7c 100644 --- a/pom.xml +++ b/pom.xml @@ -126,7 +126,7 @@ flume.version1.4.0/flume.version zookeeper.version3.4.5/zookeeper.version !-- Version used in Maven Hive dependency -- -hive.version0.13.1a/hive.version +hive.version0.13.1b/hive.version !-- Version used for internal directory structure -- hive.version.short0.13.1/hive.version.short derby.version10.10.1.1/derby.version @@ -231,22 +231,10 @@ /snapshots /repository repository - !-- This is temporarily included to fix issues with Hive 0.12 -- + !-- This is temporarily included to fix issues with Hive 0.13 -- idspark-staging/id nameSpring Staging Repository/name - urlhttps://oss.sonatype.org/content/repositories/orgspark-project-1085/url - releases -enabledtrue/enabled - /releases - snapshots -enabledfalse/enabled - /snapshots -/repository -repository - !-- This is temporarily included to fix issues with Hive 0.13 -- - idspark-staging-hive13/id - nameSpring Staging Repository Hive 13/name - urlhttps://oss.sonatype.org/content/repositories/orgspark-project-1089//url + urlhttps://oss.sonatype.org/content/repositories/orgspark-project-1090/url releases enabledtrue/enabled /releases @@ -1401,7 +1389,7 @@ activeByDefaultfalse/activeByDefault /activation properties -hive.version0.13.1a/hive.version +hive.version0.13.1b/hive.version hive.version.short0.13.1/hive.version.short derby.version10.10.1.1/derby.version /properties - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SQL][DOC][Minor] Spark SQL Hive now supports dynamic partitioning
Repository: spark Updated Branches: refs/heads/branch-1.2 d6262fa05 - e5b8cea7e [SQL][DOC][Minor] Spark SQL Hive now support dynamic partitioning Author: wangfei wangf...@huawei.com Closes #3127 from scwf/patch-9 and squashes the following commits: e39a560 [wangfei] now support dynamic partitioning (cherry picked from commit 636d7bcc96b912f5b5caa91110cd55b55fa38ad8) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e5b8cea7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e5b8cea7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e5b8cea7 Branch: refs/heads/branch-1.2 Commit: e5b8cea7ef219be33df1db77a0921885833a4254 Parents: d6262fa Author: wangfei wangf...@huawei.com Authored: Fri Nov 7 11:43:35 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:43:49 2014 -0800 -- docs/sql-programming-guide.md | 1 - 1 file changed, 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e5b8cea7/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index e399fec..ffcce2c 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1059,7 +1059,6 @@ in Hive deployments. **Major Hive Features** -* Spark SQL does not currently support inserting to tables using dynamic partitioning. * Tables with buckets: bucket is the hash partitioning within a Hive table partition. Spark SQL doesn't support buckets yet. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
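With that limitation removed from the guide, a dynamic-partition insert can be written as below. This is a hedged sketch: the table names (`staging_logs`, `logs`) and partition column (`dt`) are invented, and it assumes the spark-shell's `sc`.

```
// Illustrative only: dynamic-partition insert through Spark SQL's Hive support.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("SET hive.exec.dynamic.partition = true")
hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
hiveContext.sql(
  "INSERT OVERWRITE TABLE logs PARTITION (dt) SELECT message, dt FROM staging_logs")
```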
spark git commit: [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version
Repository: spark Updated Branches: refs/heads/master 636d7bcc9 - 86e9eaa3f [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks. Another related PR is #3103, which tries to fix the MANIFEST issue. Author: Cheng Lian l...@databricks.com Closes #3105 from liancheng/spark-4225 and squashes the following commits: d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86e9eaa3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86e9eaa3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86e9eaa3 Branch: refs/heads/master Commit: 86e9eaa3f0ec23cb38bce67585adb2d5f484f4ee Parents: 636d7bc Author: Cheng Lian l...@databricks.com Authored: Fri Nov 7 11:45:25 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:45:25 2014 -0800 -- .../scala/org/apache/spark/util/Utils.scala | 24 ++-- .../hive/thriftserver/SparkSQLCLIService.scala | 12 -- 2 files changed, 12 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86e9eaa3/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index a14d612..6b85c03 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -21,10 +21,8 @@ import java.io._ import java.lang.management.ManagementFactory import java.net._ import java.nio.ByteBuffer -import java.util.jar.Attributes.Name -import java.util.{Properties, Locale, Random, UUID} -import java.util.concurrent.{ThreadFactory, ConcurrentHashMap, Executors, ThreadPoolExecutor} -import java.util.jar.{Manifest = JarManifest} +import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory, ThreadPoolExecutor} +import java.util.{Locale, Properties, Random, UUID} import scala.collection.JavaConversions._ import scala.collection.Map @@ -38,11 +36,11 @@ import com.google.common.io.{ByteStreams, Files} import com.google.common.util.concurrent.ThreadFactoryBuilder import org.apache.commons.lang3.SystemUtils import org.apache.hadoop.conf.Configuration -import org.apache.log4j.PropertyConfigurator import org.apache.hadoop.fs.{FileSystem, FileUtil, Path} +import org.apache.log4j.PropertyConfigurator import org.eclipse.jetty.util.MultiException import org.json4s._ -import tachyon.client.{TachyonFile,TachyonFS} +import tachyon.client.{TachyonFS, TachyonFile} import org.apache.spark._ import org.apache.spark.deploy.SparkHadoopUtil @@ -352,8 +350,8 @@ private[spark] object Utils extends Logging { * Download a file to target directory. Supports fetching the file in a variety of ways, * including HTTP, HDFS and files on a standard filesystem, based on the URL parameter. * - * If `useCache` is true, first attempts to fetch the file to a local cache that's shared - * across executors running the same application. `useCache` is used mainly for + * If `useCache` is true, first attempts to fetch the file to a local cache that's shared + * across executors running the same application. 
`useCache` is used mainly for * the executors, and not in local mode. * * Throws SparkException if the target file already exists and has different contents than @@ -400,7 +398,7 @@ private[spark] object Utils extends Logging { } else { doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf) } - + // Decompress the file if it's a .tar or .tar.gz if (fileName.endsWith(.tar.gz) || fileName.endsWith(.tgz)) { logInfo(Untarring + fileName) @@ -1776,13 +1774,6 @@ private[spark] object Utils extends Logging { s$libraryPathEnvName=$libraryPath$ampersand } - lazy val sparkVersion = -SparkContext.jarOfObject(this).map { path = - val manifestUrl = new URL(sjar:file:$path!/META-INF/MANIFEST.MF) - val manifest = new JarManifest(manifestUrl.openStream()) - manifest.getMainAttributes.getValue(Name.IMPLEMENTATION_VERSION) -}.getOrElse(Unknown) - /** * Return the value of a config either through the SparkConf or the Hadoop configuration * if this is Yarn mode. In the latter case, this defaults to the value set through SparkConf @@ -1796,7 +1787,6 @@ private[spark]
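The replacement approach is easy to demonstrate; a minimal sketch, assuming a local master is available:

```
// Sketch: read the version from a live SparkContext instead of parsing
// Implementation-Version out of META-INF/MANIFEST.MF in the assembly jar.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("version-check"))
println(s"Spark version: ${sc.version}")
sc.stop()
```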
spark git commit: [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version
Repository: spark Updated Branches: refs/heads/branch-1.2 e5b8cea7e - 2cd8e3e2b [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks. Another related PR is #3103, which tries to fix the MANIFEST issue. Author: Cheng Lian l...@databricks.com Closes #3105 from liancheng/spark-4225 and squashes the following commits: d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version (cherry picked from commit 86e9eaa3f0ec23cb38bce67585adb2d5f484f4ee) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2cd8e3e2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2cd8e3e2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2cd8e3e2 Branch: refs/heads/branch-1.2 Commit: 2cd8e3e2b00c6191bccfb70743df7a4c9ffd98b2 Parents: e5b8cea Author: Cheng Lian l...@databricks.com Authored: Fri Nov 7 11:45:25 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:46:27 2014 -0800 -- .../scala/org/apache/spark/util/Utils.scala | 24 ++-- .../hive/thriftserver/SparkSQLCLIService.scala | 12 -- 2 files changed, 12 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2cd8e3e2/core/src/main/scala/org/apache/spark/util/Utils.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index a14d612..6b85c03 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -21,10 +21,8 @@ import java.io._ import java.lang.management.ManagementFactory import java.net._ import java.nio.ByteBuffer -import java.util.jar.Attributes.Name -import java.util.{Properties, Locale, Random, UUID} -import java.util.concurrent.{ThreadFactory, ConcurrentHashMap, Executors, ThreadPoolExecutor} -import java.util.jar.{Manifest = JarManifest} +import java.util.concurrent.{ConcurrentHashMap, Executors, ThreadFactory, ThreadPoolExecutor} +import java.util.{Locale, Properties, Random, UUID} import scala.collection.JavaConversions._ import scala.collection.Map @@ -38,11 +36,11 @@ import com.google.common.io.{ByteStreams, Files} import com.google.common.util.concurrent.ThreadFactoryBuilder import org.apache.commons.lang3.SystemUtils import org.apache.hadoop.conf.Configuration -import org.apache.log4j.PropertyConfigurator import org.apache.hadoop.fs.{FileSystem, FileUtil, Path} +import org.apache.log4j.PropertyConfigurator import org.eclipse.jetty.util.MultiException import org.json4s._ -import tachyon.client.{TachyonFile,TachyonFS} +import tachyon.client.{TachyonFS, TachyonFile} import org.apache.spark._ import org.apache.spark.deploy.SparkHadoopUtil @@ -352,8 +350,8 @@ private[spark] object Utils extends Logging { * Download a file to target directory. Supports fetching the file in a variety of ways, * including HTTP, HDFS and files on a standard filesystem, based on the URL parameter. * - * If `useCache` is true, first attempts to fetch the file to a local cache that's shared - * across executors running the same application. 
`useCache` is used mainly for + * If `useCache` is true, first attempts to fetch the file to a local cache that's shared + * across executors running the same application. `useCache` is used mainly for * the executors, and not in local mode. * * Throws SparkException if the target file already exists and has different contents than @@ -400,7 +398,7 @@ private[spark] object Utils extends Logging { } else { doFetchFile(url, targetDir, fileName, conf, securityMgr, hadoopConf) } - + // Decompress the file if it's a .tar or .tar.gz if (fileName.endsWith(.tar.gz) || fileName.endsWith(.tgz)) { logInfo(Untarring + fileName) @@ -1776,13 +1774,6 @@ private[spark] object Utils extends Logging { s$libraryPathEnvName=$libraryPath$ampersand } - lazy val sparkVersion = -SparkContext.jarOfObject(this).map { path = - val manifestUrl = new URL(sjar:file:$path!/META-INF/MANIFEST.MF) - val manifest = new JarManifest(manifestUrl.openStream()) - manifest.getMainAttributes.getValue(Name.IMPLEMENTATION_VERSION) -}.getOrElse(Unknown) - /** * Return the value of a config either through the SparkConf or the Hadoop configuration
spark git commit: [SQL] Support ScalaReflection of schema in different universes
Repository: spark Updated Branches: refs/heads/master 86e9eaa3f - 8154ed7df [SQL] Support ScalaReflection of schema in different universes Author: Michael Armbrust mich...@databricks.com Closes #3096 from marmbrus/reflectionContext and squashes the following commits: adc221f [Michael Armbrust] Support ScalaReflection of schema in different universes Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8154ed7d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8154ed7d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8154ed7d Branch: refs/heads/master Commit: 8154ed7df6c5407e638f465d3bd86b43f36216ef Parents: 86e9eaa Author: Michael Armbrust mich...@databricks.com Authored: Fri Nov 7 11:51:20 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:51:20 2014 -0800 -- .../spark/sql/catalyst/ScalaReflection.scala | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8154ed7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 9cda373..71034c2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -26,14 +26,26 @@ import org.apache.spark.sql.catalyst.plans.logical.LocalRelation import org.apache.spark.sql.catalyst.types._ import org.apache.spark.sql.catalyst.types.decimal.Decimal + /** - * Provides experimental support for generating catalyst schemas for scala objects. + * A default version of ScalaReflection that uses the runtime universe. */ -object ScalaReflection { +object ScalaReflection extends ScalaReflection { + val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe +} + +/** + * Support for generating catalyst schemas for scala objects. + */ +trait ScalaReflection { + /** The universe we work in (runtime or macro) */ + val universe: scala.reflect.api.Universe + + import universe._ + // The Predef.Map is scala.collection.immutable.Map. // Since the map values can be mutable, we explicitly import scala.collection.Map at here. import scala.collection.Map - import scala.reflect.runtime.universe._ case class Schema(dataType: DataType, nullable: Boolean) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
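After the refactoring, the default object keeps the old runtime-universe behaviour; a short usage sketch follows (catalyst is an internal API, so treat this as illustrative only):

```
// Sketch: schema derivation through the default runtime-universe instance.
import org.apache.spark.sql.catalyst.ScalaReflection

case class Person(name: String, age: Int)

// schemaFor resolves a TypeTag from the runtime universe bound in the object.
val schema = ScalaReflection.schemaFor[Person]
println(schema.dataType)   // StructType with fields name: StringType, age: IntegerType
println(schema.nullable)   // true
```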
spark git commit: [SQL] Support ScalaReflection of schema in different universes
Repository: spark Updated Branches: refs/heads/branch-1.2 2cd8e3e2b - f1f1ae418 [SQL] Support ScalaReflection of schema in different universes Author: Michael Armbrust mich...@databricks.com Closes #3096 from marmbrus/reflectionContext and squashes the following commits: adc221f [Michael Armbrust] Support ScalaReflection of schema in different universes (cherry picked from commit 8154ed7df6c5407e638f465d3bd86b43f36216ef) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f1f1ae41 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f1f1ae41 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f1f1ae41 Branch: refs/heads/branch-1.2 Commit: f1f1ae418031957256e7dac896e29d64c81bf1a4 Parents: 2cd8e3e Author: Michael Armbrust mich...@databricks.com Authored: Fri Nov 7 11:51:20 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:51:33 2014 -0800 -- .../spark/sql/catalyst/ScalaReflection.scala | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f1f1ae41/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 9cda373..71034c2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -26,14 +26,26 @@ import org.apache.spark.sql.catalyst.plans.logical.LocalRelation import org.apache.spark.sql.catalyst.types._ import org.apache.spark.sql.catalyst.types.decimal.Decimal + /** - * Provides experimental support for generating catalyst schemas for scala objects. + * A default version of ScalaReflection that uses the runtime universe. */ -object ScalaReflection { +object ScalaReflection extends ScalaReflection { + val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe +} + +/** + * Support for generating catalyst schemas for scala objects. + */ +trait ScalaReflection { + /** The universe we work in (runtime or macro) */ + val universe: scala.reflect.api.Universe + + import universe._ + // The Predef.Map is scala.collection.immutable.Map. // Since the map values can be mutable, we explicitly import scala.collection.Map at here. import scala.collection.Map - import scala.reflect.runtime.universe._ case class Schema(dataType: DataType, nullable: Boolean) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SQL] Modify keyword val location according to ordering
Repository: spark Updated Branches: refs/heads/master 8154ed7df - 68609c51a [SQL] Modify keyword val location according to ordering 'DOUBLE' should be moved before 'ELSE' according to the ordering convension Author: Jacky Li jacky.li...@gmail.com Closes #3080 from jackylk/patch-5 and squashes the following commits: 3c11df7 [Jacky Li] [SQL] Modify keyword val location according to ordering Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68609c51 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68609c51 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68609c51 Branch: refs/heads/master Commit: 68609c51ad1ab2def302df3c4a1c0bc1ec6e1075 Parents: 8154ed7 Author: Jacky Li jacky.li...@gmail.com Authored: Fri Nov 7 11:52:08 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:52:08 2014 -0800 -- .../src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68609c51/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala index 5e613e0..affef27 100755 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala @@ -55,10 +55,10 @@ class SqlParser extends AbstractSparkSQLParser { protected val DECIMAL = Keyword(DECIMAL) protected val DESC = Keyword(DESC) protected val DISTINCT = Keyword(DISTINCT) + protected val DOUBLE = Keyword(DOUBLE) protected val ELSE = Keyword(ELSE) protected val END = Keyword(END) protected val EXCEPT = Keyword(EXCEPT) - protected val DOUBLE = Keyword(DOUBLE) protected val FALSE = Keyword(FALSE) protected val FIRST = Keyword(FIRST) protected val FROM = Keyword(FROM) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators
Repository: spark Updated Branches: refs/heads/master 68609c51a - 14c54f187 [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators Following description is quoted from JIRA: When I issue a hql query against a HiveContext where my predicate uses a column of string type with one of LT, LTE, GT, or GTE operator, I get the following error: scala.MatchError: StringType (of class org.apache.spark.sql.catalyst.types.StringType$) Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType is absent from the corresponding functions for creating these filters. To reproduce, in a Hive 0.13.1 shell, I created the following table (at a specified DB): create table sparkbug ( id int, event string ) stored as parquet; Insert some sample data: insert into table sparkbug select 1, '2011-06-18' from some table limit 1; insert into table sparkbug select 2, '2012-01-01' from some table limit 1; Launch a spark shell and create a HiveContext to the metastore where the table above is located. import org.apache.spark.sql._ import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext val hc = new HiveContext(sc) hc.setConf(spark.sql.shuffle.partitions, 10) hc.setConf(spark.sql.hive.convertMetastoreParquet, true) hc.setConf(spark.sql.parquet.compression.codec, snappy) import hc._ hc.hql(select * from db.sparkbug where event = '2011-12-01') A scala.MatchError will appear in the output. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #3083 from sarutak/SPARK-4213 and squashes the following commits: 4ab6e56 [Kousuke Saruta] WIP b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4213 9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/14c54f18 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/14c54f18 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/14c54f18 Branch: refs/heads/master Commit: 14c54f1876fcf91b5c10e80be2df5421c7328557 Parents: 68609c5 Author: Kousuke Saruta saru...@oss.nttdata.co.jp Authored: Fri Nov 7 11:56:40 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:56:40 2014 -0800 -- .../spark/sql/parquet/ParquetFilters.scala | 335 ++- .../spark/sql/parquet/ParquetQuerySuite.scala | 40 +++ 2 files changed, 364 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/14c54f18/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala index 517a5cf..1e67799 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala @@ -18,13 +18,15 @@ package org.apache.spark.sql.parquet import java.nio.ByteBuffer +import java.sql.{Date, Timestamp} import org.apache.hadoop.conf.Configuration +import parquet.common.schema.ColumnPath import parquet.filter2.compat.FilterCompat import parquet.filter2.compat.FilterCompat._ -import parquet.filter2.predicate.FilterPredicate -import parquet.filter2.predicate.FilterApi +import parquet.filter2.predicate.Operators.{Column, SupportsLtGt} +import parquet.filter2.predicate.{FilterApi, FilterPredicate} import parquet.filter2.predicate.FilterApi._ import parquet.io.api.Binary import 
parquet.column.ColumnReader @@ -33,9 +35,11 @@ import com.google.common.io.BaseEncoding import org.apache.spark.SparkEnv import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.types.decimal.Decimal import org.apache.spark.sql.catalyst.expressions.{Predicate = CatalystPredicate} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.execution.SparkSqlSerializer +import org.apache.spark.sql.parquet.ParquetColumns._ private[sql] object ParquetFilters { val PARQUET_FILTER_DATA = org.apache.spark.sql.parquet.row.filter @@ -50,15 +54,25 @@ private[sql] object ParquetFilters { if (filters.length 0) FilterCompat.get(filters.reduce(FilterApi.and)) else null } - def createFilter(expression: Expression): Option[CatalystFilter] ={ + def createFilter(expression: Expression): Option[CatalystFilter] = { def createEqualityFilter( name: String, literal: Literal, predicate: CatalystPredicate) = literal.dataType match { case BooleanType = -ComparisonFilter.createBooleanFilter( +
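The substance of the change is that string columns can now take part in comparison pushdown. As a hedged, standalone illustration of the underlying parquet filter2 API (the column name `event` is invented, and this is not Spark code):

```
// Illustrative sketch against the parquet-mr filter2 API directly:
// strings are compared as Binary values, e.g. WHERE event >= '2011-12-01'.
import parquet.filter2.predicate.FilterApi
import parquet.io.api.Binary

val eventColumn = FilterApi.binaryColumn("event")
val predicate   = FilterApi.gtEq(eventColumn, Binary.fromString("2011-12-01"))
println(predicate)
```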
spark git commit: [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators
Repository: spark Updated Branches: refs/heads/branch-1.2 51ef8ab8e - d530c3952 [SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE operators Following description is quoted from JIRA: When I issue a hql query against a HiveContext where my predicate uses a column of string type with one of LT, LTE, GT, or GTE operator, I get the following error: scala.MatchError: StringType (of class org.apache.spark.sql.catalyst.types.StringType$) Looking at the code in org.apache.spark.sql.parquet.ParquetFilters, StringType is absent from the corresponding functions for creating these filters. To reproduce, in a Hive 0.13.1 shell, I created the following table (at a specified DB): create table sparkbug ( id int, event string ) stored as parquet; Insert some sample data: insert into table sparkbug select 1, '2011-06-18' from some table limit 1; insert into table sparkbug select 2, '2012-01-01' from some table limit 1; Launch a spark shell and create a HiveContext to the metastore where the table above is located. import org.apache.spark.sql._ import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext val hc = new HiveContext(sc) hc.setConf(spark.sql.shuffle.partitions, 10) hc.setConf(spark.sql.hive.convertMetastoreParquet, true) hc.setConf(spark.sql.parquet.compression.codec, snappy) import hc._ hc.hql(select * from db.sparkbug where event = '2011-12-01') A scala.MatchError will appear in the output. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #3083 from sarutak/SPARK-4213 and squashes the following commits: 4ab6e56 [Kousuke Saruta] WIP b6890c6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4213 9a1fae7 [Kousuke Saruta] Fixed ParquetFilters so that compare Strings (cherry picked from commit 14c54f1876fcf91b5c10e80be2df5421c7328557) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d530c395 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d530c395 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d530c395 Branch: refs/heads/branch-1.2 Commit: d530c3952131b29fd4d7a3e54496bfe634517af1 Parents: 51ef8ab Author: Kousuke Saruta saru...@oss.nttdata.co.jp Authored: Fri Nov 7 11:56:40 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 11:57:12 2014 -0800 -- .../spark/sql/parquet/ParquetFilters.scala | 335 ++- .../spark/sql/parquet/ParquetQuerySuite.scala | 40 +++ 2 files changed, 364 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d530c395/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala index 517a5cf..1e67799 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala @@ -18,13 +18,15 @@ package org.apache.spark.sql.parquet import java.nio.ByteBuffer +import java.sql.{Date, Timestamp} import org.apache.hadoop.conf.Configuration +import parquet.common.schema.ColumnPath import parquet.filter2.compat.FilterCompat import parquet.filter2.compat.FilterCompat._ -import parquet.filter2.predicate.FilterPredicate -import parquet.filter2.predicate.FilterApi +import parquet.filter2.predicate.Operators.{Column, SupportsLtGt} +import 
parquet.filter2.predicate.{FilterApi, FilterPredicate} import parquet.filter2.predicate.FilterApi._ import parquet.io.api.Binary import parquet.column.ColumnReader @@ -33,9 +35,11 @@ import com.google.common.io.BaseEncoding import org.apache.spark.SparkEnv import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.types.decimal.Decimal import org.apache.spark.sql.catalyst.expressions.{Predicate = CatalystPredicate} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.execution.SparkSqlSerializer +import org.apache.spark.sql.parquet.ParquetColumns._ private[sql] object ParquetFilters { val PARQUET_FILTER_DATA = org.apache.spark.sql.parquet.row.filter @@ -50,15 +54,25 @@ private[sql] object ParquetFilters { if (filters.length 0) FilterCompat.get(filters.reduce(FilterApi.and)) else null } - def createFilter(expression: Expression): Option[CatalystFilter] ={ + def createFilter(expression: Expression): Option[CatalystFilter] = { def createEqualityFilter( name: String, literal: Literal, predicate:
spark git commit: [SPARK-4272] [SQL] Add more unwrapper functions for primitive types in TableReader
Repository: spark Updated Branches: refs/heads/master 14c54f187 - 60ab80f50 [SPARK-4272] [SQL] Add more unwrapper functions for primitive type in TableReader Currently, the data unwrap only support couple of primitive types, not all, it will not cause exception, but may get some performance in table scanning for the type like binary, date, timestamp, decimal etc. Author: Cheng Hao hao.ch...@intel.com Closes #3136 from chenghao-intel/table_reader and squashes the following commits: fffb729 [Cheng Hao] fix bug for retrieving the timestamp object e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in TableReader Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60ab80f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60ab80f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60ab80f5 Branch: refs/heads/master Commit: 60ab80f501b8384ddf48a9ac0ba0c2b9eb548b28 Parents: 14c54f1 Author: Cheng Hao hao.ch...@intel.com Authored: Fri Nov 7 12:15:53 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:15:53 2014 -0800 -- .../org/apache/spark/sql/hive/HiveInspectors.scala | 4 .../org/apache/spark/sql/hive/TableReader.scala | 15 +++ 2 files changed, 15 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/60ab80f5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala index 58815da..bdc7e1d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala @@ -119,10 +119,6 @@ private[hive] trait HiveInspectors { * Wraps with Hive types based on object inspector. * TODO: Consolidate all hive OI/data interface code. */ - /** - * Wraps with Hive types based on object inspector. - * TODO: Consolidate all hive OI/data interface code. 
- */ protected def wrapperFor(oi: ObjectInspector): Any = Any = oi match { case _: JavaHiveVarcharObjectInspector = (o: Any) = new HiveVarchar(o.asInstanceOf[String], o.asInstanceOf[String].size) http://git-wip-us.apache.org/repos/asf/spark/blob/60ab80f5/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala index e49f095..f60bc37 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala @@ -290,6 +290,21 @@ private[hive] object HadoopTableReader extends HiveInspectors { (value: Any, row: MutableRow, ordinal: Int) = row.setFloat(ordinal, oi.get(value)) case oi: DoubleObjectInspector = (value: Any, row: MutableRow, ordinal: Int) = row.setDouble(ordinal, oi.get(value)) +case oi: HiveVarcharObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.setString(ordinal, oi.getPrimitiveJavaObject(value).getValue) +case oi: HiveDecimalObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, HiveShim.toCatalystDecimal(oi, value)) +case oi: TimestampObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value).clone()) +case oi: DateObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value)) +case oi: BinaryObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value)) case oi = (value: Any, row: MutableRow, ordinal: Int) = row(ordinal) = unwrap(value, oi) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
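The shape of these additions is worth noting: the ObjectInspector is matched once per column to produce a setter closure, so the per-row loop never re-dispatches on the inspector. A simplified, self-contained sketch of that pattern, using stand-in types rather than Hive's:

```
// Simplified sketch of the one-closure-per-column pattern (stand-in types).
trait Inspector { def unwrap(raw: Any): Any }
case object IntInspector extends Inspector { def unwrap(raw: Any) = raw.asInstanceOf[Int] }
case object StringInspector extends Inspector { def unwrap(raw: Any) = raw.toString }

// Resolve the inspector once and return a closure that is reused for every row.
def unwrapperFor(oi: Inspector): (Any, Array[Any], Int) => Unit = oi match {
  case IntInspector    => (value, row, ordinal) => row(ordinal) = IntInspector.unwrap(value)
  case StringInspector => (value, row, ordinal) => row(ordinal) = StringInspector.unwrap(value)
  case other           => (value, row, ordinal) => row(ordinal) = other.unwrap(value)
}

val unwrappers = Seq(unwrapperFor(IntInspector), unwrapperFor(StringInspector))
val row = new Array[Any](2)
Seq(42, "hello").zip(unwrappers).zipWithIndex.foreach {
  case ((raw, fill), i) => fill(raw, row, i)
}
println(row.mkString(", "))   // 42, hello
```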
spark git commit: [SPARK-4272] [SQL] Add more unwrapper functions for primitive types in TableReader
Repository: spark Updated Branches: refs/heads/branch-1.2 d530c3952 - ff1a08256 [SPARK-4272] [SQL] Add more unwrapper functions for primitive type in TableReader Currently, the data unwrap only support couple of primitive types, not all, it will not cause exception, but may get some performance in table scanning for the type like binary, date, timestamp, decimal etc. Author: Cheng Hao hao.ch...@intel.com Closes #3136 from chenghao-intel/table_reader and squashes the following commits: fffb729 [Cheng Hao] fix bug for retrieving the timestamp object e9c97a4 [Cheng Hao] Add more unwrapper functions for primitive type in TableReader (cherry picked from commit 60ab80f501b8384ddf48a9ac0ba0c2b9eb548b28) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff1a0825 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff1a0825 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff1a0825 Branch: refs/heads/branch-1.2 Commit: ff1a0825637690b3fce780d4dcaad68dce382fb9 Parents: d530c39 Author: Cheng Hao hao.ch...@intel.com Authored: Fri Nov 7 12:15:53 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:16:18 2014 -0800 -- .../org/apache/spark/sql/hive/HiveInspectors.scala | 4 .../org/apache/spark/sql/hive/TableReader.scala | 15 +++ 2 files changed, 15 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ff1a0825/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala index 58815da..bdc7e1d 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala @@ -119,10 +119,6 @@ private[hive] trait HiveInspectors { * Wraps with Hive types based on object inspector. * TODO: Consolidate all hive OI/data interface code. */ - /** - * Wraps with Hive types based on object inspector. - * TODO: Consolidate all hive OI/data interface code. 
- */ protected def wrapperFor(oi: ObjectInspector): Any = Any = oi match { case _: JavaHiveVarcharObjectInspector = (o: Any) = new HiveVarchar(o.asInstanceOf[String], o.asInstanceOf[String].size) http://git-wip-us.apache.org/repos/asf/spark/blob/ff1a0825/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala index e49f095..f60bc37 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala @@ -290,6 +290,21 @@ private[hive] object HadoopTableReader extends HiveInspectors { (value: Any, row: MutableRow, ordinal: Int) = row.setFloat(ordinal, oi.get(value)) case oi: DoubleObjectInspector = (value: Any, row: MutableRow, ordinal: Int) = row.setDouble(ordinal, oi.get(value)) +case oi: HiveVarcharObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.setString(ordinal, oi.getPrimitiveJavaObject(value).getValue) +case oi: HiveDecimalObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, HiveShim.toCatalystDecimal(oi, value)) +case oi: TimestampObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value).clone()) +case oi: DateObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value)) +case oi: BinaryObjectInspector = + (value: Any, row: MutableRow, ordinal: Int) = +row.update(ordinal, oi.getPrimitiveJavaObject(value)) case oi = (value: Any, row: MutableRow, ordinal: Int) = row(ordinal) = unwrap(value, oi) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-4270][SQL] Fix Cast from DateType to DecimalType.
Repository: spark Updated Branches: refs/heads/master 60ab80f50 - a6405c5dd [SPARK-4270][SQL] Fix Cast from DateType to DecimalType. `Cast` from `DateType` to `DecimalType` throws `NullPointerException`. Author: Takuya UESHIN ues...@happy-camper.st Closes #3134 from ueshin/issues/SPARK-4270 and squashes the following commits: 7394e4b [Takuya UESHIN] Fix Cast from DateType to DecimalType. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6405c5d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6405c5d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6405c5d Branch: refs/heads/master Commit: a6405c5ddcda112f8efd7d50d8e5f44f78a0fa41 Parents: 60ab80f Author: Takuya UESHIN ues...@happy-camper.st Authored: Fri Nov 7 12:30:47 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:30:47 2014 -0800 -- .../scala/org/apache/spark/sql/catalyst/expressions/Cast.scala | 2 +- .../spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6405c5d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 2200966..55319e7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -281,7 +281,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w case BooleanType = buildCast[Boolean](_, b = changePrecision(if (b) Decimal(1) else Decimal(0), target)) case DateType = - buildCast[Date](_, d = changePrecision(null, target)) // date can't cast to decimal in Hive + buildCast[Date](_, d = null) // date can't cast to decimal in Hive case TimestampType = // Note that we lose precision here. buildCast[Timestamp](_, t = changePrecision(Decimal(timestampToDouble(t)), target)) http://git-wip-us.apache.org/repos/asf/spark/blob/a6405c5d/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala index 6bfa0db..918996f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala @@ -412,6 +412,8 @@ class ExpressionEvaluationSuite extends FunSuite { checkEvaluation(Cast(d, LongType), null) checkEvaluation(Cast(d, FloatType), null) checkEvaluation(Cast(d, DoubleType), null) +checkEvaluation(Cast(d, DecimalType.Unlimited), null) +checkEvaluation(Cast(d, DecimalType(10, 2)), null) checkEvaluation(Cast(d, StringType), 1970-01-01) checkEvaluation(Cast(Cast(d, TimestampType), StringType), 1970-01-01 00:00:00) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
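The new test cases can be reproduced directly against the catalyst expression API; a minimal sketch (catalyst is internal, and evaluating expressions outside a query plan is for illustration only):

```
// Sketch: casting a DateType literal to a decimal now evaluates to null
// (Hive's behaviour) instead of throwing a NullPointerException.
import java.sql.Date
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.catalyst.types.DecimalType

val d = Literal(Date.valueOf("1970-01-01"))
println(Cast(d, DecimalType.Unlimited).eval(null))  // null
println(Cast(d, DecimalType(10, 2)).eval(null))     // null
```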
spark git commit: [SPARK-4203][SQL] Partition directories in random order when inserting into hive table
Repository: spark Updated Branches: refs/heads/master a6405c5dd - ac70c972a [SPARK-4203][SQL] Partition directories in random order when inserting into hive table When doing an insert into hive table with partitions the folders written to the file system are in a random order instead of the order defined in table creation. Seems that the loadPartition method in Hive.java has a MapString,String parameter but expects to be called with a map that has a defined ordering such as LinkedHashMap. Working on a test but having intillij problems Author: Matthew Taylor matthe...@tbfe.net Closes #3076 from tbfenet/partition_dir_order_problem and squashes the following commits: f1b9a52 [Matthew Taylor] Comment format fix bca709f [Matthew Taylor] review changes 0e50f6b [Matthew Taylor] test fix 99f1a31 [Matthew Taylor] partition ordering fix 369e618 [Matthew Taylor] partition ordering fix Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac70c972 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac70c972 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac70c972 Branch: refs/heads/master Commit: ac70c972a51952f801fd02dd5962c0a0c1aba8f8 Parents: a6405c5 Author: Matthew Taylor matthe...@tbfe.net Authored: Fri Nov 7 12:53:08 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:53:08 2014 -0800 -- .../hive/execution/InsertIntoHiveTable.scala| 13 ++-- .../sql/hive/InsertIntoHiveTableSuite.scala | 34 ++-- 2 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ac70c972/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala index 74b4e7a..81390f6 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.hive.execution +import java.util + import scala.collection.JavaConversions._ import org.apache.hadoop.hive.common.`type`.HiveVarchar @@ -203,6 +205,13 @@ case class InsertIntoHiveTable( // holdDDLTime will be true when TOK_HOLD_DDLTIME presents in the query as a hint. val holdDDLTime = false if (partition.nonEmpty) { + + // loadPartition call orders directories created on the iteration order of the this map + val orderedPartitionSpec = new util.LinkedHashMap[String,String]() + table.hiveQlTable.getPartCols().foreach{ +entry= + orderedPartitionSpec.put(entry.getName,partitionSpec.get(entry.getName).getOrElse()) + } val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec) db.validatePartitionNameCharacters(partVals) // inheritTableSpecs is set to true. 
It should be set to false for a IMPORT query @@ -214,7 +223,7 @@ case class InsertIntoHiveTable( db.loadDynamicPartitions( outputPath, qualifiedTableName, - partitionSpec, + orderedPartitionSpec, overwrite, numDynamicPartitions, holdDDLTime, @@ -224,7 +233,7 @@ case class InsertIntoHiveTable( db.loadPartition( outputPath, qualifiedTableName, - partitionSpec, + orderedPartitionSpec, overwrite, holdDDLTime, inheritTableSpecs, http://git-wip-us.apache.org/repos/asf/spark/blob/ac70c972/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala index 18dc937..5dbfb92 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala @@ -17,8 +17,10 @@ package org.apache.spark.sql.hive -import org.apache.spark.sql.QueryTest -import org.apache.spark.sql._ +import java.io.File + +import com.google.common.io.Files +import org.apache.spark.sql.{QueryTest, _} import org.apache.spark.sql.hive.test.TestHive /* Implicits */ @@ -91,4 +93,32 @@ class InsertIntoHiveTableSuite extends QueryTest { sql(DROP TABLE hiveTableWithMapValue) } + + test(SPARK-4203:random partition directory order) { +
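The fix hinges on iteration order; a small standalone illustration of why `java.util.LinkedHashMap` is the right container for the partition spec (partition column names are invented):

```
// Sketch: LinkedHashMap iterates in insertion order, so putting keys in the
// table's partition-column order yields directories like year=2014/month=11/day=07.
// A plain HashMap makes no ordering guarantee.
import scala.collection.JavaConversions._

val orderedSpec = new java.util.LinkedHashMap[String, String]()
orderedSpec.put("year", "2014")
orderedSpec.put("month", "11")
orderedSpec.put("day", "07")

println(orderedSpec.keySet().mkString("/"))   // year/month/day
```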
spark git commit: [SPARK-4203][SQL] Partition directories in random order when inserting into hive table
Repository: spark Updated Branches: refs/heads/branch-1.2 684d1f0ec - c96da3676 [SPARK-4203][SQL] Partition directories in random order when inserting into hive table When doing an insert into hive table with partitions the folders written to the file system are in a random order instead of the order defined in table creation. Seems that the loadPartition method in Hive.java has a MapString,String parameter but expects to be called with a map that has a defined ordering such as LinkedHashMap. Working on a test but having intillij problems Author: Matthew Taylor matthe...@tbfe.net Closes #3076 from tbfenet/partition_dir_order_problem and squashes the following commits: f1b9a52 [Matthew Taylor] Comment format fix bca709f [Matthew Taylor] review changes 0e50f6b [Matthew Taylor] test fix 99f1a31 [Matthew Taylor] partition ordering fix 369e618 [Matthew Taylor] partition ordering fix (cherry picked from commit ac70c972a51952f801fd02dd5962c0a0c1aba8f8) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c96da367 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c96da367 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c96da367 Branch: refs/heads/branch-1.2 Commit: c96da3676c32579d0f97347d35d95353b1d2ef07 Parents: 684d1f0 Author: Matthew Taylor matthe...@tbfe.net Authored: Fri Nov 7 12:53:08 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:53:32 2014 -0800 -- .../hive/execution/InsertIntoHiveTable.scala| 13 ++-- .../sql/hive/InsertIntoHiveTableSuite.scala | 34 ++-- 2 files changed, 43 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c96da367/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala index 74b4e7a..81390f6 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.hive.execution +import java.util + import scala.collection.JavaConversions._ import org.apache.hadoop.hive.common.`type`.HiveVarchar @@ -203,6 +205,13 @@ case class InsertIntoHiveTable( // holdDDLTime will be true when TOK_HOLD_DDLTIME presents in the query as a hint. val holdDDLTime = false if (partition.nonEmpty) { + + // loadPartition call orders directories created on the iteration order of the this map + val orderedPartitionSpec = new util.LinkedHashMap[String,String]() + table.hiveQlTable.getPartCols().foreach{ +entry= + orderedPartitionSpec.put(entry.getName,partitionSpec.get(entry.getName).getOrElse()) + } val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec) db.validatePartitionNameCharacters(partVals) // inheritTableSpecs is set to true. 
It should be set to false for a IMPORT query @@ -214,7 +223,7 @@ case class InsertIntoHiveTable( db.loadDynamicPartitions( outputPath, qualifiedTableName, - partitionSpec, + orderedPartitionSpec, overwrite, numDynamicPartitions, holdDDLTime, @@ -224,7 +233,7 @@ case class InsertIntoHiveTable( db.loadPartition( outputPath, qualifiedTableName, - partitionSpec, + orderedPartitionSpec, overwrite, holdDDLTime, inheritTableSpecs, http://git-wip-us.apache.org/repos/asf/spark/blob/c96da367/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala index 18dc937..5dbfb92 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala @@ -17,8 +17,10 @@ package org.apache.spark.sql.hive -import org.apache.spark.sql.QueryTest -import org.apache.spark.sql._ +import java.io.File + +import com.google.common.io.Files +import org.apache.spark.sql.{QueryTest, _} import org.apache.spark.sql.hive.test.TestHive /* Implicits */ @@ -91,4 +93,32 @@ class InsertIntoHiveTableSuite extends QueryTest {
spark git commit: [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC
Repository: spark Updated Branches: refs/heads/branch-1.2 c96da3676 - 47bd8f302 [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC select * from src, get the wrong result set as follows: ``` ... | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | ... ``` Author: wangfei wangf...@huawei.com Closes #3149 from scwf/SPARK-4292 and squashes the following commits: 1574a43 [wangfei] using result.collect 8b2d845 [wangfei] adding test f64eddf [wangfei] result set iter bug (cherry picked from commit d6e55524437026c0c76addeba8f99249a8316716) Signed-off-by: Michael Armbrust mich...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47bd8f30 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47bd8f30 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47bd8f30 Branch: refs/heads/branch-1.2 Commit: 47bd8f3020149a009f605e8390c2c28f3f835191 Parents: c96da36 Author: wangfei wangf...@huawei.com Authored: Fri Nov 7 12:55:11 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:55:40 2014 -0800 -- .../thriftserver/HiveThriftServer2Suite.scala | 23 .../spark/sql/hive/thriftserver/Shim12.scala| 5 ++--- .../spark/sql/hive/thriftserver/Shim13.scala| 5 ++--- 3 files changed, 27 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/47bd8f30/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala index 65d910a..bba29b2 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala @@ -267,4 +267,27 @@ class HiveThriftServer2Suite extends FunSuite with Logging { assert(resultSet.getString(1) === sspark.sql.hive.version=${HiveShim.version}) } } + + test(SPARK-4292 regression: result set iterator issue) { +withJdbcStatement() { statement = + val dataFilePath = + Thread.currentThread().getContextClassLoader.getResource(data/files/small_kv.txt) + + val queries = Seq( +DROP TABLE IF EXISTS test_4292, +CREATE TABLE test_4292(key INT, val STRING), +sLOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test_4292) + + queries.foreach(statement.execute) + + val resultSet = statement.executeQuery(SELECT key FROM test_4292) + + Seq(238, 86, 311, 27, 165).foreach { key = +resultSet.next() +assert(resultSet.getInt(1) == key) + } + + statement.executeQuery(DROP TABLE IF EXISTS test_4292) +} + } } http://git-wip-us.apache.org/repos/asf/spark/blob/47bd8f30/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala -- diff --git a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala index 8077d0e..e3ba991 100644 --- 
a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala +++ b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala @@ -202,13 +202,12 @@ private[hive] class SparkExecuteStatementOperation( hiveContext.sparkContext.setLocalProperty("spark.scheduler.pool", pool) } iter = { -val resultRdd = result.queryExecution.toRdd val useIncrementalCollect = hiveContext.getConf("spark.sql.thriftServer.incrementalCollect", "false").toBoolean if (useIncrementalCollect) { - resultRdd.toLocalIterator + result.toLocalIterator } else { - resultRdd.collect().iterator + result.collect().iterator } } dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray
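For context, the duplicated rows quoted in the commit message are what a JDBC client sees when it iterates a result set served by the Thrift server. The following Scala sketch is only an illustration of that client-side path, not part of the patch; it assumes a HiveThriftServer2 listening on localhost:10000 and the Hive JDBC driver on the classpath.

```
import java.sql.DriverManager

object ThriftServerClientSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: the Thrift server is running locally on the default port.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    try {
      val stmt = conn.createStatement()
      // Before this fix, rows fetched through the server could come back duplicated;
      // after it, each row of `src` is returned exactly once.
      val rs = stmt.executeQuery("SELECT key, value FROM src")
      while (rs.next()) {
        println(s"${rs.getInt(1)}\t${rs.getString(2)}")
      }
      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```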
spark git commit: [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC
Repository: spark Updated Branches: refs/heads/master ac70c972a - d6e555244 [SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC select * from src, get the wrong result set as follows: ``` ... | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 309 | val_309 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | | 97 | val_97 | ... ``` Author: wangfei wangf...@huawei.com Closes #3149 from scwf/SPARK-4292 and squashes the following commits: 1574a43 [wangfei] using result.collect 8b2d845 [wangfei] adding test f64eddf [wangfei] result set iter bug Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6e55524 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6e55524 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6e55524 Branch: refs/heads/master Commit: d6e55524437026c0c76addeba8f99249a8316716 Parents: ac70c97 Author: wangfei wangf...@huawei.com Authored: Fri Nov 7 12:55:11 2014 -0800 Committer: Michael Armbrust mich...@databricks.com Committed: Fri Nov 7 12:55:11 2014 -0800 -- .../thriftserver/HiveThriftServer2Suite.scala | 23 .../spark/sql/hive/thriftserver/Shim12.scala| 5 ++--- .../spark/sql/hive/thriftserver/Shim13.scala| 5 ++--- 3 files changed, 27 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala index 65d910a..bba29b2 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala @@ -267,4 +267,27 @@ class HiveThriftServer2Suite extends FunSuite with Logging { assert(resultSet.getString(1) === sspark.sql.hive.version=${HiveShim.version}) } } + + test(SPARK-4292 regression: result set iterator issue) { +withJdbcStatement() { statement = + val dataFilePath = + Thread.currentThread().getContextClassLoader.getResource(data/files/small_kv.txt) + + val queries = Seq( +DROP TABLE IF EXISTS test_4292, +CREATE TABLE test_4292(key INT, val STRING), +sLOAD DATA LOCAL INPATH '$dataFilePath' OVERWRITE INTO TABLE test_4292) + + queries.foreach(statement.execute) + + val resultSet = statement.executeQuery(SELECT key FROM test_4292) + + Seq(238, 86, 311, 27, 165).foreach { key = +resultSet.next() +assert(resultSet.getInt(1) == key) + } + + statement.executeQuery(DROP TABLE IF EXISTS test_4292) +} + } } http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala -- diff --git a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala index 8077d0e..e3ba991 100644 --- a/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala +++ 
b/sql/hive-thriftserver/v0.12.0/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim12.scala @@ -202,13 +202,12 @@ private[hive] class SparkExecuteStatementOperation( hiveContext.sparkContext.setLocalProperty("spark.scheduler.pool", pool) } iter = { -val resultRdd = result.queryExecution.toRdd val useIncrementalCollect = hiveContext.getConf("spark.sql.thriftServer.incrementalCollect", "false").toBoolean if (useIncrementalCollect) { - resultRdd.toLocalIterator + result.toLocalIterator } else { - resultRdd.collect().iterator + result.collect().iterator } } dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray http://git-wip-us.apache.org/repos/asf/spark/blob/d6e55524/sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala -- diff --git
spark git commit: Update JavaCustomReceiver.java
Repository: spark Updated Branches: refs/heads/master d6e555244 - 7c9ec529a Update JavaCustomReceiver.java Array index out of bounds Author: xiao321 1042460...@qq.com Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c9ec529 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c9ec529 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c9ec529 Branch: refs/heads/master Commit: 7c9ec529a3483fab48f728481dd1d3663369e50a Parents: d6e5552 Author: xiao321 1042460...@qq.com Authored: Fri Nov 7 12:56:49 2014 -0800 Committer: Tathagata Das tathagata.das1...@gmail.com Committed: Fri Nov 7 12:56:49 2014 -0800 -- .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c9ec529/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java index 981bc4f..99df259 100644 --- a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java +++ b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java @@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> { // Create a input stream with the custom receiver on target ip:port and count the // words in input stream of \n delimited text (eg. generated by 'nc') JavaReceiverInputDStream<String> lines = ssc.receiverStream( - new JavaCustomReceiver(args[1], Integer.parseInt(args[2]))); + new JavaCustomReceiver(args[0], Integer.parseInt(args[1]))); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) {
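The example is invoked as `JavaCustomReceiver <hostname> <port>`, so the host and port live at indices 0 and 1 of the argument array; the old code read indices 1 and 2 and could go out of bounds. A minimal Scala sketch of the corrected argument handling (the object name and the println are illustrative only, not part of this diff):

```
// Sketch only: argument handling equivalent to the fixed Java example.
// Usage: CustomReceiverSketch <hostname> <port>
object CustomReceiverSketch {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: CustomReceiverSketch <hostname> <port>")
      sys.exit(1)
    }
    val host = args(0)        // was args(1) before the fix
    val port = args(1).toInt  // was args(2) before the fix
    println(s"Would attach the custom receiver to $host:$port")
  }
}
```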
spark git commit: Update JavaCustomReceiver.java
Repository: spark Updated Branches: refs/heads/branch-1.2 47bd8f302 - 8cefb63c1 Update JavaCustomReceiver.java Array index out of bounds Author: xiao321 1042460...@qq.com Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a) Signed-off-by: Tathagata Das tathagata.das1...@gmail.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8cefb63c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8cefb63c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8cefb63c Branch: refs/heads/branch-1.2 Commit: 8cefb63c122e7c7cf4af959f9606f4491148d9f4 Parents: 47bd8f3 Author: xiao321 1042460...@qq.com Authored: Fri Nov 7 12:56:49 2014 -0800 Committer: Tathagata Das tathagata.das1...@gmail.com Committed: Fri Nov 7 12:57:17 2014 -0800 -- .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8cefb63c/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java index 981bc4f..99df259 100644 --- a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java +++ b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java @@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> { // Create a input stream with the custom receiver on target ip:port and count the // words in input stream of \n delimited text (eg. generated by 'nc') JavaReceiverInputDStream<String> lines = ssc.receiverStream( - new JavaCustomReceiver(args[1], Integer.parseInt(args[2]))); + new JavaCustomReceiver(args[0], Integer.parseInt(args[1]))); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) {
spark git commit: Update JavaCustomReceiver.java
Repository: spark Updated Branches: refs/heads/branch-1.1 0a40eac25 - 4fb26df87 Update JavaCustomReceiver.java Array index out of bounds Author: xiao321 1042460...@qq.com Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a) Signed-off-by: Tathagata Das tathagata.das1...@gmail.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fb26df8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fb26df8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fb26df8 Branch: refs/heads/branch-1.1 Commit: 4fb26df8748ea7dda11db8c2b99f4b08da25bb4e Parents: 0a40eac Author: xiao321 1042460...@qq.com Authored: Fri Nov 7 12:56:49 2014 -0800 Committer: Tathagata Das tathagata.das1...@gmail.com Committed: Fri Nov 7 12:57:38 2014 -0800 -- .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4fb26df8/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java index 5622df5..f92615d 100644 --- a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java +++ b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java @@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> { // Create a input stream with the custom receiver on target ip:port and count the // words in input stream of \n delimited text (eg. generated by 'nc') JavaReceiverInputDStream<String> lines = ssc.receiverStream( - new JavaCustomReceiver(args[1], Integer.parseInt(args[2]))); + new JavaCustomReceiver(args[0], Integer.parseInt(args[1]))); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) {
spark git commit: Update JavaCustomReceiver.java
Repository: spark Updated Branches: refs/heads/branch-1.0 76c20cac9 - 18c8c3833 Update JavaCustomReceiver.java Array index out of bounds Author: xiao321 1042460...@qq.com Closes #3153 from xiao321/patch-1 and squashes the following commits: 0ed17b5 [xiao321] Update JavaCustomReceiver.java (cherry picked from commit 7c9ec529a3483fab48f728481dd1d3663369e50a) Signed-off-by: Tathagata Das tathagata.das1...@gmail.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/18c8c383 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/18c8c383 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/18c8c383 Branch: refs/heads/branch-1.0 Commit: 18c8c3833d8508ff1ac1cf2c02060c41e46908c1 Parents: 76c20ca Author: xiao321 1042460...@qq.com Authored: Fri Nov 7 12:56:49 2014 -0800 Committer: Tathagata Das tathagata.das1...@gmail.com Committed: Fri Nov 7 12:58:02 2014 -0800 -- .../org/apache/spark/examples/streaming/JavaCustomReceiver.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/18c8c383/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java index 5622df5..f92615d 100644 --- a/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java +++ b/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java @@ -70,7 +70,7 @@ public class JavaCustomReceiver extends Receiver<String> { // Create a input stream with the custom receiver on target ip:port and count the // words in input stream of \n delimited text (eg. generated by 'nc') JavaReceiverInputDStream<String> lines = ssc.receiverStream( - new JavaCustomReceiver(args[1], Integer.parseInt(args[2]))); + new JavaCustomReceiver(args[0], Integer.parseInt(args[1]))); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) {
spark git commit: MAINTENANCE: Automated closing of pull requests.
Repository: spark Updated Branches: refs/heads/master 7c9ec529a - 5923dd986 MAINTENANCE: Automated closing of pull requests. This commit exists to close the following pull requests on Github: Closes #3016 (close requested by 'andrewor14') Closes #2798 (close requested by 'andrewor14') Closes #2864 (close requested by 'andrewor14') Closes #3154 (close requested by 'JoshRosen') Closes #3156 (close requested by 'JoshRosen') Closes #214 (close requested by 'kayousterhout') Closes #2584 (close requested by 'andrewor14') Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5923dd98 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5923dd98 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5923dd98 Branch: refs/heads/master Commit: 5923dd986ba26d0fcc8707dd8d16863f1c1005cb Parents: 7c9ec52 Author: Patrick Wendell pwend...@gmail.com Authored: Fri Nov 7 13:08:25 2014 -0800 Committer: Patrick Wendell pwend...@gmail.com Committed: Fri Nov 7 13:08:25 2014 -0800 -- -- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD
Repository: spark Updated Branches: refs/heads/master 5923dd986 - 777910979 [SPARK-4304] [PySpark] Fix sort on empty RDD This PR fix sortBy()/sortByKey() on empty RDD. This should be back ported into 1.1/1.2 Author: Davies Liu dav...@databricks.com Closes #3162 from davies/fix_sort and squashes the following commits: 84f64b7 [Davies Liu] add tests 52995b5 [Davies Liu] fix sortByKey() on empty RDD Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/77791097 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/77791097 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/77791097 Branch: refs/heads/master Commit: 7779109796c90d789464ab0be35917f963bbe867 Parents: 5923dd9 Author: Davies Liu dav...@databricks.com Authored: Fri Nov 7 20:53:03 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Fri Nov 7 20:53:03 2014 -0800 -- python/pyspark/rdd.py | 2 ++ python/pyspark/tests.py | 3 +++ 2 files changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/77791097/python/pyspark/rdd.py -- diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 879655d..08d0474 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -521,6 +521,8 @@ class RDD(object): # the key-space into bins such that the bins have roughly the same # number of (key, value) pairs falling into them rddSize = self.count() +if not rddSize: +return self # empty RDD maxSampleSize = numPartitions * 20.0 # constant from Spark's RangePartitioner fraction = min(maxSampleSize / max(rddSize, 1), 1.0) samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect() http://git-wip-us.apache.org/repos/asf/spark/blob/77791097/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 9f625c5..491e445 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -649,6 +649,9 @@ class RDDTests(ReusedPySparkTestCase): self.assertEquals(result.getNumPartitions(), 5) self.assertEquals(result.count(), 3) +def test_sort_on_empty_rdd(self): +self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect()) + def test_sample(self): rdd = self.sc.parallelize(range(0, 100), 4) wo = rdd.sample(False, 0.1, 2).collect() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
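The patch returns the RDD unchanged before the range-partitioner sampling step when it is empty, so sorting an empty RDD simply yields an empty result. A minimal local-mode Scala sketch of that expected behavior follows (illustrative only; the fix itself is in PySpark, and the SparkConf settings are assumptions):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions such as sortByKey (Spark 1.x implicits)

object EmptySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("empty-sort-sketch"))
    try {
      // Sorting an empty pair RDD should produce an empty result rather than fail,
      // which is what this fix restores for PySpark's sortBy()/sortByKey().
      val empty = sc.parallelize(Seq.empty[(Int, String)])
      assert(empty.sortByKey().collect().isEmpty)
    } finally {
      sc.stop()
    }
  }
}
```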
spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD
Repository: spark Updated Branches: refs/heads/branch-1.2 8cefb63c1 - 3b07c483a [SPARK-4304] [PySpark] Fix sort on empty RDD This PR fix sortBy()/sortByKey() on empty RDD. This should be back ported into 1.1/1.2 Author: Davies Liu dav...@databricks.com Closes #3162 from davies/fix_sort and squashes the following commits: 84f64b7 [Davies Liu] add tests 52995b5 [Davies Liu] fix sortByKey() on empty RDD (cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867) Signed-off-by: Josh Rosen joshro...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b07c483 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b07c483 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b07c483 Branch: refs/heads/branch-1.2 Commit: 3b07c483aa98965ac9dc8fdcc40e593e4edb97fd Parents: 8cefb63 Author: Davies Liu dav...@databricks.com Authored: Fri Nov 7 20:53:03 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Fri Nov 7 20:53:34 2014 -0800 -- python/pyspark/rdd.py | 2 ++ python/pyspark/tests.py | 3 +++ 2 files changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3b07c483/python/pyspark/rdd.py -- diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 879655d..08d0474 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -521,6 +521,8 @@ class RDD(object): # the key-space into bins such that the bins have roughly the same # number of (key, value) pairs falling into them rddSize = self.count() +if not rddSize: +return self # empty RDD maxSampleSize = numPartitions * 20.0 # constant from Spark's RangePartitioner fraction = min(maxSampleSize / max(rddSize, 1), 1.0) samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect() http://git-wip-us.apache.org/repos/asf/spark/blob/3b07c483/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 9f625c5..491e445 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -649,6 +649,9 @@ class RDDTests(ReusedPySparkTestCase): self.assertEquals(result.getNumPartitions(), 5) self.assertEquals(result.count(), 3) +def test_sort_on_empty_rdd(self): +self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect()) + def test_sample(self): rdd = self.sc.parallelize(range(0, 100), 4) wo = rdd.sample(False, 0.1, 2).collect() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD
Repository: spark Updated Branches: refs/heads/branch-1.1 4fb26df87 - 4895f6544 [SPARK-4304] [PySpark] Fix sort on empty RDD This PR fix sortBy()/sortByKey() on empty RDD. This should be back ported into 1.1/1.2 Author: Davies Liu dav...@databricks.com Closes #3162 from davies/fix_sort and squashes the following commits: 84f64b7 [Davies Liu] add tests 52995b5 [Davies Liu] fix sortByKey() on empty RDD (cherry picked from commit 7779109796c90d789464ab0be35917f963bbe867) Signed-off-by: Josh Rosen joshro...@databricks.com Conflicts: python/pyspark/tests.py Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4895f654 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4895f654 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4895f654 Branch: refs/heads/branch-1.1 Commit: 4895f65447aa2338729fccb5200efa29a9d62163 Parents: 4fb26df Author: Davies Liu dav...@databricks.com Authored: Fri Nov 7 20:53:03 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Fri Nov 7 20:55:12 2014 -0800 -- python/pyspark/rdd.py | 2 ++ python/pyspark/tests.py | 3 +++ 2 files changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4895f654/python/pyspark/rdd.py -- diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 3f81550..ac8ceff 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -598,6 +598,8 @@ class RDD(object): # the key-space into bins such that the bins have roughly the same # number of (key, value) pairs falling into them rddSize = self.count() +if not rddSize: +return self # empty RDD maxSampleSize = numPartitions * 20.0 # constant from Spark's RangePartitioner fraction = min(maxSampleSize / max(rddSize, 1), 1.0) samples = self.sample(False, fraction, 1).map(lambda (k, v): k).collect() http://git-wip-us.apache.org/repos/asf/spark/blob/4895f654/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 5cea1b0..b4a9c59 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -470,6 +470,9 @@ class TestRDDFunctions(PySparkTestCase): self.assertEquals(([1, b], [5]), rdd.histogram(1)) self.assertRaises(TypeError, lambda: rdd.histogram(2)) +def test_sort_on_empty_rdd(self): +self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect()) + def test_sample(self): rdd = self.sc.parallelize(range(0, 100), 4) wo = rdd.sample(False, 0.1, 2).collect() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-4304] [PySpark] Fix sort on empty RDD (1.0 branch)
Repository: spark Updated Branches: refs/heads/branch-1.0 18c8c3833 - d4aed266d [SPARK-4304] [PySpark] Fix sort on empty RDD (1.0 branch) This PR fix sortBy()/sortByKey() on empty RDD. This should be back ported into 1.0 Author: Davies Liu dav...@databricks.com Closes #3163 from davies/fix_sort_1.0 and squashes the following commits: 9be984f [Davies Liu] fix sort on empty RDD Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4aed266 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4aed266 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4aed266 Branch: refs/heads/branch-1.0 Commit: d4aed266d3db3cb3aea711f30aa058c74bfe60a5 Parents: 18c8c38 Author: Davies Liu dav...@databricks.com Authored: Fri Nov 7 20:57:56 2014 -0800 Committer: Josh Rosen joshro...@databricks.com Committed: Fri Nov 7 20:57:56 2014 -0800 -- python/pyspark/rdd.py | 2 ++ python/pyspark/tests.py | 3 +++ 2 files changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/rdd.py -- diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 368ab50..57c2cd7 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -496,6 +496,8 @@ class RDD(object): # number of (key, value) pairs falling into them if numPartitions 1: rddSize = self.count() +if not rddSize: +return self maxSampleSize = numPartitions * 20.0 # constant from Spark's RangePartitioner fraction = min(maxSampleSize / max(rddSize, 1), 1.0) http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index 45284ee..8f5b48d 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -198,6 +198,9 @@ class TestRDDFunctions(PySparkTestCase): os.unlink(tempFile.name) self.assertRaises(Exception, lambda: filtered_data.count()) +def test_sort_on_empty_rdd(self): +self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect()) + def test_itemgetter(self): rdd = self.sc.parallelize([range(10)]) from operator import itemgetter - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API
Repository: spark Updated Branches: refs/heads/master 777910979 - 7e9d97567 [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API SPARK-1553 added alternating nonnegative least squares to MLLib, however it's not possible to access it via the python API. This pull request resolves that. Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Closes #3095 from mdagost/python_nmf and squashes the following commits: a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in PythonMLLibAPI. Remove the new static methods I added. Set seed in tests. Change ratings to ratingsRDD in both train and trainImplicit for consistency. 7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more places. 3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter list. bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it can handle null. cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python and made that play nice with the nonnegative changes. Also made the python ALS tests more exact. a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e9d9756 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e9d9756 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e9d9756 Branch: refs/heads/master Commit: 7e9d975676d56ace0e84c2200137e4cd4eba074a Parents: 7779109 Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Authored: Fri Nov 7 22:53:01 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Fri Nov 7 22:53:01 2014 -0800 -- .../spark/mllib/api/python/PythonMLLibAPI.scala | 39 --- python/pyspark/mllib/recommendation.py | 40 2 files changed, 58 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7e9d9756/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala index d832ae3..70d7138 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala @@ -275,12 +275,25 @@ class PythonMLLibAPI extends Serializable { * the Py4J documentation. 
*/ def trainALSModel( - ratings: JavaRDD[Rating], + ratingsJRDD: JavaRDD[Rating], rank: Int, iterations: Int, lambda: Double, - blocks: Int): MatrixFactorizationModel = { -new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks)) + blocks: Int, + nonnegative: Boolean, + seed: java.lang.Long): MatrixFactorizationModel = { + +val als = new ALS() + .setRank(rank) + .setIterations(iterations) + .setLambda(lambda) + .setBlocks(blocks) + .setNonnegative(nonnegative) + +if (seed != null) als.setSeed(seed) + +val model = als.run(ratingsJRDD.rdd) +new MatrixFactorizationModelWrapper(model) } /** @@ -295,9 +308,23 @@ class PythonMLLibAPI extends Serializable { iterations: Int, lambda: Double, blocks: Int, - alpha: Double): MatrixFactorizationModel = { -new MatrixFactorizationModelWrapper( - ALS.trainImplicit(ratingsJRDD.rdd, rank, iterations, lambda, blocks, alpha)) + alpha: Double, + nonnegative: Boolean, + seed: java.lang.Long): MatrixFactorizationModel = { + +val als = new ALS() + .setImplicitPrefs(true) + .setRank(rank) + .setIterations(iterations) + .setLambda(lambda) + .setBlocks(blocks) + .setAlpha(alpha) + .setNonnegative(nonnegative) + +if (seed != null) als.setSeed(seed) + +val model = als.run(ratingsJRDD.rdd) +new MatrixFactorizationModelWrapper(model) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/7e9d9756/python/pyspark/mllib/recommendation.py -- diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py index e8b9984..e26b152 100644 --- a/python/pyspark/mllib/recommendation.py +++ b/python/pyspark/mllib/recommendation.py @@ -44,31 +44,39 @@ class MatrixFactorizationModel(JavaModelWrapper): r2 = (1, 2, 2.0) r3 = (2, 1, 2.0) ratings = sc.parallelize([r1, r2, r3]) - model = ALS.trainImplicit(ratings, 1) - model.predict(2,2) is not None -True + model = ALS.trainImplicit(ratings, 1, seed=10) +
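The Python wrapper above now builds the same `ALS` instance the Scala API exposes, forwarding `nonnegative` and an optional `seed`. For reference, a short Scala sketch of the equivalent call using the setters this patch wires up; the toy ratings and SparkConf settings are illustrative assumptions, not taken from the patch:

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object NonnegativeAlsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("nnals-sketch"))
    try {
      // Tiny example dataset, mirroring the doctest-style ratings in recommendation.py.
      val ratings = sc.parallelize(Seq(Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)))
      // Same configuration the Python API now forwards: nonnegative factors plus a fixed seed.
      val model = new ALS()
        .setRank(1)
        .setIterations(10)
        .setLambda(0.01)
        .setNonnegative(true)
        .setSeed(10L)
        .run(ratings)
      println(model.predict(2, 2))
    } finally {
      sc.stop()
    }
  }
}
```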
spark git commit: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API
Repository: spark Updated Branches: refs/heads/branch-1.2 3b07c483a - 427d7911f [MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API SPARK-1553 added alternating nonnegative least squares to MLLib, however it's not possible to access it via the python API. This pull request resolves that. Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Closes #3095 from mdagost/python_nmf and squashes the following commits: a6743ad [Michelangelo D'Agostino] Use setters instead of static methods in PythonMLLibAPI. Remove the new static methods I added. Set seed in tests. Change ratings to ratingsRDD in both train and trainImplicit for consistency. 7cffd39 [Michelangelo D'Agostino] Swapped nonnegative and seed in a few more places. 3fdc851 [Michelangelo D'Agostino] Moved seed to the end of the python parameter list. bdcc154 [Michelangelo D'Agostino] Change seed type to java.lang.Long so that it can handle null. cedf043 [Michelangelo D'Agostino] Added in ability to set the seed from python and made that play nice with the nonnegative changes. Also made the python ALS tests more exact. a72fdc9 [Michelangelo D'Agostino] Expose nonnegative ALS in the python API. (cherry picked from commit 7e9d975676d56ace0e84c2200137e4cd4eba074a) Signed-off-by: Xiangrui Meng m...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/427d7911 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/427d7911 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/427d7911 Branch: refs/heads/branch-1.2 Commit: 427d7911f527e00e75dec0498b4bbdbe164db7ca Parents: 3b07c48 Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Authored: Fri Nov 7 22:53:01 2014 -0800 Committer: Xiangrui Meng m...@databricks.com Committed: Fri Nov 7 22:53:22 2014 -0800 -- .../spark/mllib/api/python/PythonMLLibAPI.scala | 39 --- python/pyspark/mllib/recommendation.py | 40 2 files changed, 58 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/427d7911/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala index d832ae3..70d7138 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala @@ -275,12 +275,25 @@ class PythonMLLibAPI extends Serializable { * the Py4J documentation. 
*/ def trainALSModel( - ratings: JavaRDD[Rating], + ratingsJRDD: JavaRDD[Rating], rank: Int, iterations: Int, lambda: Double, - blocks: Int): MatrixFactorizationModel = { -new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks)) + blocks: Int, + nonnegative: Boolean, + seed: java.lang.Long): MatrixFactorizationModel = { + +val als = new ALS() + .setRank(rank) + .setIterations(iterations) + .setLambda(lambda) + .setBlocks(blocks) + .setNonnegative(nonnegative) + +if (seed != null) als.setSeed(seed) + +val model = als.run(ratingsJRDD.rdd) +new MatrixFactorizationModelWrapper(model) } /** @@ -295,9 +308,23 @@ class PythonMLLibAPI extends Serializable { iterations: Int, lambda: Double, blocks: Int, - alpha: Double): MatrixFactorizationModel = { -new MatrixFactorizationModelWrapper( - ALS.trainImplicit(ratingsJRDD.rdd, rank, iterations, lambda, blocks, alpha)) + alpha: Double, + nonnegative: Boolean, + seed: java.lang.Long): MatrixFactorizationModel = { + +val als = new ALS() + .setImplicitPrefs(true) + .setRank(rank) + .setIterations(iterations) + .setLambda(lambda) + .setBlocks(blocks) + .setAlpha(alpha) + .setNonnegative(nonnegative) + +if (seed != null) als.setSeed(seed) + +val model = als.run(ratingsJRDD.rdd) +new MatrixFactorizationModelWrapper(model) } /** http://git-wip-us.apache.org/repos/asf/spark/blob/427d7911/python/pyspark/mllib/recommendation.py -- diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py index e8b9984..e26b152 100644 --- a/python/pyspark/mllib/recommendation.py +++ b/python/pyspark/mllib/recommendation.py @@ -44,31 +44,39 @@ class MatrixFactorizationModel(JavaModelWrapper): r2 = (1, 2, 2.0) r3 = (2, 1, 2.0) ratings = sc.parallelize([r1, r2, r3]) - model =
spark git commit: [SPARK-4291][Build] Rename network module projects
Repository: spark Updated Branches: refs/heads/branch-1.2 427d7911f - fc51de339 [SPARK-4291][Build] Rename network module projects The names of the recently introduced network modules are inconsistent with those of the other modules in the project. We should just drop the Code suffix since it doesn't sacrifice any meaning, especially before they get into an official release. ``` [INFO] Reactor Build Order: [INFO] [INFO] Spark Project Parent POM [INFO] Spark Project Common Network Code [INFO] Spark Project Shuffle Streaming Service Code [INFO] Spark Project Core [INFO] Spark Project Bagel [INFO] Spark Project GraphX [INFO] Spark Project Streaming [INFO] Spark Project Catalyst [INFO] Spark Project SQL [INFO] Spark Project ML Library [INFO] Spark Project Tools [INFO] Spark Project Hive [INFO] Spark Project REPL [INFO] Spark Project YARN Parent POM [INFO] Spark Project YARN Stable API [INFO] Spark Project Assembly [INFO] Spark Project External Twitter [INFO] Spark Project External Kafka [INFO] Spark Project External Flume Sink [INFO] Spark Project External Flume [INFO] Spark Project External ZeroMQ [INFO] Spark Project External MQTT [INFO] Spark Project Examples [INFO] Spark Project Yarn Shuffle Service Code ``` Author: Andrew Or and...@databricks.com Closes #3148 from andrewor14/build-drop-code and squashes the following commits: eac839b [Andrew Or] Network - Networking d01ad47 [Andrew Or] Rename network module project names (cherry picked from commit 7afc8564f33eb2868f458f85046f59a51b516ed6) Signed-off-by: Patrick Wendell pwend...@gmail.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc51de33 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc51de33 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc51de33 Branch: refs/heads/branch-1.2 Commit: fc51de3395f25983052ae9d3c5c17891f6e6b8a7 Parents: 427d791 Author: Andrew Or and...@databricks.com Authored: Fri Nov 7 23:16:13 2014 -0800 Committer: Patrick Wendell pwend...@gmail.com Committed: Fri Nov 7 23:16:38 2014 -0800 -- network/common/pom.xml | 2 +- network/shuffle/pom.xml | 2 +- network/yarn/pom.xml| 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/common/pom.xml -- diff --git a/network/common/pom.xml b/network/common/pom.xml index 6144548..8b24ebf 100644 --- a/network/common/pom.xml +++ b/network/common/pom.xml @@ -29,7 +29,7 @@ groupIdorg.apache.spark/groupId artifactIdspark-network-common_2.10/artifactId packagingjar/packaging - nameSpark Project Common Network Code/name + nameSpark Project Networking/name urlhttp://spark.apache.org//url properties sbt.project.namenetwork-common/sbt.project.name http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/shuffle/pom.xml -- diff --git a/network/shuffle/pom.xml b/network/shuffle/pom.xml index fe5681d..27c8467 100644 --- a/network/shuffle/pom.xml +++ b/network/shuffle/pom.xml @@ -29,7 +29,7 @@ groupIdorg.apache.spark/groupId artifactIdspark-network-shuffle_2.10/artifactId packagingjar/packaging - nameSpark Project Shuffle Streaming Service Code/name + nameSpark Project Shuffle Streaming Service/name urlhttp://spark.apache.org//url properties sbt.project.namenetwork-shuffle/sbt.project.name http://git-wip-us.apache.org/repos/asf/spark/blob/fc51de33/network/yarn/pom.xml -- diff --git a/network/yarn/pom.xml b/network/yarn/pom.xml index e60d8c1..6e6f6f3 100644 --- a/network/yarn/pom.xml +++ b/network/yarn/pom.xml @@ -29,7 +29,7 
@@ <groupId>org.apache.spark</groupId> <artifactId>spark-network-yarn_2.10</artifactId> <packaging>jar</packaging> - <name>Spark Project Yarn Shuffle Service Code</name> + <name>Spark Project YARN Shuffle Service</name> <url>http://spark.apache.org/</url> <properties> <sbt.project.name>network-yarn</sbt.project.name>