spark git commit: [SQL][MINOR] use stricter type parameter to make it clear that parquet reader returns UnsafeRow
Repository: spark Updated Branches: refs/heads/master 386127377 -> ae226283e [SQL][MINOR] use stricter type parameter to make it clear that parquet reader returns UnsafeRow ## What changes were proposed in this pull request? A small code style change; it's better to make the type parameter more accurate. ## How was this patch tested? N/A Author: Wenchen Fan Closes #14458 from cloud-fan/parquet. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ae226283 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ae226283 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ae226283 Branch: refs/heads/master Commit: ae226283e19ce396216c73b0ae2470efa122b65b Parents: 3861273 Author: Wenchen Fan Authored: Wed Aug 3 08:23:26 2016 +0800 Committer: Cheng Lian Committed: Wed Aug 3 08:23:26 2016 +0800 -- .../execution/datasources/parquet/ParquetFileFormat.scala | 4 ++-- .../datasources/parquet/ParquetReadSupport.scala | 10 +- .../datasources/parquet/ParquetRecordMaterializer.scala | 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ae226283/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala index 772e031..c3e75f1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala @@ -370,11 +370,11 @@ private[sql] class ParquetFileFormat logDebug(s"Falling back to parquet-mr") val reader = pushed match { case Some(filter) => -new ParquetRecordReader[InternalRow]( +new ParquetRecordReader[UnsafeRow]( new ParquetReadSupport, FilterCompat.get(filter, null)) case _ => -new ParquetRecordReader[InternalRow](new ParquetReadSupport) +new ParquetRecordReader[UnsafeRow](new ParquetReadSupport) } reader.initialize(split, hadoopAttemptContext) reader http://git-wip-us.apache.org/repos/asf/spark/blob/ae226283/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala index 8a2e0d7..f1a35dd 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala @@ -29,12 +29,12 @@ import org.apache.parquet.schema._ import org.apache.parquet.schema.Type.Repetition import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.UnsafeRow import org.apache.spark.sql.types._ /** * A Parquet [[ReadSupport]] implementation for reading Parquet records as Catalyst - * [[InternalRow]]s. + * [[UnsafeRow]]s. * * The API interface of [[ReadSupport]] is a little bit over complicated because of historical * reasons.
In older versions of parquet-mr (say 1.6.0rc3 and prior), [[ReadSupport]] need to be @@ -48,7 +48,7 @@ import org.apache.spark.sql.types._ * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]] * to [[prepareForRead()]], but use a private `var` for simplicity. */ -private[parquet] class ParquetReadSupport extends ReadSupport[InternalRow] with Logging { +private[parquet] class ParquetReadSupport extends ReadSupport[UnsafeRow] with Logging { private var catalystRequestedSchema: StructType = _ /** @@ -72,13 +72,13 @@ private[parquet] class ParquetReadSupport extends ReadSupport[InternalRow] with /** * Called on executor side after [[init()]], before instantiating actual Parquet record readers. * Responsible for instantiating [[RecordMaterializer]], which is used for converting Parquet - * records to Catalyst [[InternalRow]]s. + * records to Catalyst [[UnsafeRow]]s. */ override def prepareForRead( conf: Configuration, keyValueMetaData: JMap[String, String], fileSchema: MessageType, - readContext: ReadContext): RecordMaterializer[Inter
spark git commit: [SPARK-16796][WEB UI] Visible passwords on Spark environment page
Repository: spark Updated Branches: refs/heads/master b73a57060 -> 386127377 [SPARK-16796][WEB UI] Visible passwords on Spark environment page ## What changes were proposed in this pull request? Mask spark.ssl.keyPassword, spark.ssl.keyStorePassword, spark.ssl.trustStorePassword in Web UI environment page. (Changes their values to * in env. page) ## How was this patch tested? I've built spark, run spark shell and checked that these values have been masked with *. Also ran tests: ./dev/run-tests [info] ScalaTest [info] Run completed in 1 hour, 9 minutes, 5 seconds. [info] Total number of tests run: 2166 [info] Suites: completed 65, aborted 0 [info] Tests: succeeded 2166, failed 0, canceled 0, ignored 590, pending 0 [info] All tests passed. ![mask](https://cloud.githubusercontent.com/assets/15244468/17262154/7641e132-55e2-11e6-8a6c-30ead77c7372.png) Author: Artur Sukhenko Closes #14409 from Devian-ua/maskpass. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38612737 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38612737 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38612737 Branch: refs/heads/master Commit: 3861273771c2631e88e1f37a498c644ad45ac1c0 Parents: b73a570 Author: Artur Sukhenko Authored: Tue Aug 2 16:13:12 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 16:13:12 2016 -0700 -- .../main/scala/org/apache/spark/ui/env/EnvironmentPage.scala | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/38612737/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala b/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala index f0a1174..22136a6 100644 --- a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala +++ b/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala @@ -26,11 +26,15 @@ import org.apache.spark.ui.{UIUtils, WebUIPage} private[ui] class EnvironmentPage(parent: EnvironmentTab) extends WebUIPage("") { private val listener = parent.listener + private def removePass(kv: (String, String)): (String, String) = { +if (kv._1.toLowerCase.contains("password")) (kv._1, "**") else kv + } + def render(request: HttpServletRequest): Seq[Node] = { val runtimeInformationTable = UIUtils.listingTable( propertyHeader, jvmRow, listener.jvmInformation, fixedWidth = true) val sparkPropertiesTable = UIUtils.listingTable( - propertyHeader, propertyRow, listener.sparkProperties, fixedWidth = true) + propertyHeader, propertyRow, listener.sparkProperties.map(removePass), fixedWidth = true) val systemPropertiesTable = UIUtils.listingTable( propertyHeader, propertyRow, listener.systemProperties, fixedWidth = true) val classpathEntriesTable = UIUtils.listingTable(
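The heart of the patch is the `removePass` helper above: any Spark property whose key contains "password" has its value replaced before the environment page renders it. A minimal, self-contained sketch of that logic, lifted out of the UI class for illustration (the object name and sample properties are hypothetical):

```scala
object MaskPasswordsDemo {
  // Same rule the patch adds to EnvironmentPage: mask any property whose
  // key contains "password" (case-insensitive).
  private def removePass(kv: (String, String)): (String, String) =
    if (kv._1.toLowerCase.contains("password")) (kv._1, "**") else kv

  def main(args: Array[String]): Unit = {
    val sparkProperties = Seq(
      "spark.ssl.keyPassword" -> "secret",
      "spark.app.name" -> "demo")
    sparkProperties.map(removePass).foreach(println)
    // prints: (spark.ssl.keyPassword,**)
    //         (spark.app.name,demo)
  }
}
```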
spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)
Repository: spark Updated Branches: refs/heads/branch-1.6 8a22275de -> 797e758b1 [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6) ## What changes were proposed in this pull request? Casting ConcurrentHashMap to ConcurrentMap allows running code compiled with Java 8 on Java 7 ## How was this patch tested? Compilation. Existing automatic tests Author: Maciej Brynski Closes #14390 from maver1ck/spark-15541. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/797e758b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/797e758b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/797e758b Branch: refs/heads/branch-1.6 Commit: 797e758b16946aa5779cc302f943eafec34c0c39 Parents: 8a22275 Author: Maciej Brynski Authored: Tue Aug 2 16:07:35 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 16:07:35 2016 -0700 -- .../scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/797e758b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala index 8f4ce74..e66d2ad 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql.catalyst.analysis -import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap} import scala.collection.JavaConverters._ import scala.collection.mutable @@ -80,7 +80,8 @@ trait Catalog { } class SimpleCatalog(val conf: CatalystConf) extends Catalog { - private[this] val tables = new ConcurrentHashMap[String, LogicalPlan] + private[this] val tables: ConcurrentMap[String, LogicalPlan] = +new ConcurrentHashMap[String, LogicalPlan] override def registerTable(tableIdent: TableIdentifier, plan: LogicalPlan): Unit = { tables.put(getTableName(tableIdent), plan)
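Background on why the interface type matters (this reasoning is not stated in the commit message, so treat it as an editor's assumption): Java 8 narrowed the return type of `ConcurrentHashMap.keySet()` to the Java-8-only `KeySetView`, and the compiler bakes the receiver's static type into each call site, so bytecode built on Java 8 against a `ConcurrentHashMap`-typed field can throw `NoSuchMethodError` on a Java 7 runtime. Declaring the field as `ConcurrentMap` keeps the call site on the interface method, which exists on both. A sketch of the pattern with a hypothetical registry class:

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}

// Hypothetical class mirroring the patch's pattern: construct the concrete
// map, but expose it through the ConcurrentMap interface so calls such as
// keySet() compile against java.util.Map and resolve on Java 7 and Java 8.
class PlanRegistry {
  private val tables: ConcurrentMap[String, String] =
    new ConcurrentHashMap[String, String]

  def register(name: String, plan: String): Unit = tables.put(name, plan)
  def registered(): java.util.Set[String] = tables.keySet() // interface call
}
```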
spark git commit: [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState
Repository: spark Updated Branches: refs/heads/master e9fc0b6a8 -> b73a57060 [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState ### What changes were proposed in this pull request? This PR is to remove `TestHiveSharedState`. This is also associated with the Hive refactoring for removing `HiveSharedState`. ### How was this patch tested? The existing test cases Author: gatorsmile Closes #14463 from gatorsmile/removeTestHiveSharedState. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b73a5706 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b73a5706 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b73a5706 Branch: refs/heads/master Commit: b73a5706032eae7c87f7f2f8b0a72e7ee6d2e7e5 Parents: e9fc0b6 Author: gatorsmile Authored: Tue Aug 2 14:17:45 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 14:17:45 2016 -0700 -- .../apache/spark/sql/hive/test/TestHive.scala | 78 +--- .../spark/sql/hive/ShowCreateTableSuite.scala | 2 +- 2 files changed, 20 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b73a5706/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala index fbacd59..cdc8d61 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala @@ -24,7 +24,6 @@ import scala.collection.JavaConverters._ import scala.collection.mutable import scala.language.implicitConversions -import org.apache.hadoop.conf.Configuration import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.ql.exec.FunctionRegistry import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe @@ -40,7 +39,6 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.execution.QueryExecution import org.apache.spark.sql.execution.command.CacheTableCommand import org.apache.spark.sql.hive._ -import org.apache.spark.sql.hive.client.HiveClient import org.apache.spark.sql.internal.SQLConf import org.apache.spark.util.{ShutdownHookManager, Utils} @@ -86,8 +84,6 @@ class TestHiveContext( new TestHiveContext(sparkSession.newSession()) } - override def sharedState: TestHiveSharedState = sparkSession.sharedState - override def sessionState: TestHiveSessionState = sparkSession.sessionState def setCacheTables(c: Boolean): Unit = { @@ -112,38 +108,43 @@ class TestHiveContext( * A [[SparkSession]] used in [[TestHiveContext]]. * * @param sc SparkContext - * @param scratchDirPath scratch directory used by Hive's metastore client - * @param metastoreTemporaryConf configuration options for Hive's metastore - * @param existingSharedState optional [[TestHiveSharedState]] + * @param existingSharedState optional [[HiveSharedState]] * @param loadTestTables if true, load the test tables. They can only be loaded when running * in the JVM, i.e when calling from Python this flag has to be false.
*/ private[hive] class TestHiveSparkSession( @transient private val sc: SparkContext, -scratchDirPath: File, -metastoreTemporaryConf: Map[String, String], -@transient private val existingSharedState: Option[TestHiveSharedState], +@transient private val existingSharedState: Option[HiveSharedState], private val loadTestTables: Boolean) extends SparkSession(sc) with Logging { self => def this(sc: SparkContext, loadTestTables: Boolean) { this( sc, - TestHiveContext.makeScratchDir(), - HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false), - None, + existingSharedState = None, loadTestTables) } + { // set the metastore temporary configuration +val metastoreTempConf = HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false) ++ Map( + ConfVars.METASTORE_INTEGER_JDO_PUSHDOWN.varname -> "true", + // scratch directory used by Hive's metastore client + ConfVars.SCRATCHDIR.varname -> TestHiveContext.makeScratchDir().toURI.toString, + ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname -> "1") + +metastoreTempConf.foreach { case (k, v) => + sc.hadoopConfiguration.set(k, v) +} + } + assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive") - // TODO: Let's remove TestHiveSharedState and TestHiveSessionState. Otherwise, + // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise, // we are not really testing the reflection logic based on the setting of // CATALOG_IMPLEMENTATI
spark git commit: [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file
Repository: spark Updated Branches: refs/heads/master a9beeaaae -> e9fc0b6a8 [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file ## What changes were proposed in this pull request? The behavior of `SparkContext.addFile()` changed slightly with the introduction of the Netty-RPC-based file server, which was introduced in Spark 1.6 (where it was disabled by default) and became the default / only file server in Spark 2.0.0. Prior to 2.0, calling `SparkContext.addFile()` with files that have the same name and identical contents would succeed. This behavior was never explicitly documented but Spark has behaved this way since very early 1.x versions. In 2.0 (or 1.6 with the Netty file server enabled), the second `addFile()` call will fail with a requirement error because NettyStreamManager tries to guard against duplicate file registration. This problem also affects `addJar()` in a more subtle way: the `fileServer.addJar()` call will also fail with an exception but that exception is logged and ignored; I believe that the problematic exception-catching path was mistakenly copied from some old code which was only relevant to very old versions of Spark and YARN mode. I believe that this change of behavior was unintentional, so this patch weakens the `require` check so that adding the same filename at the same path will succeed. At file download time, Spark tasks will fail with exceptions if an executor already has a local copy of a file and that file's contents do not match the contents of the file being downloaded / added. As a result, it's important that we prevent files with the same name and different contents from being served because allowing that can effectively brick an executor by preventing it from successfully launching any new tasks. Before this patch's change, this was prevented by forbidding `addFile()` from being called twice on files with the same name. Because Spark does not defensively copy local files that are passed to `addFile` it is vulnerable to files' contents changing, so I think it's okay to rely on an implicit assumption that these files are intended to be immutable (since if they _are_ mutable then this can lead to either explicit task failures or implicit incorrectness (in case new executors silently get newer copies of the file while old executors continue to use an older version)). To guard against this, I have decided to only update the file addition timestamps on the first call to `addFile()`; duplicate calls will succeed but will not update the timestamp. This behavior is fine as long as we assume files are immutable, which seems reasonable given the behaviors described above. As part of this change, I also improved the thread-safety of the `addedJars` and `addedFiles` maps; this is important because these maps may be concurrently read by a task launching thread and written by a driver thread in case the user's driver code is multi-threaded. ## How was this patch tested? I added regression tests in `SparkContextSuite`. Author: Josh Rosen Closes #14396 from JoshRosen/SPARK-16787.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9fc0b6a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9fc0b6a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9fc0b6a Branch: refs/heads/master Commit: e9fc0b6a8b4ce62cab56d18581f588c67b811f5b Parents: a9beeaa Author: Josh Rosen Authored: Tue Aug 2 12:02:11 2016 -0700 Committer: Josh Rosen Committed: Tue Aug 2 12:02:11 2016 -0700 -- .../scala/org/apache/spark/SparkContext.scala | 36 ++ .../spark/rpc/netty/NettyStreamManager.scala| 12 +++-- .../scala/org/apache/spark/scheduler/Task.scala | 5 +- .../org/apache/spark/SparkContextSuite.scala| 51 4 files changed, 78 insertions(+), 26 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e9fc0b6a/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index d48e2b4..48126c2 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -21,7 +21,7 @@ import java.io._ import java.lang.reflect.Constructor import java.net.URI import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID} -import java.util.concurrent.ConcurrentMap +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap} import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, AtomicReference} import scala.collection.JavaConverters._ @@ -262,8 +262,8 @@ class SparkContext(config: Spark
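A minimal sketch of the restored behavior, runnable in local mode (the demo object and temp-file contents are arbitrary, not part of the patch):

```scala
import java.io.{File, PrintWriter}
import org.apache.spark.{SparkConf, SparkContext}

object AddFileTwiceDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("addFile-demo"))
    val f = File.createTempFile("lookup", ".txt")
    val pw = new PrintWriter(f)
    pw.write("hello")
    pw.close()

    sc.addFile(f.getAbsolutePath)
    // With the Netty file server this second call used to throw a
    // requirement error; after this patch it succeeds as a no-op and the
    // original addition timestamp is kept.
    sc.addFile(f.getAbsolutePath)
    sc.stop()
  }
}
```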
spark git commit: [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file
Repository: spark Updated Branches: refs/heads/branch-2.0 f190bb83b -> 063a507fc [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file ## What changes were proposed in this pull request? The behavior of `SparkContext.addFile()` changed slightly with the introduction of the Netty-RPC-based file server, which was introduced in Spark 1.6 (where it was disabled by default) and became the default / only file server in Spark 2.0.0. Prior to 2.0, calling `SparkContext.addFile()` with files that have the same name and identical contents would succeed. This behavior was never explicitly documented but Spark has behaved this way since very early 1.x versions. In 2.0 (or 1.6 with the Netty file server enabled), the second `addFile()` call will fail with a requirement error because NettyStreamManager tries to guard against duplicate file registration. This problem also affects `addJar()` in a more subtle way: the `fileServer.addJar()` call will also fail with an exception but that exception is logged and ignored; I believe that the problematic exception-catching path was mistakenly copied from some old code which was only relevant to very old versions of Spark and YARN mode. I believe that this change of behavior was unintentional, so this patch weakens the `require` check so that adding the same filename at the same path will succeed. At file download time, Spark tasks will fail with exceptions if an executor already has a local copy of a file and that file's contents do not match the contents of the file being downloaded / added. As a result, it's important that we prevent files with the same name and different contents from being served because allowing that can effectively brick an executor by preventing it from successfully launching any new tasks. Before this patch's change, this was prevented by forbidding `addFile()` from being called twice on files with the same name. Because Spark does not defensively copy local files that are passed to `addFile` it is vulnerable to files' contents changing, so I think it's okay to rely on an implicit assumption that these files are intended to be immutable (since if they _are_ mutable then this can lead to either explicit task failures or implicit incorrectness (in case new executors silently get newer copies of the file while old executors continue to use an older versi on)). To guard against this, I have decided to only update the file addition timestamps on the first call to `addFile()`; duplicate calls will succeed but will not update the timestamp. This behavior is fine as long as we assume files are immutable, which seems reasonable given the behaviors described above. As part of this change, I also improved the thread-safety of the `addedJars` and `addedFiles` maps; this is important because these maps may be concurrently read by a task launching thread and written by a driver thread in case the user's driver code is multi-threaded. ## How was this patch tested? I added regression tests in `SparkContextSuite`. Author: Josh Rosen Closes #14396 from JoshRosen/SPARK-16787. 
(cherry picked from commit e9fc0b6a8b4ce62cab56d18581f588c67b811f5b) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/063a507f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/063a507f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/063a507f Branch: refs/heads/branch-2.0 Commit: 063a507fce862d14061b0c0464b7a51a0afde066 Parents: f190bb8 Author: Josh Rosen Authored: Tue Aug 2 12:02:11 2016 -0700 Committer: Josh Rosen Committed: Tue Aug 2 12:03:10 2016 -0700 -- .../scala/org/apache/spark/SparkContext.scala | 36 ++ .../spark/rpc/netty/NettyStreamManager.scala| 12 +++-- .../scala/org/apache/spark/scheduler/Task.scala | 5 +- .../org/apache/spark/SparkContextSuite.scala| 51 4 files changed, 78 insertions(+), 26 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/063a507f/core/src/main/scala/org/apache/spark/SparkContext.scala -- diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index fe15052..d3e8de3 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -21,7 +21,7 @@ import java.io._ import java.lang.reflect.Constructor import java.net.URI import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID} -import java.util.concurrent.ConcurrentMap +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap} import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, AtomicRe
spark git commit: [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala
Repository: spark Updated Branches: refs/heads/master cbdff4935 -> a9beeaaae [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala ## What changes were proposed in this pull request? `Greatest` and `Least` are not conditional expressions, but arithmetic expressions. ## How was this patch tested? N/A Author: Wenchen Fan Closes #14460 from cloud-fan/move. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9beeaaa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9beeaaa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9beeaaa Branch: refs/heads/master Commit: a9beeaaaeb52e9c940fe86a3d70801655401623c Parents: cbdff49 Author: Wenchen Fan Authored: Tue Aug 2 11:08:32 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 11:08:32 2016 -0700 -- .../sql/catalyst/expressions/arithmetic.scala | 121 ++ .../expressions/conditionalExpressions.scala| 122 --- .../expressions/ArithmeticExpressionSuite.scala | 107 .../ConditionalExpressionSuite.scala| 107 4 files changed, 228 insertions(+), 229 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9beeaaa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 77d40a5..4aebef9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ @@ -460,3 +461,123 @@ case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic wi override def sql: String = s"$prettyName(${left.sql}, ${right.sql})" } + +/** + * A function that returns the least value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) 
- Returns the least value of all parameters, skipping null values.") +case class Least(children: Seq[Expression]) extends Expression { + + override def nullable: Boolean = children.forall(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(dataType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"LEAST requires at least 2 arguments") +} else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { + TypeCheckResult.TypeCheckFailure( +s"The expressions should all have the same type," + + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") +} else { + TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) +} + } + + override def dataType: DataType = children.head.dataType + + override def eval(input: InternalRow): Any = { +children.foldLeft[Any](null)((r, c) => { + val evalc = c.eval(input) + if (evalc != null) { +if (r == null || ordering.lt(evalc, r)) evalc else r + } else { +r + } +}) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val evalChildren = children.map(_.genCode(ctx)) +val first = evalChildren(0) +val rest = evalChildren.drop(1) +def updateEval(eval: ExprCode): String = { + s""" +${eval.code} +if (!${eval.isNull} && (${ev.isNull} || + ${ctx.genGreater(dataType, ev.value, eval.value)})) { + ${ev.isNull} = false; + ${ev.value} = ${eval.value}; +} + """ +} +ev.copy(code = s""" + ${first.code} + boolean ${ev.isNull} = ${first.isNull}; + ${ctx.javaType(dataType)} ${ev.value} = ${first.value}; + ${rest.map(updateEval).mkString("\n")}""") + } +} + +/** + * A function that returns the greatest value of all parameters, skipping null values. + * It takes at least 2 parameters, and returns null iff all parameters are null. + */ +@ExpressionDescription( + usage = "_FUNC_(n1, ...) - Returns the greatest value of all parameters, skipping null values.") +case
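The move does not change semantics; the null-skipping contract described in the usage strings above is easy to confirm from the SQL API. A quick sketch, assuming an existing `SparkSession` named `spark`:

```scala
// Assumes a SparkSession named `spark` is in scope. least/greatest skip
// nulls and return null only if every argument is null.
spark.sql("SELECT least(10, null, 3) AS lo, greatest(1, null, 2) AS hi").show()
// lo = 3, hi = 2
```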
spark git commit: [SPARK-16816] Modify Java example which is also reflected in documentation example
Repository: spark Updated Branches: refs/heads/master 2330f3ecb -> cbdff4935 [SPARK-16816] Modify Java example which is also reflected in documentation example ## What changes were proposed in this pull request? Modify the Java example, which is also reflected in the documentation. ## How was this patch tested? Ran test cases. Author: sandy Closes #14436 from phalodi/SPARK-16816. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cbdff493 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cbdff493 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cbdff493 Branch: refs/heads/master Commit: cbdff49357d6ce8d41b76b44628d90ead193eb5f Parents: 2330f3e Author: sandy Authored: Tue Aug 2 10:34:01 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 10:34:01 2016 -0700 -- .../examples/sql/JavaSQLDataSourceExample.java | 16 1 file changed, 16 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cbdff493/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java index 52e3b62..fc92446 100644 --- a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java +++ b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java @@ -19,10 +19,13 @@ package org.apache.spark.examples.sql; // $example on:schema_merging$ import java.io.Serializable; import java.util.ArrayList; +import java.util.Arrays; import java.util.List; // $example off:schema_merging$ // $example on:basic_parquet_example$ +import org.apache.spark.api.java.JavaRDD; +import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.MapFunction; import org.apache.spark.sql.Encoders; // $example on:schema_merging$ @@ -213,6 +216,19 @@ public class JavaSQLDataSourceExample { // +--+ // |Justin| // +--+ + +// Alternatively, a DataFrame can be created for a JSON dataset represented by +// an RDD[String] storing one JSON object per string. +List jsonData = Arrays.asList( + "{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}"); +JavaRDD anotherPeopleRDD = new JavaSparkContext(spark.sparkContext()).parallelize(jsonData); +Dataset anotherPeople = spark.read().json(anotherPeopleRDD); +anotherPeople.show(); +// +---++ +// |address|name| +// +---++ +// |[Columbus,Ohio]| Yin| +// +---++ // $example off:json_dataset$ }
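For comparison with the Java snippet added above, a rough Scala equivalent of the same documentation example (assuming a `SparkSession` named `spark`; this is an illustration, not part of the patch):

```scala
// Assumes a SparkSession named `spark` is in scope. Create a DataFrame
// from an RDD[String] holding one JSON object per string.
val jsonData = Seq(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""")
val anotherPeopleRDD = spark.sparkContext.parallelize(jsonData)
val anotherPeople = spark.read.json(anotherPeopleRDD)
anotherPeople.show()
// +---------------+----+
// |        address|name|
// +---------------+----+
// |[Columbus,Ohio]| Yin|
// +---------------+----+
```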
spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least
Repository: spark Updated Branches: refs/heads/branch-2.0 a937c9ee4 -> f190bb83b [SPARK-16850][SQL] Improve type checking error message for greatest/least Greatest/least function does not have the most friendly error message for data types. This patch improves the error message to not show the Seq type, and use more human readable data types. Before: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; line 1 pos 7 ``` After: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7 ``` Manually verified the output and also added unit tests to ConditionalExpressionSuite. Author: petermaxlee Closes #14453 from petermaxlee/SPARK-16850. (cherry picked from commit a1ff72e1cce6f22249ccc4905e8cef30075beb2f) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f190bb83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f190bb83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f190bb83 Branch: refs/heads/branch-2.0 Commit: f190bb83beaafb65c8e6290e9ecaa61ac51e04bb Parents: a937c9e Author: petermaxlee Authored: Tue Aug 2 19:32:35 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 2 10:22:18 2016 -0700 -- .../catalyst/expressions/conditionalExpressions.scala | 4 ++-- .../expressions/ConditionalExpressionSuite.scala | 13 + 2 files changed, 15 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala index e97e089..5f2585f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala @@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got LEAST (${children.map(_.dataType)}).") + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } @@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got GREATEST (${children.map(_.dataType)}).") + s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala index 3c581ec..36185b8 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala @@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp} import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.types._ @@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper Literal(Timestamp.valueOf("2015-07-01 10:00:00", Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty) +// Type checking error +assert( + Least(Seq(Literal(1), Literal("1"))).checkInputDataType
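The improved message can be reproduced from the SQL API with the expression quoted in the commit message; a sketch assuming a `SparkSession` named `spark`:

```scala
// Assumes a SparkSession named `spark` is in scope. Mixing decimal and
// string arguments now fails with human-readable type names:
spark.sql("""SELECT greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")""")
// org.apache.spark.sql.AnalysisException: ... due to data type mismatch:
// The expressions should all have the same type,
// got GREATEST(decimal(2,1), string).
```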
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/master 146001a9f -> 2330f3ecb [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van Hovell Closes #14442 from hvanhovell/SPARK-16836. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2330f3ec Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2330f3ec Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2330f3ec Branch: refs/heads/master Commit: 2330f3ecbbd89c7eaab9cc0d06726aa743b16334 Parents: 146001a Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:47 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 5e10462..c7d5086 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -500,6 +500,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -660,7 +661,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -880,6 +881,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) )* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index f2cc8d3..679adf2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. 
These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext): FunctionIdentifier = { http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ---
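With the new `timeFunctionCall` rule in place, both the bare-literal and the function-call spellings parse again; a sketch assuming a `SparkSession` named `spark`:

```scala
// Assumes a SparkSession named `spark` is in scope. All of these are
// accepted after this patch, as in Spark 1.6:
spark.sql("SELECT current_date, current_timestamp, current_date()").show()
```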
spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals
Repository: spark Updated Branches: refs/heads/branch-2.0 ef7927e8e -> a937c9ee4 [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals ## What changes were proposed in this pull request? In Spark 1.6 (with Hive support) we could use `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions as literals (without adding braces), for example: ```SQL select /* Spark 1.6: */ current_date, /* Spark 1.6 & Spark 2.0: */ current_date() ``` This was accidentally dropped in Spark 2.0. This PR reinstates this functionality. ## How was this patch tested? Added a case to ExpressionParserSuite. Author: Herman van Hovell Closes #14442 from hvanhovell/SPARK-16836. (cherry picked from commit 2330f3ecbbd89c7eaab9cc0d06726aa743b16334) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a937c9ee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a937c9ee Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a937c9ee Branch: refs/heads/branch-2.0 Commit: a937c9ee44e0766194fc8ca4bce2338453112a53 Parents: ef7927e Author: Herman van Hovell Authored: Tue Aug 2 10:09:47 2016 -0700 Committer: Reynold Xin Committed: Tue Aug 2 10:09:53 2016 -0700 -- .../org/apache/spark/sql/catalyst/parser/SqlBase.g4| 5 - .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 13 + .../sql/catalyst/parser/ExpressionParserSuite.scala| 5 + .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++- 4 files changed, 32 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 4c15f9c..de98a87 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -493,6 +493,7 @@ valueExpression primaryExpression : constant #constantDefault +| name=(CURRENT_DATE | CURRENT_TIMESTAMP) #timeFunctionCall | ASTERISK #star | qualifiedName '.' ASTERISK #star | '(' expression (',' expression)+ ')' #rowConstructor @@ -653,7 +654,7 @@ nonReserved | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | MACRO | OR | STRATIFY | THEN | UNBOUNDED | WHEN -| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT +| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | CURRENT_DATE | CURRENT_TIMESTAMP ; SELECT: 'SELECT'; @@ -873,6 +874,8 @@ OPTION: 'OPTION'; ANTI: 'ANTI'; LOCAL: 'LOCAL'; INPATH: 'INPATH'; +CURRENT_DATE: 'CURRENT_DATE'; +CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP'; STRING : '\'' ( ~('\''|'\\') | ('\\' .) 
)* '\'' http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index c7420a1..1a0e7ab 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { } /** + * Create a current timestamp/date expression. These are different from regular function because + * they do not require the user to specify braces when calling them. + */ + override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression = withOrigin(ctx) { +ctx.name.getType match { + case SqlBaseParser.CURRENT_DATE => +CurrentDate() + case SqlBaseParser.CURRENT_TIMESTAMP => +CurrentTimestamp() +} + } + + /** * Create a function database (optional) and name pair. */ protected def visitFunctionName(ctx: QualifiedNameContext): FunctionIdentifier = { http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/test/scal
spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
Repository: spark Updated Branches: refs/heads/branch-2.0 22f0899bc -> ef7927e8e [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs ## What changes were proposed in this pull request? There are two related bugs of Python-only UDTs. Because the test case of the second one needs the first fix too, I put them into one PR. If it is not appropriate, please let me know. ### First bug: When MapObjects works on Python-only UDTs `RowEncoder` will use `PythonUserDefinedType.sqlType` for its deserializer expression. If the sql type is `ArrayType`, we will have `MapObjects` working on it. But `MapObjects` doesn't consider `PythonUserDefinedType` as its input data type. It causes error like: import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], schema=schema) df.show() File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o36.showString. : java.lang.RuntimeException: Error while decoding: scala.MatchError: org.apache.spark.sql.types.PythonUserDefinedTypef4ceede8 (of class org.apache.spark.sql.types.PythonUserDefinedType) ... ### Second bug: When a Python-only UDT is the element type of ArrayType import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT())) df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], schema=schema) df.show() ## How was this patch tested? PySpark's sql tests. Author: Liang-Chi Hsieh Closes #13778 from viirya/fix-pyudt.
(cherry picked from commit 146001a9ffefc7aaedd3d888d68c7a9b80bca545) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef7927e8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef7927e8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef7927e8 Branch: refs/heads/branch-2.0 Commit: ef7927e8e77558f9a18eacc8491b0c28231e2769 Parents: 22f0899 Author: Liang-Chi Hsieh Authored: Tue Aug 2 10:08:18 2016 -0700 Committer: Davies Liu Committed: Tue Aug 2 10:08:34 2016 -0700 -- python/pyspark/sql/tests.py | 35 .../sql/catalyst/encoders/RowEncoder.scala | 9 - .../catalyst/expressions/objects/objects.scala | 17 -- 3 files changed, 58 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ef7927e8/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index a8ca386..87dbb50 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -575,6 +575,41 @@ class SQLTests(ReusedPySparkTestCase): _verify_type(PythonOnlyPoint(1.0, 2.0), PythonOnlyUDT()) self.assertRaises(ValueError, lambda: _verify_type([1.0, 2.0], PythonOnlyUDT())) +def test_simple_udt_in_df(self): +schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) +df = self.spark.createDataFrame( +[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], +schema=schema) +df.show() + +def test_nested_udt_in_df(self): +schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT())) +df = self.spark.createDataFrame( +[(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], +schema=schema) +df.collect() + +schema = StructType().add("key", LongType()).add("val", + MapType(LongType(), PythonOnlyUDT())) +df = self.spark.createDataFrame( +[(i % 3, {i % 3: PythonOnlyPoint(float(i + 1), float(i + 1))}) for i in range(10)], +schema=schema) +df.collect() + +def test_complex_nested_udt_in_df(self): +from pyspark.sql.functions import udf + +schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) +df = self.spark.createDataFrame( +[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], +schema=schema) +df.collect() + +gd = df.groupby("key").agg({"val": "collect_list"}) +gd.collect() +udf = udf(lambda k, v: [(k, v[0])], ArrayType(df.schema)) +gd.select(udf(*gd)).collect() +
spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
Repository: spark Updated Branches: refs/heads/master 1dab63d8d -> 146001a9f [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs ## What changes were proposed in this pull request? There are two related bugs of Python-only UDTs. Because the test case of the second one needs the first fix too, I put them into one PR. If it is not appropriate, please let me know. ### First bug: When MapObjects works on Python-only UDTs `RowEncoder` will use `PythonUserDefinedType.sqlType` for its deserializer expression. If the sql type is `ArrayType`, we will have `MapObjects` working on it. But `MapObjects` doesn't consider `PythonUserDefinedType` as its input data type. It causes error like: import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], schema=schema) df.show() File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o36.showString. : java.lang.RuntimeException: Error while decoding: scala.MatchError: org.apache.spark.sql.types.PythonUserDefinedTypef4ceede8 (of class org.apache.spark.sql.types.PythonUserDefinedType) ... ### Second bug: When a Python-only UDT is the element type of ArrayType import pyspark.sql.group from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT from pyspark.sql.types import * schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT())) df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], schema=schema) df.show() ## How was this patch tested? PySpark's sql tests. Author: Liang-Chi Hsieh Closes #13778 from viirya/fix-pyudt.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/146001a9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/146001a9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/146001a9 Branch: refs/heads/master Commit: 146001a9ffefc7aaedd3d888d68c7a9b80bca545 Parents: 1dab63d Author: Liang-Chi Hsieh Authored: Tue Aug 2 10:08:18 2016 -0700 Committer: Davies Liu Committed: Tue Aug 2 10:08:18 2016 -0700 -- python/pyspark/sql/tests.py | 35 .../sql/catalyst/encoders/RowEncoder.scala | 9 - .../catalyst/expressions/objects/objects.scala | 17 -- 3 files changed, 58 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/146001a9/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index a8ca386..87dbb50 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -575,6 +575,41 @@ class SQLTests(ReusedPySparkTestCase): _verify_type(PythonOnlyPoint(1.0, 2.0), PythonOnlyUDT()) self.assertRaises(ValueError, lambda: _verify_type([1.0, 2.0], PythonOnlyUDT())) +def test_simple_udt_in_df(self): +schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) +df = self.spark.createDataFrame( +[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], +schema=schema) +df.show() + +def test_nested_udt_in_df(self): +schema = StructType().add("key", LongType()).add("val", ArrayType(PythonOnlyUDT())) +df = self.spark.createDataFrame( +[(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in range(10)], +schema=schema) +df.collect() + +schema = StructType().add("key", LongType()).add("val", + MapType(LongType(), PythonOnlyUDT())) +df = self.spark.createDataFrame( +[(i % 3, {i % 3: PythonOnlyPoint(float(i + 1), float(i + 1))}) for i in range(10)], +schema=schema) +df.collect() + +def test_complex_nested_udt_in_df(self): +from pyspark.sql.functions import udf + +schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT()) +df = self.spark.createDataFrame( +[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)], +schema=schema) +df.collect() + +gd = df.groupby("key").agg({"val": "collect_list"}) +gd.collect() +udf = udf(lambda k, v: [(k, v[0])], ArrayType(df.schema)) +gd.select(udf(*gd)).collect() + def test_udt_with_none(self): df = self.spark.range(0, 10, 1, 1) http://git-wip-us.apache.or
spark git commit: [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors
Repository: spark Updated Branches: refs/heads/branch-2.0 fc18e259a -> 22f0899bc [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors ## What changes were proposed in this pull request? Fix of incorrect arguments (dropping slideDuration and using windowDuration) in constructors for TimeWindow. The JIRA this addresses is here: https://issues.apache.org/jira/browse/SPARK-16837 ## How was this patch tested? Added a test to TimeWindowSuite to check that the results of TimeWindow object apply and TimeWindow class constructor are equivalent. Author: Tom Magrino Closes #14441 from tmagrino/windowing-fix. (cherry picked from commit 1dab63d8d3c59a3d6b4ee8e777810c44849e58b8) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22f0899b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22f0899b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22f0899b Branch: refs/heads/branch-2.0 Commit: 22f0899bc78e1f2021084c6397a4c05ad6317bae Parents: fc18e25 Author: Tom Magrino Authored: Tue Aug 2 09:16:44 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 09:16:50 2016 -0700 -- .../spark/sql/catalyst/expressions/TimeWindow.scala | 4 ++-- .../sql/catalyst/expressions/TimeWindowSuite.scala | 12 2 files changed, 14 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/22f0899b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala index 66c4bf2..7ff61ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala @@ -45,12 +45,12 @@ case class TimeWindow( slideDuration: Expression, startTime: Expression) = { this(timeColumn, TimeWindow.parseExpression(windowDuration), - TimeWindow.parseExpression(windowDuration), TimeWindow.parseExpression(startTime)) + TimeWindow.parseExpression(slideDuration), TimeWindow.parseExpression(startTime)) } def this(timeColumn: Expression, windowDuration: Expression, slideDuration: Expression) = { this(timeColumn, TimeWindow.parseExpression(windowDuration), - TimeWindow.parseExpression(windowDuration), 0) + TimeWindow.parseExpression(slideDuration), 0) } def this(timeColumn: Expression, windowDuration: Expression) = { http://git-wip-us.apache.org/repos/asf/spark/blob/22f0899b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala index b82cf8d..d6c8fcf 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala @@ -108,4 +108,16 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva TimeWindow.invokePrivate(parseExpression(Rand(123))) } } + + test("SPARK-16837: TimeWindow.apply equivalent to TimeWindow constructor") { +val slideLength = "1 second" +for (windowLength <- Seq("10 second", "1 minute", "2 hours")) { + val applyValue = TimeWindow(Literal(10L), windowLength, 
slideLength, "0 seconds") + val constructed = new TimeWindow(Literal(10L), +Literal(windowLength), +Literal(slideLength), +Literal("0 seconds")) + assert(applyValue == constructed) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors
Repository: spark Updated Branches: refs/heads/master 36827ddaf -> 1dab63d8d [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors ## What changes were proposed in this pull request? Fix of incorrect arguments (dropping slideDuration and using windowDuration) in constructors for TimeWindow. The JIRA this addresses is here: https://issues.apache.org/jira/browse/SPARK-16837 ## How was this patch tested? Added a test to TimeWindowSuite to check that the results of TimeWindow object apply and TimeWindow class constructor are equivalent. Author: Tom Magrino Closes #14441 from tmagrino/windowing-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1dab63d8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1dab63d8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1dab63d8 Branch: refs/heads/master Commit: 1dab63d8d3c59a3d6b4ee8e777810c44849e58b8 Parents: 36827dd Author: Tom Magrino Authored: Tue Aug 2 09:16:44 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 09:16:44 2016 -0700 -- .../spark/sql/catalyst/expressions/TimeWindow.scala | 4 ++-- .../sql/catalyst/expressions/TimeWindowSuite.scala | 12 2 files changed, 14 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1dab63d8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala index 66c4bf2..7ff61ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala @@ -45,12 +45,12 @@ case class TimeWindow( slideDuration: Expression, startTime: Expression) = { this(timeColumn, TimeWindow.parseExpression(windowDuration), - TimeWindow.parseExpression(windowDuration), TimeWindow.parseExpression(startTime)) + TimeWindow.parseExpression(slideDuration), TimeWindow.parseExpression(startTime)) } def this(timeColumn: Expression, windowDuration: Expression, slideDuration: Expression) = { this(timeColumn, TimeWindow.parseExpression(windowDuration), - TimeWindow.parseExpression(windowDuration), 0) + TimeWindow.parseExpression(slideDuration), 0) } def this(timeColumn: Expression, windowDuration: Expression) = { http://git-wip-us.apache.org/repos/asf/spark/blob/1dab63d8/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala index b82cf8d..d6c8fcf 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala @@ -108,4 +108,16 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva TimeWindow.invokePrivate(parseExpression(Rand(123))) } } + + test("SPARK-16837: TimeWindow.apply equivalent to TimeWindow constructor") { +val slideLength = "1 second" +for (windowLength <- Seq("10 second", "1 minute", "2 hours")) { + val applyValue = TimeWindow(Literal(10L), windowLength, slideLength, "0 seconds") + val constructed = new TimeWindow(Literal(10L), +Literal(windowLength), 
+Literal(slideLength), +Literal("0 seconds")) + assert(applyValue == constructed) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
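The bug is easy to see in isolation: both auxiliary constructors parsed `windowDuration` twice, so every `TimeWindow` built through them silently became a tumbling window (slide equal to window). A minimal sketch mirroring the new test; the durations here are illustrative:

```
import org.apache.spark.sql.catalyst.expressions.{Literal, TimeWindow}

// Build the same window through the companion apply and through the
// four-argument auxiliary constructor.
val viaApply = TimeWindow(Literal(10L), "1 minute", "1 second", "0 seconds")
val viaConstructor = new TimeWindow(
  Literal(10L),
  Literal("1 minute"),
  Literal("1 second"),
  Literal("0 seconds"))

// Before the fix, viaConstructor carried slideDuration == windowDuration
// (1 minute instead of the requested 1 second), so this assertion failed.
assert(viaApply == viaConstructor)
```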
spark git commit: [SPARK-16822][DOC] Support latex in scaladoc.
Repository: spark Updated Branches: refs/heads/master 511dede11 -> 36827ddaf [SPARK-16822][DOC] Support latex in scaladoc. ## What changes were proposed in this pull request? Support using latex in scaladoc by adding MathJax javascript to the js template. ## How was this patch tested? Generated scaladoc. Preview: - LogisticGradient: [before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient) and [after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient) - MinMaxScaler: [before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler) and [after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler) Author: Shuai Lin Closes #14438 from lins05/spark-16822-support-latex-in-scaladoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/36827dda Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/36827dda Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/36827dda Branch: refs/heads/master Commit: 36827ddafeaa7a683362eb8da31065aaff9676d5 Parents: 511dede Author: Shuai Lin Authored: Tue Aug 2 09:14:08 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 09:14:08 2016 -0700 -- docs/js/api-docs.js | 20 .../apache/spark/ml/feature/MinMaxScaler.scala | 10 +- .../ml/regression/AFTSurvivalRegression.scala | 94 +-- .../spark/ml/regression/LinearRegression.scala | 120 +-- .../spark/mllib/clustering/LDAUtils.scala | 2 +- .../mllib/evaluation/RegressionMetrics.scala| 2 +- .../spark/mllib/optimization/Gradient.scala | 94 +-- 7 files changed, 225 insertions(+), 117 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/docs/js/api-docs.js -- diff --git a/docs/js/api-docs.js b/docs/js/api-docs.js index ce89d89..96c63cc 100644 --- a/docs/js/api-docs.js +++ b/docs/js/api-docs.js @@ -41,3 +41,23 @@ function addBadges(allAnnotations, name, tag, html) { .add(annotations.closest("div.fullcomment").prevAll("h4.signature")) .prepend(html); } + +$(document).ready(function() { + var script = document.createElement('script'); + script.type = 'text/javascript'; + script.async = true; + script.onload = function(){ +MathJax.Hub.Config({ + displayAlign: "left", + tex2jax: { +inlineMath: [ ["$", "$"], ["(",")"] ], +displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], +processEscapes: true, +skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'a'] + } +}); + }; + script.src = ('https:' == document.location.protocol ? 'https://' : 'http://') + + 'cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'; + document.getElementsByTagName('head')[0].appendChild(script); +}); http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala index 068f11a..9f3d2ca 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala @@ -76,11 +76,15 @@ private[feature] trait MinMaxScalerParams extends Params with HasInputCol with H /** * Rescale each feature individually to a common range [min, max] linearly using column summary * statistics, which is also known as min-max normalization or Rescaling. 
The rescaled value for - * feature E is calculated as, + * feature E is calculated as: * - * `Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + min` + * + *$$ + *Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + min + *$$ + * * - * For the case `E_{max} == E_{min}`, `Rescaled(e_i) = 0.5 * (max + min)`. + * For the case $E_{max} == E_{min}$, $Rescaled(e_i) = 0.5 * (max + min)$. * Note that since zero values will probably be transformed to non-zero values, output of the * transformer will be DenseVector even for sparse input. */ http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala b/mllib/src/main/scala/org/apache/
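With the MathJax hook above in place, math in scaladoc is plain TeX between dollar-sign delimiters, as in the MinMaxScaler comment in this diff. A small sketch of the convention; the class name is hypothetical:

```
/**
 * Rescales each feature individually to the range [min, max].
 *
 * Display math goes between double dollar signs:
 *
 * $$
 * Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + min
 * $$
 *
 * Inline math such as $E_{max} == E_{min}$ uses single dollar signs.
 */
class DocumentedScaler
```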
spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
Repository: spark Updated Branches: refs/heads/branch-1.6 1b2e6f636 -> 8a22275de [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch) Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7. Tested via compilation and existing automatic tests. Author: Maciej Brynski Closes #14459 from maver1ck/spark-15541-master. (cherry picked from commit 511dede1118f20a7756f614acb6fc88af52c9de9) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a22275d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a22275d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a22275d Branch: refs/heads/branch-1.6 Commit: 8a22275dea74cd79ecd59438fd88bebcae13c944 Parents: 1b2e6f6 Author: Maciej Brynski Authored: Tue Aug 2 08:07:08 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 08:46:20 2016 -0700 -- .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8a22275d/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala -- diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala index 533c984..9364c7e 100644 --- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala +++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala @@ -17,7 +17,7 @@ package org.apache.spark.rpc.netty -import java.util.concurrent.{ThreadPoolExecutor, ConcurrentHashMap, LinkedBlockingQueue, TimeUnit} +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.JavaConverters._ @@ -41,8 +41,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging { val inbox = new Inbox(ref, endpoint) } - private val endpoints = new ConcurrentHashMap[String, EndpointData] - private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] + private val endpoints: ConcurrentMap[String, EndpointData] = +new ConcurrentHashMap[String, EndpointData] + private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] = +new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] // Track the receivers whose inboxes may contain messages. private val receivers = new LinkedBlockingQueue[EndpointData] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
Repository: spark Updated Branches: refs/heads/master dd8514fa2 -> 511dede11 [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch) ## What changes were proposed in this pull request? Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7 ## How was this patch tested? Compilation. Existing automatic tests Author: Maciej Brynski Closes #14459 from maver1ck/spark-15541-master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/511dede1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/511dede1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/511dede1 Branch: refs/heads/master Commit: 511dede1118f20a7756f614acb6fc88af52c9de9 Parents: dd8514f Author: Maciej Brynski Authored: Tue Aug 2 08:07:08 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 08:07:08 2016 -0700 -- .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/511dede1/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala -- diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala index d305de2..a02cf30 100644 --- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala +++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala @@ -17,7 +17,7 @@ package org.apache.spark.rpc.netty -import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit} +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.JavaConverters._ @@ -42,8 +42,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging { val inbox = new Inbox(ref, endpoint) } - private val endpoints = new ConcurrentHashMap[String, EndpointData] - private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] + private val endpoints: ConcurrentMap[String, EndpointData] = +new ConcurrentHashMap[String, EndpointData] + private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] = +new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] // Track the receivers whose inboxes may contain messages. private val receivers = new LinkedBlockingQueue[EndpointData] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
Repository: spark Updated Branches: refs/heads/branch-2.0 c5516ab60 -> fc18e259a [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch) ## What changes were proposed in this pull request? Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7 ## How was this patch tested? Compilation. Existing automatic tests Author: Maciej Brynski Closes #14459 from maver1ck/spark-15541-master. (cherry picked from commit 511dede1118f20a7756f614acb6fc88af52c9de9) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc18e259 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc18e259 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc18e259 Branch: refs/heads/branch-2.0 Commit: fc18e259a311c0f1dffe47edef0e42182afca8e9 Parents: c5516ab Author: Maciej Brynski Authored: Tue Aug 2 08:07:08 2016 -0700 Committer: Sean Owen Committed: Tue Aug 2 08:07:18 2016 -0700 -- .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc18e259/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala -- diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala index d305de2..a02cf30 100644 --- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala +++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala @@ -17,7 +17,7 @@ package org.apache.spark.rpc.netty -import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit} +import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.JavaConverters._ @@ -42,8 +42,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging { val inbox = new Inbox(ref, endpoint) } - private val endpoints = new ConcurrentHashMap[String, EndpointData] - private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] + private val endpoints: ConcurrentMap[String, EndpointData] = +new ConcurrentHashMap[String, EndpointData] + private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] = +new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef] // Track the receivers whose inboxes may contain messages. private val receivers = new LinkedBlockingQueue[EndpointData] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
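Why the interface type matters: JDK 8 gave `ConcurrentHashMap` a covariant `keySet()` override returning `ConcurrentHashMap.KeySetView`, a class absent from JDK 7, so code compiled against JDK 8 with the field typed as the concrete class links against that override and fails with `NoSuchMethodError` on a Java 7 runtime. Typing the field as `ConcurrentMap` pins call sites to the interface method, which returns `java.util.Set` on every JDK. A sketch of the distinction (not code from the patch):

```
import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}

// Typed as the concrete class: compiled on JDK 8, keySet() resolves to the
// override returning ConcurrentHashMap.KeySetView -- absent on JDK 7.
val concrete = new ConcurrentHashMap[String, Int]()
val risky = concrete.keySet() // KeySetView on JDK 8; NoSuchMethodError on 7

// Typed as the interface: keySet() resolves to ConcurrentMap.keySet(),
// which returns java.util.Set on every JDK version.
val portable: ConcurrentMap[String, Int] = new ConcurrentHashMap[String, Int]()
val safe: java.util.Set[String] = portable.keySet()
```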
spark git commit: [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
Repository: spark Updated Branches: refs/heads/branch-2.0 9d9956e8f -> c5516ab60 [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector ## What changes were proposed in this pull request? mllib.LDAExample uses ML pipeline and MLlib LDA algorithm. The former transforms original data into MLVector format, while the latter uses MLlibVector format. ## How was this patch tested? Test manually. Author: Xusen Yin Closes #14212 from yinxusen/SPARK-16558. (cherry picked from commit dd8514fa2059a695143073f852b1abee50e522bd) Signed-off-by: Yanbo Liang Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5516ab6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5516ab6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5516ab6 Branch: refs/heads/branch-2.0 Commit: c5516ab60da860320693bbc245818cb6d8a282c8 Parents: 9d9956e Author: Xusen Yin Authored: Tue Aug 2 07:28:46 2016 -0700 Committer: Yanbo Liang Committed: Tue Aug 2 07:31:32 2016 -0700 -- .../main/scala/org/apache/spark/examples/mllib/LDAExample.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c5516ab6/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala index 3fbf8e0..ef67841 100644 --- a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala @@ -24,8 +24,9 @@ import scopt.OptionParser import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.ml.Pipeline import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, RegexTokenizer, StopWordsRemover} +import org.apache.spark.ml.linalg.{Vector => MLVector} import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, LDA, OnlineLDAOptimizer} -import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.linalg.{Vector, Vectors} import org.apache.spark.rdd.RDD import org.apache.spark.sql.{Row, SparkSession} @@ -225,7 +226,7 @@ object LDAExample { val documents = model.transform(df) .select("features") .rdd - .map { case Row(features: Vector) => features } + .map { case Row(features: MLVector) => Vectors.fromML(features) } .zipWithIndex() .map(_.swap) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
Repository: spark Updated Branches: refs/heads/master d9e0919d3 -> dd8514fa2 [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector ## What changes were proposed in this pull request? mllib.LDAExample uses ML pipeline and MLlib LDA algorithm. The former transforms original data into MLVector format, while the latter uses MLlibVector format. ## How was this patch tested? Test manually. Author: Xusen Yin Closes #14212 from yinxusen/SPARK-16558. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd8514fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd8514fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd8514fa Branch: refs/heads/master Commit: dd8514fa2059a695143073f852b1abee50e522bd Parents: d9e0919 Author: Xusen Yin Authored: Tue Aug 2 07:28:46 2016 -0700 Committer: Yanbo Liang Committed: Tue Aug 2 07:28:46 2016 -0700 -- .../main/scala/org/apache/spark/examples/mllib/LDAExample.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dd8514fa/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala index 7e50b12..b923e62 100644 --- a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala @@ -24,8 +24,9 @@ import scopt.OptionParser import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.ml.Pipeline import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, RegexTokenizer, StopWordsRemover} +import org.apache.spark.ml.linalg.{Vector => MLVector} import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, LDA, OnlineLDAOptimizer} -import org.apache.spark.mllib.linalg.Vector +import org.apache.spark.mllib.linalg.{Vector, Vectors} import org.apache.spark.rdd.RDD import org.apache.spark.sql.{Row, SparkSession} @@ -223,7 +224,7 @@ object LDAExample { val documents = model.transform(df) .select("features") .rdd - .map { case Row(features: Vector) => features } + .map { case Row(features: MLVector) => Vectors.fromML(features) } .zipWithIndex() .map(_.swap) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
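The underlying issue is that `org.apache.spark.ml.linalg.Vector` and `org.apache.spark.mllib.linalg.Vector` are unrelated classes, so the old pattern match on the MLlib type failed at runtime with a `scala.MatchError` against pipeline output. The bridge is `mllib.linalg.Vectors.fromML`; a minimal sketch of the conversion, assuming a row whose only field is an ML vector:

```
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row

// Pipeline stages such as CountVectorizer emit ml.linalg.Vector, while the
// RDD-based LDA consumes mllib.linalg.Vector; fromML converts between them.
def toMLlibVector(row: Row): Vector = row match {
  case Row(features: MLVector) => Vectors.fromML(features)
}
```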
spark git commit: [SPARK-16851][ML] Incorrect threshold length in 'setThresholds()' evokes Exception
Repository: spark Updated Branches: refs/heads/master a1ff72e1c -> d9e0919d3 [SPARK-16851][ML] Incorrect threshold length in 'setThresholds()' evokes Exception ## What changes were proposed in this pull request? Add a length check for the thresholds array in method `setThresholds()` of classification models. ## How was this patch tested? unit tests Author: Zheng RuiFeng Closes #14457 from zhengruifeng/check_setThresholds. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9e0919d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9e0919d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d9e0919d Branch: refs/heads/master Commit: d9e0919d30e9f79a0eb1ceb8d1b5e9fc58cf085e Parents: a1ff72e Author: Zheng RuiFeng Authored: Tue Aug 2 07:22:41 2016 -0700 Committer: Yanbo Liang Committed: Tue Aug 2 07:22:41 2016 -0700 -- .../spark/ml/classification/ProbabilisticClassifier.scala | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d9e0919d/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala index 88642ab..19df8f7 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala @@ -83,7 +83,12 @@ abstract class ProbabilisticClassificationModel[ def setProbabilityCol(value: String): M = set(probabilityCol, value).asInstanceOf[M] /** @group setParam */ - def setThresholds(value: Array[Double]): M = set(thresholds, value).asInstanceOf[M] + def setThresholds(value: Array[Double]): M = { +require(value.length == numClasses, this.getClass.getSimpleName + + ".setThresholds() called with non-matching numClasses and thresholds.length." + + s" numClasses=$numClasses, but thresholds has length ${value.length}") +set(thresholds, value).asInstanceOf[M] + } /** * Transforms dataset by reading from [[featuresCol]], and appending new columns as specified by - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
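A hypothetical usage sketch of the new fail-fast behavior, assuming `model` is a fitted binary `LogisticRegressionModel` (so `numClasses == 2`):

```
import org.apache.spark.ml.classification.LogisticRegressionModel

def adjustThresholds(model: LogisticRegressionModel): Unit = {
  // A thresholds array matching numClasses is accepted, as before.
  model.setThresholds(Array(0.4, 0.6))

  // A non-matching array now fails immediately in the setter with
  //   java.lang.IllegalArgumentException: requirement failed:
  //   LogisticRegressionModel.setThresholds() called with non-matching
  //   numClasses and thresholds.length. numClasses=2, but thresholds has length 3
  // whereas previously the mismatch surfaced only later, during transform().
  model.setThresholds(Array(0.1, 0.2, 0.3))
}
```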
spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least
Repository: spark Updated Branches: refs/heads/master 10e1c0e63 -> a1ff72e1c [SPARK-16850][SQL] Improve type checking error message for greatest/least ## What changes were proposed in this pull request? Greatest/least function does not have the most friendly error message for data types. This patch improves the error message to not show the Seq type, and use more human readable data types. Before: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; line 1 pos 7 ``` After: ``` org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7 ``` ## How was this patch tested? Manually verified the output and also added unit tests to ConditionalExpressionSuite. Author: petermaxlee Closes #14453 from petermaxlee/SPARK-16850. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1ff72e1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1ff72e1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1ff72e1 Branch: refs/heads/master Commit: a1ff72e1cce6f22249ccc4905e8cef30075beb2f Parents: 10e1c0e Author: petermaxlee Authored: Tue Aug 2 19:32:35 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 2 19:32:35 2016 +0800 -- .../catalyst/expressions/conditionalExpressions.scala | 4 ++-- .../expressions/ConditionalExpressionSuite.scala | 13 + 2 files changed, 15 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a1ff72e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala index e97e089..5f2585f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala @@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got LEAST (${children.map(_.dataType)}).") + s" got LEAST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } @@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends Expression { } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) { TypeCheckResult.TypeCheckFailure( s"The expressions should all have the same type," + - s" got GREATEST (${children.map(_.dataType)}).") + s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", ")}).") } else { TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName) } http://git-wip-us.apache.org/repos/asf/spark/blob/a1ff72e1/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala index 3c581ec..36185b8 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala @@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp} import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.types._ @@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper Literal(Timestamp.valueOf("2015-07-01 10:00:00", Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty) +// Type checking error +assert( + Least(Seq(Literal(1), Literal("1"))).checkInputDataTypes() == +TypeChec
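The whole improvement is string formatting: interpolating a `Seq[DataType]` directly prints the collection's `toString` (hence the leaked `ArrayBuffer(...)` in the old message), while mapping to `DataType.simpleString` and joining with `mkString` yields the SQL-facing type names. A standalone sketch of the difference:

```
import org.apache.spark.sql.types._

val types: Seq[DataType] = Seq(DecimalType(2, 1), StringType)

// Old style: the Seq's toString leaks the collection and internal class names.
val before = s"got GREATEST ($types)."
// => got GREATEST (List(DecimalType(2,1), StringType)).

// New style: simpleString + mkString produces human-readable SQL type names.
val after = s"got GREATEST(${types.map(_.simpleString).mkString(", ")})."
// => got GREATEST(decimal(2,1), string).
```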
spark git commit: [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
Repository: spark Updated Branches: refs/heads/branch-2.0 5fbf5f93e -> 9d9956e8f [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings ## What changes were proposed in this pull request? This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed. ## How was this patch tested? Manually tested. Author: Cheng Lian Closes #14368 from liancheng/revise-examples. (cherry picked from commit 10e1c0e638774f5d746771b6dd251de2480f94eb) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9d9956e8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9d9956e8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9d9956e8 Branch: refs/heads/branch-2.0 Commit: 9d9956e8f8abd41a603fde2347384428b7ec715c Parents: 5fbf5f9 Author: Cheng Lian Authored: Tue Aug 2 15:02:40 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 2 15:05:13 2016 +0800 -- docs/sql-programming-guide.md | 56 +++-- .../examples/sql/JavaSQLDataSourceExample.java | 23 +++- .../spark/examples/sql/JavaSparkSQLExample.java | 2 +- examples/src/main/python/sql/basic.py | 2 +- examples/src/main/python/sql/datasource.py | 32 -- examples/src/main/python/sql/hive.py| 2 +- examples/src/main/r/RSparkSQLExample.R | 113 ++- .../examples/sql/SQLDataSourceExample.scala | 22 +++- .../spark/examples/sql/SparkSQLExample.scala| 2 +- 9 files changed, 137 insertions(+), 117 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9d9956e8/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 33b170e..82b03a2 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -132,7 +132,7 @@ from a Hive table, or from [Spark data sources](#data-sources). As an example, the following creates a DataFrame based on the content of a JSON file: -{% include_example create_DataFrames r/RSparkSQLExample.R %} +{% include_example create_df r/RSparkSQLExample.R %} @@ -180,7 +180,7 @@ In addition to simple column references and expressions, DataFrames also have a -{% include_example dataframe_operations r/RSparkSQLExample.R %} +{% include_example untyped_ops r/RSparkSQLExample.R %} For a complete list of the types of operations that can be performed on a DataFrame refer to the [API Documentation](api/R/index.html). @@ -214,7 +214,7 @@ The `sql` function on a `SparkSession` enables applications to run SQL queries p The `sql` function enables applications to run SQL queries programmatically and returns the result as a `SparkDataFrame`. -{% include_example sql_query r/RSparkSQLExample.R %} +{% include_example run_sql r/RSparkSQLExample.R %} @@ -377,7 +377,7 @@ In the simplest form, the default data source (`parquet` unless otherwise config -{% include_example source_parquet r/RSparkSQLExample.R %} +{% include_example generic_load_save_functions r/RSparkSQLExample.R %} @@ -400,13 +400,11 @@ using this syntax. - {% include_example manual_load_options python/sql/datasource.py %} - - -{% include_example source_json r/RSparkSQLExample.R %} + +{% include_example manual_load_options r/RSparkSQLExample.R %} @@ -425,13 +423,11 @@ file directly with SQL. 
- {% include_example direct_sql python/sql/datasource.py %} - -{% include_example direct_query r/RSparkSQLExample.R %} +{% include_example direct_sql r/RSparkSQLExample.R %} @@ -523,7 +519,7 @@ Using the data from the above example: -{% include_example load_programmatically r/RSparkSQLExample.R %} +{% include_example basic_parquet_example r/RSparkSQLExample.R %} @@ -827,7 +823,7 @@ Note that the file that is offered as _a json file_ is not a typical JSON file. line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail. -{% include_example load_json_file r/RSparkSQLExample.R %} +{% include_example json_dataset r/RSparkSQLExample.R %} @@ -913,7 +909,7 @@ You may need to grant write privilege to the user who starts the spark applicati When working with Hive one must instantiate `SparkSession` with Hive support. This adds support for finding tables in the MetaStore and writing queries using HiveQL. -{% include_example hive_table r/RSparkSQLExample.R %} +{% include_example spark_hive r/RSparkSQLExample.R %} @@ -1055,43 +1051,19 @@ the Data Sources API. The following options are supported: - -
spark git commit: [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
Repository: spark Updated Branches: refs/heads/master 5184df06b -> 10e1c0e63 [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings ## What changes were proposed in this pull request? This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed. ## How was this patch tested? Manually tested. Author: Cheng Lian Closes #14368 from liancheng/revise-examples. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/10e1c0e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/10e1c0e6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/10e1c0e6 Branch: refs/heads/master Commit: 10e1c0e638774f5d746771b6dd251de2480f94eb Parents: 5184df0 Author: Cheng Lian Authored: Tue Aug 2 15:02:40 2016 +0800 Committer: Wenchen Fan Committed: Tue Aug 2 15:02:40 2016 +0800 -- docs/sql-programming-guide.md | 56 +++-- .../examples/sql/JavaSQLDataSourceExample.java | 23 +++- .../spark/examples/sql/JavaSparkSQLExample.java | 2 +- examples/src/main/python/sql/basic.py | 2 +- examples/src/main/python/sql/datasource.py | 32 -- examples/src/main/python/sql/hive.py| 2 +- examples/src/main/r/RSparkSQLExample.R | 113 ++- .../examples/sql/SQLDataSourceExample.scala | 22 +++- .../spark/examples/sql/SparkSQLExample.scala| 2 +- 9 files changed, 137 insertions(+), 117 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/10e1c0e6/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index d8c8698..5877f2b 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -132,7 +132,7 @@ from a Hive table, or from [Spark data sources](#data-sources). As an example, the following creates a DataFrame based on the content of a JSON file: -{% include_example create_DataFrames r/RSparkSQLExample.R %} +{% include_example create_df r/RSparkSQLExample.R %} @@ -180,7 +180,7 @@ In addition to simple column references and expressions, DataFrames also have a -{% include_example dataframe_operations r/RSparkSQLExample.R %} +{% include_example untyped_ops r/RSparkSQLExample.R %} For a complete list of the types of operations that can be performed on a DataFrame refer to the [API Documentation](api/R/index.html). @@ -214,7 +214,7 @@ The `sql` function on a `SparkSession` enables applications to run SQL queries p The `sql` function enables applications to run SQL queries programmatically and returns the result as a `SparkDataFrame`. -{% include_example sql_query r/RSparkSQLExample.R %} +{% include_example run_sql r/RSparkSQLExample.R %} @@ -377,7 +377,7 @@ In the simplest form, the default data source (`parquet` unless otherwise config -{% include_example source_parquet r/RSparkSQLExample.R %} +{% include_example generic_load_save_functions r/RSparkSQLExample.R %} @@ -400,13 +400,11 @@ using this syntax. - {% include_example manual_load_options python/sql/datasource.py %} - - -{% include_example source_json r/RSparkSQLExample.R %} + +{% include_example manual_load_options r/RSparkSQLExample.R %} @@ -425,13 +423,11 @@ file directly with SQL. 
- {% include_example direct_sql python/sql/datasource.py %} - -{% include_example direct_query r/RSparkSQLExample.R %} +{% include_example direct_sql r/RSparkSQLExample.R %} @@ -523,7 +519,7 @@ Using the data from the above example: -{% include_example load_programmatically r/RSparkSQLExample.R %} +{% include_example basic_parquet_example r/RSparkSQLExample.R %} @@ -839,7 +835,7 @@ Note that the file that is offered as _a json file_ is not a typical JSON file. line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail. -{% include_example load_json_file r/RSparkSQLExample.R %} +{% include_example json_dataset r/RSparkSQLExample.R %} @@ -925,7 +921,7 @@ You may need to grant write privilege to the user who starts the spark applicati When working with Hive one must instantiate `SparkSession` with Hive support. This adds support for finding tables in the MetaStore and writing queries using HiveQL. -{% include_example hive_table r/RSparkSQLExample.R %} +{% include_example spark_hive r/RSparkSQLExample.R %} @@ -1067,43 +1063,19 @@ the Data Sources API. The following options are supported: - -{% highlight scala %} -val jdbcDF = spark.read.format("jdbc").options( - Map("url" -> "jdbc:postgresql:db
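The old-style `options(Map(...))` JDBC snippet being replaced above converges, across the language bindings, on per-key `option` calls on the unified `DataFrameReader`. A minimal sketch of a new-style Scala JDBC read; the connection values are illustrative placeholders:

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JdbcReadSketch").getOrCreate()

// Read a table over JDBC; url, dbtable, user and password are placeholders.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
```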