spark git commit: [SQL][MINOR] use stricter type parameter to make it clear that parquet reader returns UnsafeRow

2016-08-02 Thread lian
Repository: spark
Updated Branches:
  refs/heads/master 386127377 -> ae226283e


[SQL][MINOR] use stricter type parameter to make it clear that parquet reader 
returns UnsafeRow

## What changes were proposed in this pull request?

A small code-style change: it is better to make the type parameter more accurate, as sketched below.
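
A minimal sketch of the tightening (hedged; `ParquetReadSupport` is package-private in Spark, so this only illustrates the change shown in the diff below):

```scala
// The read support always materializes UnsafeRow, so the reader's type parameter
// can say so instead of the broader InternalRow.
import org.apache.parquet.hadoop.ParquetRecordReader
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

val reader = new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
// before this change: new ParquetRecordReader[InternalRow](new ParquetReadSupport)
```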

## How was this patch tested?

N/A

Author: Wenchen Fan 

Closes #14458 from cloud-fan/parquet.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ae226283
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ae226283
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ae226283

Branch: refs/heads/master
Commit: ae226283e19ce396216c73b0ae2470efa122b65b
Parents: 3861273
Author: Wenchen Fan 
Authored: Wed Aug 3 08:23:26 2016 +0800
Committer: Cheng Lian 
Committed: Wed Aug 3 08:23:26 2016 +0800

--
 .../execution/datasources/parquet/ParquetFileFormat.scala |  4 ++--
 .../datasources/parquet/ParquetReadSupport.scala  | 10 +-
 .../datasources/parquet/ParquetRecordMaterializer.scala   |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ae226283/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
index 772e031..c3e75f1 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
@@ -370,11 +370,11 @@ private[sql] class ParquetFileFormat
 logDebug(s"Falling back to parquet-mr")
 val reader = pushed match {
   case Some(filter) =>
-new ParquetRecordReader[InternalRow](
+new ParquetRecordReader[UnsafeRow](
   new ParquetReadSupport,
   FilterCompat.get(filter, null))
   case _ =>
-new ParquetRecordReader[InternalRow](new ParquetReadSupport)
+new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
 }
 reader.initialize(split, hadoopAttemptContext)
 reader

http://git-wip-us.apache.org/repos/asf/spark/blob/ae226283/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
index 8a2e0d7..f1a35dd 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
@@ -29,12 +29,12 @@ import org.apache.parquet.schema._
 import org.apache.parquet.schema.Type.Repetition
 
 import org.apache.spark.internal.Logging
-import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
 import org.apache.spark.sql.types._
 
 /**
  * A Parquet [[ReadSupport]] implementation for reading Parquet records as 
Catalyst
- * [[InternalRow]]s.
+ * [[UnsafeRow]]s.
  *
  * The API interface of [[ReadSupport]] is a little bit over complicated 
because of historical
  * reasons.  In older versions of parquet-mr (say 1.6.0rc3 and prior), 
[[ReadSupport]] need to be
@@ -48,7 +48,7 @@ import org.apache.spark.sql.types._
  * Due to this reason, we no longer rely on [[ReadContext]] to pass requested 
schema from [[init()]]
  * to [[prepareForRead()]], but use a private `var` for simplicity.
  */
-private[parquet] class ParquetReadSupport extends ReadSupport[InternalRow] 
with Logging {
+private[parquet] class ParquetReadSupport extends ReadSupport[UnsafeRow] with 
Logging {
   private var catalystRequestedSchema: StructType = _
 
   /**
@@ -72,13 +72,13 @@ private[parquet] class ParquetReadSupport extends 
ReadSupport[InternalRow] with
   /**
* Called on executor side after [[init()]], before instantiating actual 
Parquet record readers.
* Responsible for instantiating [[RecordMaterializer]], which is used for 
converting Parquet
-   * records to Catalyst [[InternalRow]]s.
+   * records to Catalyst [[UnsafeRow]]s.
*/
   override def prepareForRead(
   conf: Configuration,
   keyValueMetaData: JMap[String, String],
   fileSchema: MessageType,
-  readContext: ReadContext): RecordMaterializer[Inter

spark git commit: [SPARK-16796][WEB UI] Visible passwords on Spark environment page

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master b73a57060 -> 386127377


[SPARK-16796][WEB UI] Visible passwords on Spark environment page

## What changes were proposed in this pull request?

Mask spark.ssl.keyPassword, spark.ssl.keyStorePassword, and
spark.ssl.trustStorePassword on the Web UI environment page
(their values are changed to * on that page). A sketch of the helper follows.
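
A minimal sketch of the masking helper, mirroring the diff below (written as a standalone def here rather than a private method on EnvironmentPage):

```scala
// Any Spark property whose key mentions "password" has its value masked before
// the environment page renders it.
def removePass(kv: (String, String)): (String, String) =
  if (kv._1.toLowerCase.contains("password")) (kv._1, "**") else kv

removePass("spark.ssl.keyPassword" -> "secret")  // ("spark.ssl.keyPassword", "**")
```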

## How was this patch tested?

I built Spark, ran the Spark shell, and checked that these values are masked with *.

Also ran the tests:
./dev/run-tests

[info] ScalaTest
[info] Run completed in 1 hour, 9 minutes, 5 seconds.
[info] Total number of tests run: 2166
[info] Suites: completed 65, aborted 0
[info] Tests: succeeded 2166, failed 0, canceled 0, ignored 590, pending 0
[info] All tests passed.

![mask](https://cloud.githubusercontent.com/assets/15244468/17262154/7641e132-55e2-11e6-8a6c-30ead77c7372.png)

Author: Artur Sukhenko 

Closes #14409 from Devian-ua/maskpass.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38612737
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38612737
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38612737

Branch: refs/heads/master
Commit: 3861273771c2631e88e1f37a498c644ad45ac1c0
Parents: b73a570
Author: Artur Sukhenko 
Authored: Tue Aug 2 16:13:12 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 16:13:12 2016 -0700

--
 .../main/scala/org/apache/spark/ui/env/EnvironmentPage.scala   | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/38612737/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala 
b/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
index f0a1174..22136a6 100644
--- a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
@@ -26,11 +26,15 @@ import org.apache.spark.ui.{UIUtils, WebUIPage}
 private[ui] class EnvironmentPage(parent: EnvironmentTab) extends 
WebUIPage("") {
   private val listener = parent.listener
 
+  private def removePass(kv: (String, String)): (String, String) = {
+if (kv._1.toLowerCase.contains("password")) (kv._1, "**") else kv
+  }
+
   def render(request: HttpServletRequest): Seq[Node] = {
 val runtimeInformationTable = UIUtils.listingTable(
   propertyHeader, jvmRow, listener.jvmInformation, fixedWidth = true)
 val sparkPropertiesTable = UIUtils.listingTable(
-  propertyHeader, propertyRow, listener.sparkProperties, fixedWidth = true)
+  propertyHeader, propertyRow, listener.sparkProperties.map(removePass), 
fixedWidth = true)
 val systemPropertiesTable = UIUtils.listingTable(
   propertyHeader, propertyRow, listener.systemProperties, fixedWidth = 
true)
 val classpathEntriesTable = UIUtils.listingTable(





spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 8a22275de -> 797e758b1


[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (branch-1.6)

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run
on Java 7: referencing the map through the interface keeps Java-8-only return types
(such as ConcurrentHashMap.KeySetView) out of the generated bytecode. A sketch follows.
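
A minimal sketch of the pattern applied in the diff below (the String value type here stands in for LogicalPlan):

```scala
// Typing the field as the ConcurrentMap interface means calls such as keySet()
// resolve against the interface, not against Java 8's ConcurrentHashMap overloads,
// so the same class file also runs on a Java 7 runtime.
import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}

val tables: ConcurrentMap[String, String] = new ConcurrentHashMap[String, String]
tables.put("src", "logical plan placeholder")
val keys: java.util.Set[String] = tables.keySet()  // plain java.util.Set
```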

## How was this patch tested?

Compilation. Existing automated tests.

Author: Maciej Brynski 

Closes #14390 from maver1ck/spark-15541.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/797e758b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/797e758b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/797e758b

Branch: refs/heads/branch-1.6
Commit: 797e758b16946aa5779cc302f943eafec34c0c39
Parents: 8a22275
Author: Maciej Brynski 
Authored: Tue Aug 2 16:07:35 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 16:07:35 2016 -0700

--
 .../scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/797e758b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
index 8f4ce74..e66d2ad 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
-import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}
 
 import scala.collection.JavaConverters._
 import scala.collection.mutable
@@ -80,7 +80,8 @@ trait Catalog {
 }
 
 class SimpleCatalog(val conf: CatalystConf) extends Catalog {
-  private[this] val tables = new ConcurrentHashMap[String, LogicalPlan]
+  private[this] val tables: ConcurrentMap[String, LogicalPlan] =
+new ConcurrentHashMap[String, LogicalPlan]
 
   override def registerTable(tableIdent: TableIdentifier, plan: LogicalPlan): 
Unit = {
 tables.put(getTableName(tableIdent), plan)





spark git commit: [SPARK-16858][SQL][TEST] Removal of TestHiveSharedState

2016-08-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e9fc0b6a8 -> b73a57060


[SPARK-16858][SQL][TEST] Removal of TestHiveSharedState

### What changes were proposed in this pull request?
This PR removes `TestHiveSharedState`.

It is also associated with the Hive refactoring that removes `HiveSharedState`.

### How was this patch tested?
The existing test cases

Author: gatorsmile 

Closes #14463 from gatorsmile/removeTestHiveSharedState.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b73a5706
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b73a5706
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b73a5706

Branch: refs/heads/master
Commit: b73a5706032eae7c87f7f2f8b0a72e7ee6d2e7e5
Parents: e9fc0b6
Author: gatorsmile 
Authored: Tue Aug 2 14:17:45 2016 -0700
Committer: Reynold Xin 
Committed: Tue Aug 2 14:17:45 2016 -0700

--
 .../apache/spark/sql/hive/test/TestHive.scala   | 78 +---
 .../spark/sql/hive/ShowCreateTableSuite.scala   |  2 +-
 2 files changed, 20 insertions(+), 60 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b73a5706/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
index fbacd59..cdc8d61 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -24,7 +24,6 @@ import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.language.implicitConversions
 
-import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.hive.conf.HiveConf.ConfVars
 import org.apache.hadoop.hive.ql.exec.FunctionRegistry
 import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
@@ -40,7 +39,6 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.QueryExecution
 import org.apache.spark.sql.execution.command.CacheTableCommand
 import org.apache.spark.sql.hive._
-import org.apache.spark.sql.hive.client.HiveClient
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.util.{ShutdownHookManager, Utils}
 
@@ -86,8 +84,6 @@ class TestHiveContext(
 new TestHiveContext(sparkSession.newSession())
   }
 
-  override def sharedState: TestHiveSharedState = sparkSession.sharedState
-
   override def sessionState: TestHiveSessionState = sparkSession.sessionState
 
   def setCacheTables(c: Boolean): Unit = {
@@ -112,38 +108,43 @@ class TestHiveContext(
  * A [[SparkSession]] used in [[TestHiveContext]].
  *
  * @param sc SparkContext
- * @param scratchDirPath scratch directory used by Hive's metastore client
- * @param metastoreTemporaryConf configuration options for Hive's metastore
- * @param existingSharedState optional [[TestHiveSharedState]]
+ * @param existingSharedState optional [[HiveSharedState]]
  * @param loadTestTables if true, load the test tables. They can only be 
loaded when running
  *   in the JVM, i.e when calling from Python this flag 
has to be false.
  */
 private[hive] class TestHiveSparkSession(
 @transient private val sc: SparkContext,
-scratchDirPath: File,
-metastoreTemporaryConf: Map[String, String],
-@transient private val existingSharedState: Option[TestHiveSharedState],
+@transient private val existingSharedState: Option[HiveSharedState],
 private val loadTestTables: Boolean)
   extends SparkSession(sc) with Logging { self =>
 
   def this(sc: SparkContext, loadTestTables: Boolean) {
 this(
   sc,
-  TestHiveContext.makeScratchDir(),
-  HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false),
-  None,
+  existingSharedState = None,
   loadTestTables)
   }
 
+  { // set the metastore temporary configuration
+val metastoreTempConf = 
HiveUtils.newTemporaryConfiguration(useInMemoryDerby = false) ++ Map(
+  ConfVars.METASTORE_INTEGER_JDO_PUSHDOWN.varname -> "true",
+  // scratch directory used by Hive's metastore client
+  ConfVars.SCRATCHDIR.varname -> 
TestHiveContext.makeScratchDir().toURI.toString,
+  ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY.varname -> "1")
+
+metastoreTempConf.foreach { case (k, v) =>
+  sc.hadoopConfiguration.set(k, v)
+}
+  }
+
   assume(sc.conf.get(CATALOG_IMPLEMENTATION) == "hive")
 
-  // TODO: Let's remove TestHiveSharedState and TestHiveSessionState. 
Otherwise,
+  // TODO: Let's remove HiveSharedState and TestHiveSessionState. Otherwise,
   // we are not really testing the reflection logic based on the setting of
   // CATALOG_IMPLEMENTATI

spark git commit: [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file

2016-08-02 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master a9beeaaae -> e9fc0b6a8


[SPARK-16787] SparkContext.addFile() should not throw if called twice with the 
same file

## What changes were proposed in this pull request?

The behavior of `SparkContext.addFile()` changed slightly with the introduction 
of the Netty-RPC-based file server, which was introduced in Spark 1.6 (where it 
was disabled by default) and became the default / only file server in Spark 
2.0.0.

Prior to 2.0, calling `SparkContext.addFile()` with files that have the same 
name and identical contents would succeed. This behavior was never explicitly 
documented but Spark has behaved this way since very early 1.x versions.

In 2.0 (or 1.6 with the Netty file server enabled), the second `addFile()` call 
will fail with a requirement error because NettyStreamManager tries to guard 
against duplicate file registration.

This problem also affects `addJar()` in a more subtle way: the 
`fileServer.addJar()` call will also fail with an exception but that exception 
is logged and ignored; I believe that the problematic exception-catching path 
was mistakenly copied from some old code which was only relevant to very old 
versions of Spark and YARN mode.

I believe that this change of behavior was unintentional, so this patch weakens 
the `require` check so that adding the same filename at the same path will 
succeed.

At file download time, Spark tasks will fail with exceptions if an executor 
already has a local copy of a file and that file's contents do not match the 
contents of the file being downloaded / added. As a result, it's important that 
we prevent files with the same name and different contents from being served 
because allowing that can effectively brick an executor by preventing it from 
successfully launching any new tasks. Before this patch's change, this was 
prevented by forbidding `addFile()` from being called twice on files with the 
same name. Because Spark does not defensively copy local files that are passed 
to `addFile` it is vulnerable to files' contents changing, so I think it's okay 
to rely on an implicit assumption that these files are intended to be immutable 
(since if they _are_ mutable then this can lead to either explicit task 
failures or implicit incorrectness (in case new executors silently get newer 
copies of the file while old executors continue to use an older version)). To guard
against this, I have decided to only update the file addition
timestamps on the first call to `addFile()`; duplicate calls will succeed but 
will not update the timestamp. This behavior is fine as long as we assume files 
are immutable, which seems reasonable given the behaviors described above.
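
A hedged usage sketch of the behavior described above (the path and app settings are hypothetical):

```scala
// After this patch, registering the same local file twice is a no-op instead of an
// error; the addition timestamp is only recorded on the first call.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("addFile-demo"))
val path = "/tmp/lookup-table.txt"  // hypothetical local file
sc.addFile(path)
sc.addFile(path)  // used to fail NettyStreamManager's require(); now succeeds quietly
```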

As part of this change, I also improved the thread-safety of the `addedJars` 
and `addedFiles` maps; this is important because these maps may be concurrently 
read by a task launching thread and written by a driver thread in case the 
user's driver code is multi-threaded.

## How was this patch tested?

I added regression tests in `SparkContextSuite`.

Author: Josh Rosen 

Closes #14396 from JoshRosen/SPARK-16787.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9fc0b6a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9fc0b6a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9fc0b6a

Branch: refs/heads/master
Commit: e9fc0b6a8b4ce62cab56d18581f588c67b811f5b
Parents: a9beeaa
Author: Josh Rosen 
Authored: Tue Aug 2 12:02:11 2016 -0700
Committer: Josh Rosen 
Committed: Tue Aug 2 12:02:11 2016 -0700

--
 .../scala/org/apache/spark/SparkContext.scala   | 36 ++
 .../spark/rpc/netty/NettyStreamManager.scala| 12 +++--
 .../scala/org/apache/spark/scheduler/Task.scala |  5 +-
 .../org/apache/spark/SparkContextSuite.scala| 51 
 4 files changed, 78 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e9fc0b6a/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d48e2b4..48126c2 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -21,7 +21,7 @@ import java.io._
 import java.lang.reflect.Constructor
 import java.net.URI
 import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID}
-import java.util.concurrent.ConcurrentMap
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}
 import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, 
AtomicReference}
 
 import scala.collection.JavaConverters._
@@ -262,8 +262,8 @@ class SparkContext(config: Spark

spark git commit: [SPARK-16787] SparkContext.addFile() should not throw if called twice with the same file

2016-08-02 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 f190bb83b -> 063a507fc


[SPARK-16787] SparkContext.addFile() should not throw if called twice with the 
same file

## What changes were proposed in this pull request?

The behavior of `SparkContext.addFile()` changed slightly with the introduction 
of the Netty-RPC-based file server, which was introduced in Spark 1.6 (where it 
was disabled by default) and became the default / only file server in Spark 
2.0.0.

Prior to 2.0, calling `SparkContext.addFile()` with files that have the same 
name and identical contents would succeed. This behavior was never explicitly 
documented but Spark has behaved this way since very early 1.x versions.

In 2.0 (or 1.6 with the Netty file server enabled), the second `addFile()` call 
will fail with a requirement error because NettyStreamManager tries to guard 
against duplicate file registration.

This problem also affects `addJar()` in a more subtle way: the 
`fileServer.addJar()` call will also fail with an exception but that exception 
is logged and ignored; I believe that the problematic exception-catching path 
was mistakenly copied from some old code which was only relevant to very old 
versions of Spark and YARN mode.

I believe that this change of behavior was unintentional, so this patch weakens 
the `require` check so that adding the same filename at the same path will 
succeed.

At file download time, Spark tasks will fail with exceptions if an executor 
already has a local copy of a file and that file's contents do not match the 
contents of the file being downloaded / added. As a result, it's important that 
we prevent files with the same name and different contents from being served 
because allowing that can effectively brick an executor by preventing it from 
successfully launching any new tasks. Before this patch's change, this was 
prevented by forbidding `addFile()` from being called twice on files with the 
same name. Because Spark does not defensively copy local files that are passed 
to `addFile` it is vulnerable to files' contents changing, so I think it's okay 
to rely on an implicit assumption that these files are intended to be immutable 
(since if they _are_ mutable then this can lead to either explicit task 
failures or implicit incorrectness (in case new executors silently get newer 
copies of the file while old executors continue to use an older version)). To guard
against this, I have decided to only update the file addition
timestamps on the first call to `addFile()`; duplicate calls will succeed but 
will not update the timestamp. This behavior is fine as long as we assume files 
are immutable, which seems reasonable given the behaviors described above.

As part of this change, I also improved the thread-safety of the `addedJars` 
and `addedFiles` maps; this is important because these maps may be concurrently 
read by a task launching thread and written by a driver thread in case the 
user's driver code is multi-threaded.

## How was this patch tested?

I added regression tests in `SparkContextSuite`.

Author: Josh Rosen 

Closes #14396 from JoshRosen/SPARK-16787.

(cherry picked from commit e9fc0b6a8b4ce62cab56d18581f588c67b811f5b)
Signed-off-by: Josh Rosen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/063a507f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/063a507f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/063a507f

Branch: refs/heads/branch-2.0
Commit: 063a507fce862d14061b0c0464b7a51a0afde066
Parents: f190bb8
Author: Josh Rosen 
Authored: Tue Aug 2 12:02:11 2016 -0700
Committer: Josh Rosen 
Committed: Tue Aug 2 12:03:10 2016 -0700

--
 .../scala/org/apache/spark/SparkContext.scala   | 36 ++
 .../spark/rpc/netty/NettyStreamManager.scala| 12 +++--
 .../scala/org/apache/spark/scheduler/Task.scala |  5 +-
 .../org/apache/spark/SparkContextSuite.scala| 51 
 4 files changed, 78 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/063a507f/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index fe15052..d3e8de3 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -21,7 +21,7 @@ import java.io._
 import java.lang.reflect.Constructor
 import java.net.URI
 import java.util.{Arrays, Locale, Properties, ServiceLoader, UUID}
-import java.util.concurrent.ConcurrentMap
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}
 import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, 
AtomicRe

spark git commit: [SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to arithmetic.scala

2016-08-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master cbdff4935 -> a9beeaaae


[SPARK-16855][SQL] move Greatest and Least from conditionalExpressions.scala to 
arithmetic.scala

## What changes were proposed in this pull request?

`Greatest` and `Least` are not conditional expressions, but arithmetic 
expressions.
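
A hedged usage reminder (assumes a SparkSession named `spark`); per the doc comments on the moved expressions, both functions skip null arguments and return null only when every argument is null:

```scala
// least returns 1 and greatest returns 3; the NULL argument is skipped.
spark.sql("SELECT least(3, 1, NULL), greatest(3, 1, NULL)").show()
```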

## How was this patch tested?

N/A

Author: Wenchen Fan 

Closes #14460 from cloud-fan/move.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9beeaaa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9beeaaa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9beeaaa

Branch: refs/heads/master
Commit: a9beeaaaeb52e9c940fe86a3d70801655401623c
Parents: cbdff49
Author: Wenchen Fan 
Authored: Tue Aug 2 11:08:32 2016 -0700
Committer: Reynold Xin 
Committed: Tue Aug 2 11:08:32 2016 -0700

--
 .../sql/catalyst/expressions/arithmetic.scala   | 121 ++
 .../expressions/conditionalExpressions.scala| 122 ---
 .../expressions/ArithmeticExpressionSuite.scala | 107 
 .../ConditionalExpressionSuite.scala| 107 
 4 files changed, 228 insertions(+), 229 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a9beeaaa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 77d40a5..4aebef9 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions
 
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.spark.sql.types._
@@ -460,3 +461,123 @@ case class Pmod(left: Expression, right: Expression) 
extends BinaryArithmetic wi
 
   override def sql: String = s"$prettyName(${left.sql}, ${right.sql})"
 }
+
+/**
+ * A function that returns the least value of all parameters, skipping null 
values.
+ * It takes at least 2 parameters, and returns null iff all parameters are 
null.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n1, ...) - Returns the least value of all parameters, 
skipping null values.")
+case class Least(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = children.forall(_.nullable)
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+  TypeCheckResult.TypeCheckFailure(s"LEAST requires at least 2 arguments")
+} else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) {
+  TypeCheckResult.TypeCheckFailure(
+s"The expressions should all have the same type," +
+  s" got LEAST(${children.map(_.dataType.simpleString).mkString(", 
")}).")
+} else {
+  TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName)
+}
+  }
+
+  override def dataType: DataType = children.head.dataType
+
+  override def eval(input: InternalRow): Any = {
+children.foldLeft[Any](null)((r, c) => {
+  val evalc = c.eval(input)
+  if (evalc != null) {
+if (r == null || ordering.lt(evalc, r)) evalc else r
+  } else {
+r
+  }
+})
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val evalChildren = children.map(_.genCode(ctx))
+val first = evalChildren(0)
+val rest = evalChildren.drop(1)
+def updateEval(eval: ExprCode): String = {
+  s"""
+${eval.code}
+if (!${eval.isNull} && (${ev.isNull} ||
+  ${ctx.genGreater(dataType, ev.value, eval.value)})) {
+  ${ev.isNull} = false;
+  ${ev.value} = ${eval.value};
+}
+  """
+}
+ev.copy(code = s"""
+  ${first.code}
+  boolean ${ev.isNull} = ${first.isNull};
+  ${ctx.javaType(dataType)} ${ev.value} = ${first.value};
+  ${rest.map(updateEval).mkString("\n")}""")
+  }
+}
+
+/**
+ * A function that returns the greatest value of all parameters, skipping null 
values.
+ * It takes at least 2 parameters, and returns null iff all parameters are 
null.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n1, ...) - Returns the greatest value of all parameters, 
skipping null values.")
+case

spark git commit: [SPARK-16816] Modify Java example which is also reflected in documentation example

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 2330f3ecb -> cbdff4935


[SPARK-16816] Modify Java example which is also reflected in documentation example

## What changes were proposed in this pull request?

Modify the Java example, which is also reflected in the documentation. A Scala counterpart is sketched below.
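
A hedged Scala counterpart of the Java snippet added in the diff below (assumes a SparkSession named `spark`):

```scala
// Create a DataFrame from a JSON dataset held as an RDD[String],
// one JSON object per element.
val jsonData = Seq("""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""")
val anotherPeopleRDD = spark.sparkContext.parallelize(jsonData)
val anotherPeople = spark.read.json(anotherPeopleRDD)
anotherPeople.show()
// |[Columbus,Ohio]| Yin|
```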

## How was this patch tested?

Ran the test cases.

Author: sandy 

Closes #14436 from phalodi/SPARK-16816.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cbdff493
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cbdff493
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cbdff493

Branch: refs/heads/master
Commit: cbdff49357d6ce8d41b76b44628d90ead193eb5f
Parents: 2330f3e
Author: sandy 
Authored: Tue Aug 2 10:34:01 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 10:34:01 2016 -0700

--
 .../examples/sql/JavaSQLDataSourceExample.java  | 16 
 1 file changed, 16 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cbdff493/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
--
diff --git 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
index 52e3b62..fc92446 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -19,10 +19,13 @@ package org.apache.spark.examples.sql;
 // $example on:schema_merging$
 import java.io.Serializable;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.List;
 // $example off:schema_merging$
 
 // $example on:basic_parquet_example$
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.MapFunction;
 import org.apache.spark.sql.Encoders;
 // $example on:schema_merging$
@@ -213,6 +216,19 @@ public class JavaSQLDataSourceExample {
 // +--+
 // |Justin|
 // +--+
+
+// Alternatively, a DataFrame can be created for a JSON dataset 
represented by
+// an RDD[String] storing one JSON object per string.
+List jsonData = Arrays.asList(
+
"{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}");
+JavaRDD anotherPeopleRDD = new 
JavaSparkContext(spark.sparkContext()).parallelize(jsonData);
+Dataset anotherPeople = spark.read().json(anotherPeopleRDD);
+anotherPeople.show();
+// +---++
+// |address|name|
+// +---++
+// |[Columbus,Ohio]| Yin|
+// +---++
 // $example off:json_dataset$
   }
 





spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least

2016-08-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 a937c9ee4 -> f190bb83b


[SPARK-16850][SQL] Improve type checking error message for greatest/least

The greatest/least functions do not have the friendliest error message for data
type mismatches. This patch improves the error message so that it no longer shows the
Seq type and uses more human-readable data type names.

Before:
```
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS 
DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all 
have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; 
line 1 pos 7
```

After:
```
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS 
DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all 
have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7
```
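
A hedged way to reproduce the message (assumes a SparkSession named `spark`):

```scala
// Mixing decimal and string arguments triggers the type-check failure whose
// wording this patch improves; sql() raises the AnalysisException shown above.
spark.sql("SELECT greatest(CAST(1.0 AS DECIMAL(2,1)), '1.0')")
```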

Manually verified the output and also added unit tests to 
ConditionalExpressionSuite.

Author: petermaxlee 

Closes #14453 from petermaxlee/SPARK-16850.

(cherry picked from commit a1ff72e1cce6f22249ccc4905e8cef30075beb2f)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f190bb83
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f190bb83
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f190bb83

Branch: refs/heads/branch-2.0
Commit: f190bb83beaafb65c8e6290e9ecaa61ac51e04bb
Parents: a937c9e
Author: petermaxlee 
Authored: Tue Aug 2 19:32:35 2016 +0800
Committer: Reynold Xin 
Committed: Tue Aug 2 10:22:18 2016 -0700

--
 .../catalyst/expressions/conditionalExpressions.scala  |  4 ++--
 .../expressions/ConditionalExpressionSuite.scala   | 13 +
 2 files changed, 15 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
index e97e089..5f2585f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
@@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) {
   TypeCheckResult.TypeCheckFailure(
 s"The expressions should all have the same type," +
-  s" got LEAST (${children.map(_.dataType)}).")
+  s" got LEAST(${children.map(_.dataType.simpleString).mkString(", 
")}).")
 } else {
   TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName)
 }
@@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends 
Expression {
 } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) {
   TypeCheckResult.TypeCheckFailure(
 s"The expressions should all have the same type," +
-  s" got GREATEST (${children.map(_.dataType)}).")
+  s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", 
")}).")
 } else {
   TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName)
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/f190bb83/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
index 3c581ec..36185b8 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
@@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp}
 
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.types._
 
@@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite 
with ExpressionEvalHelper
 Literal(Timestamp.valueOf("2015-07-01 10:00:00",
   Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty)
 
+// Type checking error
+assert(
+  Least(Seq(Literal(1), Literal("1"))).checkInputDataType

spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals

2016-08-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 146001a9f -> 2330f3ecb


[SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals

## What changes were proposed in this pull request?
In Spark 1.6 (with Hive support) we could use the `CURRENT_DATE` and
`CURRENT_TIMESTAMP` functions as literals (without parentheses), for example:
```SQL
select /* Spark 1.6: */ current_date, /* Spark 1.6  & Spark 2.0: */ 
current_date()
```
This was accidentally dropped in Spark 2.0. This PR reinstates this 
functionality.
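
A hedged check from a Scala session (assumes a SparkSession named `spark`) that both forms parse after this patch:

```scala
// The bare keyword form and the function-call form should both be accepted.
spark.sql("SELECT current_date, current_date(), current_timestamp, current_timestamp()").show()
```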

## How was this patch tested?
Added a case to ExpressionParserSuite.

Author: Herman van Hovell 

Closes #14442 from hvanhovell/SPARK-16836.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2330f3ec
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2330f3ec
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2330f3ec

Branch: refs/heads/master
Commit: 2330f3ecbbd89c7eaab9cc0d06726aa743b16334
Parents: 146001a
Author: Herman van Hovell 
Authored: Tue Aug 2 10:09:47 2016 -0700
Committer: Reynold Xin 
Committed: Tue Aug 2 10:09:47 2016 -0700

--
 .../org/apache/spark/sql/catalyst/parser/SqlBase.g4|  5 -
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala  | 13 +
 .../sql/catalyst/parser/ExpressionParserSuite.scala|  5 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++-
 4 files changed, 32 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
--
diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 5e10462..c7d5086 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -500,6 +500,7 @@ valueExpression
 
 primaryExpression
 : constant 
#constantDefault
+| name=(CURRENT_DATE | CURRENT_TIMESTAMP)  
#timeFunctionCall
 | ASTERISK 
#star
 | qualifiedName '.' ASTERISK   
#star
 | '(' expression (',' expression)+ ')' 
#rowConstructor
@@ -660,7 +661,7 @@ nonReserved
 | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE
 | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | 
MACRO | OR | STRATIFY | THEN
 | UNBOUNDED | WHEN
-| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT
+| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | 
CURRENT_DATE | CURRENT_TIMESTAMP
 ;
 
 SELECT: 'SELECT';
@@ -880,6 +881,8 @@ OPTION: 'OPTION';
 ANTI: 'ANTI';
 LOCAL: 'LOCAL';
 INPATH: 'INPATH';
+CURRENT_DATE: 'CURRENT_DATE';
+CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
 
 STRING
 : '\'' ( ~('\''|'\\') | ('\\' .) )* '\''

http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index f2cc8d3..679adf2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
Logging {
   }
 
   /**
+   * Create a current timestamp/date expression. These are different from 
regular function because
+   * they do not require the user to specify braces when calling them.
+   */
+  override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression 
= withOrigin(ctx) {
+ctx.name.getType match {
+  case SqlBaseParser.CURRENT_DATE =>
+CurrentDate()
+  case SqlBaseParser.CURRENT_TIMESTAMP =>
+CurrentTimestamp()
+}
+  }
+
+  /**
* Create a function database (optional) and name pair.
*/
   protected def visitFunctionName(ctx: QualifiedNameContext): 
FunctionIdentifier = {

http://git-wip-us.apache.org/repos/asf/spark/blob/2330f3ec/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
---

spark git commit: [SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals

2016-08-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 ef7927e8e -> a937c9ee4


[SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP literals

## What changes were proposed in this pull request?
In Spark 1.6 (with Hive support) we could use the `CURRENT_DATE` and
`CURRENT_TIMESTAMP` functions as literals (without parentheses), for example:
```SQL
select /* Spark 1.6: */ current_date, /* Spark 1.6  & Spark 2.0: */ 
current_date()
```
This was accidentally dropped in Spark 2.0. This PR reinstates this 
functionality.

## How was this patch tested?
Added a case to ExpressionParserSuite.

Author: Herman van Hovell 

Closes #14442 from hvanhovell/SPARK-16836.

(cherry picked from commit 2330f3ecbbd89c7eaab9cc0d06726aa743b16334)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a937c9ee
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a937c9ee
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a937c9ee

Branch: refs/heads/branch-2.0
Commit: a937c9ee44e0766194fc8ca4bce2338453112a53
Parents: ef7927e
Author: Herman van Hovell 
Authored: Tue Aug 2 10:09:47 2016 -0700
Committer: Reynold Xin 
Committed: Tue Aug 2 10:09:53 2016 -0700

--
 .../org/apache/spark/sql/catalyst/parser/SqlBase.g4|  5 -
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala  | 13 +
 .../sql/catalyst/parser/ExpressionParserSuite.scala|  5 +
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 11 ++-
 4 files changed, 32 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
--
diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 4c15f9c..de98a87 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -493,6 +493,7 @@ valueExpression
 
 primaryExpression
 : constant 
#constantDefault
+| name=(CURRENT_DATE | CURRENT_TIMESTAMP)  
#timeFunctionCall
 | ASTERISK 
#star
 | qualifiedName '.' ASTERISK   
#star
 | '(' expression (',' expression)+ ')' 
#rowConstructor
@@ -653,7 +654,7 @@ nonReserved
 | NULL | ORDER | OUTER | TABLE | TRUE | WITH | RLIKE
 | AND | CASE | CAST | DISTINCT | DIV | ELSE | END | FUNCTION | INTERVAL | 
MACRO | OR | STRATIFY | THEN
 | UNBOUNDED | WHEN
-| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT
+| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | 
CURRENT_DATE | CURRENT_TIMESTAMP
 ;
 
 SELECT: 'SELECT';
@@ -873,6 +874,8 @@ OPTION: 'OPTION';
 ANTI: 'ANTI';
 LOCAL: 'LOCAL';
 INPATH: 'INPATH';
+CURRENT_DATE: 'CURRENT_DATE';
+CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
 
 STRING
 : '\'' ( ~('\''|'\\') | ('\\' .) )* '\''

http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index c7420a1..1a0e7ab 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1023,6 +1023,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
Logging {
   }
 
   /**
+   * Create a current timestamp/date expression. These are different from 
regular function because
+   * they do not require the user to specify braces when calling them.
+   */
+  override def visitTimeFunctionCall(ctx: TimeFunctionCallContext): Expression 
= withOrigin(ctx) {
+ctx.name.getType match {
+  case SqlBaseParser.CURRENT_DATE =>
+CurrentDate()
+  case SqlBaseParser.CURRENT_TIMESTAMP =>
+CurrentTimestamp()
+}
+  }
+
+  /**
* Create a function database (optional) and name pair.
*/
   protected def visitFunctionName(ctx: QualifiedNameContext): 
FunctionIdentifier = {

http://git-wip-us.apache.org/repos/asf/spark/blob/a937c9ee/sql/catalyst/src/test/scal

spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

2016-08-02 Thread davies
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 22f0899bc -> ef7927e8e


[SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

## What changes were proposed in this pull request?

There are two related bugs in Python-only UDTs. Because the test case for the second
one needs the first fix too, I put both fixes into one PR. If that is not appropriate,
please let me know.

### First bug: When MapObjects works on Python-only UDTs

`RowEncoder` will use `PythonUserDefinedType.sqlType` for its deserializer
expression. If the SQL type is `ArrayType`, we will have `MapObjects` working
on it. But `MapObjects` doesn't consider `PythonUserDefinedType` as its input
data type. It causes an error like:

import pyspark.sql.group
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql.types import *

schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT())
df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) 
for i in range(10)], schema=schema)
df.show()

File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
312, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while 
calling o36.showString.
: java.lang.RuntimeException: Error while decoding: scala.MatchError: 
org.apache.spark.sql.types.PythonUserDefinedTypef4ceede8 (of class 
org.apache.spark.sql.types.PythonUserDefinedType)
...

### Second bug: When a Python-only UDT is the element type of ArrayType

import pyspark.sql.group
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql.types import *

schema = StructType().add("key", LongType()).add("val", 
ArrayType(PythonOnlyUDT()))
df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) 
for i in range(10)], schema=schema)
df.show()

## How was this patch tested?
PySpark's sql tests.

Author: Liang-Chi Hsieh 

Closes #13778 from viirya/fix-pyudt.

(cherry picked from commit 146001a9ffefc7aaedd3d888d68c7a9b80bca545)
Signed-off-by: Davies Liu 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ef7927e8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ef7927e8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ef7927e8

Branch: refs/heads/branch-2.0
Commit: ef7927e8e77558f9a18eacc8491b0c28231e2769
Parents: 22f0899
Author: Liang-Chi Hsieh 
Authored: Tue Aug 2 10:08:18 2016 -0700
Committer: Davies Liu 
Committed: Tue Aug 2 10:08:34 2016 -0700

--
 python/pyspark/sql/tests.py | 35 
 .../sql/catalyst/encoders/RowEncoder.scala  |  9 -
 .../catalyst/expressions/objects/objects.scala  | 17 --
 3 files changed, 58 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ef7927e8/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index a8ca386..87dbb50 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -575,6 +575,41 @@ class SQLTests(ReusedPySparkTestCase):
 _verify_type(PythonOnlyPoint(1.0, 2.0), PythonOnlyUDT())
 self.assertRaises(ValueError, lambda: _verify_type([1.0, 2.0], 
PythonOnlyUDT()))
 
+def test_simple_udt_in_df(self):
+schema = StructType().add("key", LongType()).add("val", 
PythonOnlyUDT())
+df = self.spark.createDataFrame(
+[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)],
+schema=schema)
+df.show()
+
+def test_nested_udt_in_df(self):
+schema = StructType().add("key", LongType()).add("val", 
ArrayType(PythonOnlyUDT()))
+df = self.spark.createDataFrame(
+[(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in 
range(10)],
+schema=schema)
+df.collect()
+
+schema = StructType().add("key", LongType()).add("val",
+ MapType(LongType(), 
PythonOnlyUDT()))
+df = self.spark.createDataFrame(
+[(i % 3, {i % 3: PythonOnlyPoint(float(i + 1), float(i + 1))}) for 
i in range(10)],
+schema=schema)
+df.collect()
+
+def test_complex_nested_udt_in_df(self):
+from pyspark.sql.functions import udf
+
+schema = StructType().add("key", LongType()).add("val", 
PythonOnlyUDT())
+df = self.spark.createDataFrame(
+[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)],
+schema=schema)
+df.collect()
+
+gd = df.groupby("key").agg({"val": "collect_list"})
+gd.collect()
+udf = udf(lambda k, v: [(k, v[0])], ArrayType(df.schema))
+gd.select(udf(*gd)).collect()
+

spark git commit: [SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

2016-08-02 Thread davies
Repository: spark
Updated Branches:
  refs/heads/master 1dab63d8d -> 146001a9f


[SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs

## What changes were proposed in this pull request?

There are two related bugs in Python-only UDTs. Because the test case for the second
one needs the first fix too, I put both fixes into one PR. If that is not appropriate,
please let me know.

### First bug: When MapObjects works on Python-only UDTs

`RowEncoder` will use `PythonUserDefinedType.sqlType` for its deserializer
expression. If the SQL type is `ArrayType`, we will have `MapObjects` working
on it. But `MapObjects` doesn't consider `PythonUserDefinedType` as its input
data type. It causes an error like:

import pyspark.sql.group
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql.types import *

schema = StructType().add("key", LongType()).add("val", PythonOnlyUDT())
df = spark.createDataFrame([(i % 3, PythonOnlyPoint(float(i), float(i))) 
for i in range(10)], schema=schema)
df.show()

File "/home/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
312, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while 
calling o36.showString.
: java.lang.RuntimeException: Error while decoding: scala.MatchError: 
org.apache.spark.sql.types.PythonUserDefinedTypef4ceede8 (of class 
org.apache.spark.sql.types.PythonUserDefinedType)
...

### Second bug: When a Python-only UDT is the element type of ArrayType

import pyspark.sql.group
from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
from pyspark.sql.types import *

schema = StructType().add("key", LongType()).add("val", 
ArrayType(PythonOnlyUDT()))
df = spark.createDataFrame([(i % 3, [PythonOnlyPoint(float(i), float(i))]) 
for i in range(10)], schema=schema)
df.show()

## How was this patch tested?
PySpark's sql tests.

Author: Liang-Chi Hsieh 

Closes #13778 from viirya/fix-pyudt.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/146001a9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/146001a9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/146001a9

Branch: refs/heads/master
Commit: 146001a9ffefc7aaedd3d888d68c7a9b80bca545
Parents: 1dab63d
Author: Liang-Chi Hsieh 
Authored: Tue Aug 2 10:08:18 2016 -0700
Committer: Davies Liu 
Committed: Tue Aug 2 10:08:18 2016 -0700

--
 python/pyspark/sql/tests.py | 35 
 .../sql/catalyst/encoders/RowEncoder.scala  |  9 -
 .../catalyst/expressions/objects/objects.scala  | 17 --
 3 files changed, 58 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/146001a9/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index a8ca386..87dbb50 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -575,6 +575,41 @@ class SQLTests(ReusedPySparkTestCase):
 _verify_type(PythonOnlyPoint(1.0, 2.0), PythonOnlyUDT())
 self.assertRaises(ValueError, lambda: _verify_type([1.0, 2.0], 
PythonOnlyUDT()))
 
+def test_simple_udt_in_df(self):
+schema = StructType().add("key", LongType()).add("val", 
PythonOnlyUDT())
+df = self.spark.createDataFrame(
+[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)],
+schema=schema)
+df.show()
+
+def test_nested_udt_in_df(self):
+schema = StructType().add("key", LongType()).add("val", 
ArrayType(PythonOnlyUDT()))
+df = self.spark.createDataFrame(
+[(i % 3, [PythonOnlyPoint(float(i), float(i))]) for i in 
range(10)],
+schema=schema)
+df.collect()
+
+schema = StructType().add("key", LongType()).add("val",
+ MapType(LongType(), 
PythonOnlyUDT()))
+df = self.spark.createDataFrame(
+[(i % 3, {i % 3: PythonOnlyPoint(float(i + 1), float(i + 1))}) for 
i in range(10)],
+schema=schema)
+df.collect()
+
+def test_complex_nested_udt_in_df(self):
+from pyspark.sql.functions import udf
+
+schema = StructType().add("key", LongType()).add("val", 
PythonOnlyUDT())
+df = self.spark.createDataFrame(
+[(i % 3, PythonOnlyPoint(float(i), float(i))) for i in range(10)],
+schema=schema)
+df.collect()
+
+gd = df.groupby("key").agg({"val": "collect_list"})
+gd.collect()
+udf = udf(lambda k, v: [(k, v[0])], ArrayType(df.schema))
+gd.select(udf(*gd)).collect()
+
 def test_udt_with_none(self):
 df = self.spark.range(0, 10, 1, 1)
 

http://git-wip-us.apache.or

spark git commit: [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 fc18e259a -> 22f0899bc


[SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors

## What changes were proposed in this pull request?

Fixes the incorrect arguments (slideDuration was dropped and windowDuration used in
its place) in the auxiliary constructors of TimeWindow.

The JIRA this addresses is here: 
https://issues.apache.org/jira/browse/SPARK-16837

## How was this patch tested?

Added a test to TimeWindowSuite to check that the results of the TimeWindow object's
apply method and the TimeWindow class constructor are equivalent, as sketched below.
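
A hedged sketch of the equivalence the new test asserts (catalyst-internal API, mirrored from the diff below):

```scala
import org.apache.spark.sql.catalyst.expressions.{Literal, TimeWindow}

// With the fix, apply() and the auxiliary constructor agree on slideDuration.
val applied = TimeWindow(Literal(10L), "10 seconds", "1 second", "0 seconds")
val constructed = new TimeWindow(
  Literal(10L), Literal("10 seconds"), Literal("1 second"), Literal("0 seconds"))
assert(applied == constructed)
```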

Author: Tom Magrino 

Closes #14441 from tmagrino/windowing-fix.

(cherry picked from commit 1dab63d8d3c59a3d6b4ee8e777810c44849e58b8)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/22f0899b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/22f0899b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/22f0899b

Branch: refs/heads/branch-2.0
Commit: 22f0899bc78e1f2021084c6397a4c05ad6317bae
Parents: fc18e25
Author: Tom Magrino 
Authored: Tue Aug 2 09:16:44 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 09:16:50 2016 -0700

--
 .../spark/sql/catalyst/expressions/TimeWindow.scala |  4 ++--
 .../sql/catalyst/expressions/TimeWindowSuite.scala  | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/22f0899b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
index 66c4bf2..7ff61ee 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
@@ -45,12 +45,12 @@ case class TimeWindow(
   slideDuration: Expression,
   startTime: Expression) = {
 this(timeColumn, TimeWindow.parseExpression(windowDuration),
-  TimeWindow.parseExpression(windowDuration), 
TimeWindow.parseExpression(startTime))
+  TimeWindow.parseExpression(slideDuration), 
TimeWindow.parseExpression(startTime))
   }
 
   def this(timeColumn: Expression, windowDuration: Expression, slideDuration: 
Expression) = {
 this(timeColumn, TimeWindow.parseExpression(windowDuration),
-  TimeWindow.parseExpression(windowDuration), 0)
+  TimeWindow.parseExpression(slideDuration), 0)
   }
 
   def this(timeColumn: Expression, windowDuration: Expression) = {

http://git-wip-us.apache.org/repos/asf/spark/blob/22f0899b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
index b82cf8d..d6c8fcf 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
@@ -108,4 +108,16 @@ class TimeWindowSuite extends SparkFunSuite with 
ExpressionEvalHelper with Priva
   TimeWindow.invokePrivate(parseExpression(Rand(123)))
 }
   }
+
+  test("SPARK-16837: TimeWindow.apply equivalent to TimeWindow constructor") {
+val slideLength = "1 second"
+for (windowLength <- Seq("10 second", "1 minute", "2 hours")) {
+  val applyValue = TimeWindow(Literal(10L), windowLength, slideLength, "0 
seconds")
+  val constructed = new TimeWindow(Literal(10L),
+Literal(windowLength),
+Literal(slideLength),
+Literal("0 seconds"))
+  assert(applyValue == constructed)
+}
+  }
 }





spark git commit: [SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 36827ddaf -> 1dab63d8d


[SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in constructors

## What changes were proposed in this pull request?

Fixes the incorrect constructor arguments in TimeWindow: slideDuration was dropped and windowDuration was passed in its place.

The JIRA this addresses is here: 
https://issues.apache.org/jira/browse/SPARK-16837

## How was this patch tested?

Added a test to TimeWindowSuite to check that the TimeWindow object's apply method and the TimeWindow class constructors produce equivalent results.

Author: Tom Magrino 

Closes #14441 from tmagrino/windowing-fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1dab63d8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1dab63d8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1dab63d8

Branch: refs/heads/master
Commit: 1dab63d8d3c59a3d6b4ee8e777810c44849e58b8
Parents: 36827dd
Author: Tom Magrino 
Authored: Tue Aug 2 09:16:44 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 09:16:44 2016 -0700

--
 .../spark/sql/catalyst/expressions/TimeWindow.scala |  4 ++--
 .../sql/catalyst/expressions/TimeWindowSuite.scala  | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1dab63d8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
index 66c4bf2..7ff61ee 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala
@@ -45,12 +45,12 @@ case class TimeWindow(
   slideDuration: Expression,
   startTime: Expression) = {
 this(timeColumn, TimeWindow.parseExpression(windowDuration),
-  TimeWindow.parseExpression(windowDuration), 
TimeWindow.parseExpression(startTime))
+  TimeWindow.parseExpression(slideDuration), 
TimeWindow.parseExpression(startTime))
   }
 
   def this(timeColumn: Expression, windowDuration: Expression, slideDuration: 
Expression) = {
 this(timeColumn, TimeWindow.parseExpression(windowDuration),
-  TimeWindow.parseExpression(windowDuration), 0)
+  TimeWindow.parseExpression(slideDuration), 0)
   }
 
   def this(timeColumn: Expression, windowDuration: Expression) = {

http://git-wip-us.apache.org/repos/asf/spark/blob/1dab63d8/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
index b82cf8d..d6c8fcf 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala
@@ -108,4 +108,16 @@ class TimeWindowSuite extends SparkFunSuite with 
ExpressionEvalHelper with Priva
   TimeWindow.invokePrivate(parseExpression(Rand(123)))
 }
   }
+
+  test("SPARK-16837: TimeWindow.apply equivalent to TimeWindow constructor") {
+val slideLength = "1 second"
+for (windowLength <- Seq("10 second", "1 minute", "2 hours")) {
+  val applyValue = TimeWindow(Literal(10L), windowLength, slideLength, "0 
seconds")
+  val constructed = new TimeWindow(Literal(10L),
+Literal(windowLength),
+Literal(slideLength),
+Literal("0 seconds"))
+  assert(applyValue == constructed)
+}
+  }
 }





spark git commit: [SPARK-16822][DOC] Support latex in scaladoc.

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 511dede11 -> 36827ddaf


[SPARK-16822][DOC] Support latex in scaladoc.

## What changes were proposed in this pull request?

Support using LaTeX in scaladoc by adding the MathJax JavaScript library to the JS template.
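
A hedged sketch (hypothetical class name, not part of the patch) of a scaladoc comment that the new MathJax support renders; display math goes between $$ delimiters and inline math between single $ signs, mirroring the MinMaxScaler doc updated below:

```
/**
 * Rescales each feature to a common range [min, max]:
 *
 * $$
 * Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + min
 * $$
 *
 * For the case $E_{max} == E_{min}$, $Rescaled(e_i) = 0.5 * (max + min)$.
 */
class RescalerDocExample
```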

## How was this patch tested?

Generated scaladoc.  Preview:

- LogisticGradient: 
[before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient)
 and 
[after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.mllib.optimization.LogisticGradient)

- MinMaxScaler: 
[before](https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler)
 and 
[after](https://sparkdocs.lins05.pw/spark-16822/api/scala/index.html#org.apache.spark.ml.feature.MinMaxScaler)

Author: Shuai Lin 

Closes #14438 from lins05/spark-16822-support-latex-in-scaladoc.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/36827dda
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/36827dda
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/36827dda

Branch: refs/heads/master
Commit: 36827ddafeaa7a683362eb8da31065aaff9676d5
Parents: 511dede
Author: Shuai Lin 
Authored: Tue Aug 2 09:14:08 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 09:14:08 2016 -0700

--
 docs/js/api-docs.js |  20 
 .../apache/spark/ml/feature/MinMaxScaler.scala  |  10 +-
 .../ml/regression/AFTSurvivalRegression.scala   |  94 +--
 .../spark/ml/regression/LinearRegression.scala  | 120 +--
 .../spark/mllib/clustering/LDAUtils.scala   |   2 +-
 .../mllib/evaluation/RegressionMetrics.scala|   2 +-
 .../spark/mllib/optimization/Gradient.scala |  94 +--
 7 files changed, 225 insertions(+), 117 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/docs/js/api-docs.js
--
diff --git a/docs/js/api-docs.js b/docs/js/api-docs.js
index ce89d89..96c63cc 100644
--- a/docs/js/api-docs.js
+++ b/docs/js/api-docs.js
@@ -41,3 +41,23 @@ function addBadges(allAnnotations, name, tag, html) {
 .add(annotations.closest("div.fullcomment").prevAll("h4.signature"))
 .prepend(html);
 }
+
+$(document).ready(function() {
+  var script = document.createElement('script');
+  script.type = 'text/javascript';
+  script.async = true;
+  script.onload = function(){
+MathJax.Hub.Config({
+  displayAlign: "left",
+  tex2jax: {
+inlineMath: [ ["$", "$"], ["(",")"] ],
+displayMath: [ ["$$","$$"], ["\\[", "\\]"] ],
+processEscapes: true,
+skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'a']
+  }
+});
+  };
+  script.src = ('https:' == document.location.protocol ? 'https://' : 
'http://') +
+
'cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+  document.getElementsByTagName('head')[0].appendChild(script);
+});

http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala
index 068f11a..9f3d2ca 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala
@@ -76,11 +76,15 @@ private[feature] trait MinMaxScalerParams extends Params 
with HasInputCol with H
 /**
  * Rescale each feature individually to a common range [min, max] linearly 
using column summary
  * statistics, which is also known as min-max normalization or Rescaling. The 
rescaled value for
- * feature E is calculated as,
+ * feature E is calculated as:
  *
- * `Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + 
min`
+ * 
+ *$$
+ *Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} * (max - min) + 
min
+ *$$
+ * 
  *
- * For the case `E_{max} == E_{min}`, `Rescaled(e_i) = 0.5 * (max + min)`.
+ * For the case $E_{max} == E_{min}$, $Rescaled(e_i) = 0.5 * (max + min)$.
  * Note that since zero values will probably be transformed to non-zero 
values, output of the
  * transformer will be DenseVector even for sparse input.
  */

http://git-wip-us.apache.org/repos/asf/spark/blob/36827dda/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
 
b/mllib/src/main/scala/org/apache/

spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 1b2e6f636 -> 8a22275de


[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.
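
A hedged sketch (illustrative field, not the Dispatcher code) of why the declared type matters: on Java 8, ConcurrentHashMap.keySet() returns the Java-8-only KeySetView, so bytecode compiled against a ConcurrentHashMap-typed value references that class and fails on Java 7, while a ConcurrentMap-typed value keeps the call site on the interface method returning java.util.Set:

```
import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}

// Declared as the interface, instantiated as the concrete class, as in the patch.
val endpoints: ConcurrentMap[String, Int] = new ConcurrentHashMap[String, Int]
endpoints.put("worker-1", 1)
val keys: java.util.Set[String] = endpoints.keySet() // resolves via ConcurrentMap, safe on Java 7
```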

Compilation and the existing automated tests.

Author: Maciej Brynski 

Closes #14459 from maver1ck/spark-15541-master.

(cherry picked from commit 511dede1118f20a7756f614acb6fc88af52c9de9)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8a22275d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8a22275d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8a22275d

Branch: refs/heads/branch-1.6
Commit: 8a22275dea74cd79ecd59438fd88bebcae13c944
Parents: 1b2e6f6
Author: Maciej Brynski 
Authored: Tue Aug 2 08:07:08 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 08:46:20 2016 -0700

--
 .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala   | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8a22275d/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala 
b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
index 533c984..9364c7e 100644
--- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
+++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.rpc.netty
 
-import java.util.concurrent.{ThreadPoolExecutor, ConcurrentHashMap, 
LinkedBlockingQueue, TimeUnit}
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, 
LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.JavaConverters._
@@ -41,8 +41,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) 
extends Logging {
 val inbox = new Inbox(ref, endpoint)
   }
 
-  private val endpoints = new ConcurrentHashMap[String, EndpointData]
-  private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
+  private val endpoints: ConcurrentMap[String, EndpointData] =
+new ConcurrentHashMap[String, EndpointData]
+  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
+new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
 
   // Track the receivers whose inboxes may contain messages.
   private val receivers = new LinkedBlockingQueue[EndpointData]





spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master dd8514fa2 -> 511dede11


[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.

## How was this patch tested?

Compilation and the existing automated tests.

Author: Maciej Brynski 

Closes #14459 from maver1ck/spark-15541-master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/511dede1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/511dede1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/511dede1

Branch: refs/heads/master
Commit: 511dede1118f20a7756f614acb6fc88af52c9de9
Parents: dd8514f
Author: Maciej Brynski 
Authored: Tue Aug 2 08:07:08 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 08:07:08 2016 -0700

--
 .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala   | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/511dede1/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala 
b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
index d305de2..a02cf30 100644
--- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
+++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.rpc.netty
 
-import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue, 
ThreadPoolExecutor, TimeUnit}
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, 
LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.JavaConverters._
@@ -42,8 +42,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) 
extends Logging {
 val inbox = new Inbox(ref, endpoint)
   }
 
-  private val endpoints = new ConcurrentHashMap[String, EndpointData]
-  private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
+  private val endpoints: ConcurrentMap[String, EndpointData] =
+new ConcurrentHashMap[String, EndpointData]
+  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
+new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
 
   // Track the receivers whose inboxes may contain messages.
   private val receivers = new LinkedBlockingQueue[EndpointData]





spark git commit: [SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

2016-08-02 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 c5516ab60 -> fc18e259a


[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)

## What changes were proposed in this pull request?

Casting ConcurrentHashMap to ConcurrentMap allows code compiled with Java 8 to run on Java 7.

## How was this patch tested?

Compilation and the existing automated tests.

Author: Maciej Brynski 

Closes #14459 from maver1ck/spark-15541-master.

(cherry picked from commit 511dede1118f20a7756f614acb6fc88af52c9de9)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc18e259
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc18e259
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc18e259

Branch: refs/heads/branch-2.0
Commit: fc18e259a311c0f1dffe47edef0e42182afca8e9
Parents: c5516ab
Author: Maciej Brynski 
Authored: Tue Aug 2 08:07:08 2016 -0700
Committer: Sean Owen 
Committed: Tue Aug 2 08:07:18 2016 -0700

--
 .../main/scala/org/apache/spark/rpc/netty/Dispatcher.scala   | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fc18e259/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala 
b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
index d305de2..a02cf30 100644
--- a/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
+++ b/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.rpc.netty
 
-import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue, 
ThreadPoolExecutor, TimeUnit}
+import java.util.concurrent.{ConcurrentHashMap, ConcurrentMap, 
LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.JavaConverters._
@@ -42,8 +42,10 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) 
extends Logging {
 val inbox = new Inbox(ref, endpoint)
   }
 
-  private val endpoints = new ConcurrentHashMap[String, EndpointData]
-  private val endpointRefs = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
+  private val endpoints: ConcurrentMap[String, EndpointData] =
+new ConcurrentHashMap[String, EndpointData]
+  private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
+new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
 
   // Track the receivers whose inboxes may contain messages.
   private val receivers = new LinkedBlockingQueue[EndpointData]





spark git commit: [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector

2016-08-02 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 9d9956e8f -> c5516ab60


[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector 
instead of MLlib Vector

## What changes were proposed in this pull request?

mllib.LDAExample uses the ML pipeline together with the MLlib LDA algorithm. The former transforms the original data into the ML Vector format, while the latter expects the MLlib Vector format.
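
A minimal sketch (standalone values, not the example's pipeline) of converting the ML pipeline's ml.linalg vectors back into mllib.linalg vectors with Vectors.fromML, which is what the revised example does before handing documents to the MLlib LDA optimizer:

```
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.Vectors

val mlVec: MLVector = MLVectors.dense(0.1, 0.2, 0.7)   // as produced by the ML feature stages
val mllibVec = Vectors.fromML(mlVec)                   // mllib.linalg.Vector accepted by MLlib LDA
```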

## How was this patch tested?

Tested manually.

Author: Xusen Yin 

Closes #14212 from yinxusen/SPARK-16558.

(cherry picked from commit dd8514fa2059a695143073f852b1abee50e522bd)
Signed-off-by: Yanbo Liang 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5516ab6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5516ab6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5516ab6

Branch: refs/heads/branch-2.0
Commit: c5516ab60da860320693bbc245818cb6d8a282c8
Parents: 9d9956e
Author: Xusen Yin 
Authored: Tue Aug 2 07:28:46 2016 -0700
Committer: Yanbo Liang 
Committed: Tue Aug 2 07:31:32 2016 -0700

--
 .../main/scala/org/apache/spark/examples/mllib/LDAExample.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c5516ab6/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala 
b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
index 3fbf8e0..ef67841 100644
--- a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
@@ -24,8 +24,9 @@ import scopt.OptionParser
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.ml.Pipeline
 import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, 
RegexTokenizer, StopWordsRemover}
+import org.apache.spark.ml.linalg.{Vector => MLVector}
 import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, 
LDA, OnlineLDAOptimizer}
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{Row, SparkSession}
 
@@ -225,7 +226,7 @@ object LDAExample {
 val documents = model.transform(df)
   .select("features")
   .rdd
-  .map { case Row(features: Vector) => features }
+  .map { case Row(features: MLVector) => Vectors.fromML(features) }
   .zipWithIndex()
   .map(_.swap)
 





spark git commit: [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector

2016-08-02 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master d9e0919d3 -> dd8514fa2


[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector 
instead of MLlib Vector

## What changes were proposed in this pull request?

mllib.LDAExample uses the ML pipeline together with the MLlib LDA algorithm. The former transforms the original data into the ML Vector format, while the latter expects the MLlib Vector format.

## How was this patch tested?

Tested manually.

Author: Xusen Yin 

Closes #14212 from yinxusen/SPARK-16558.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd8514fa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd8514fa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd8514fa

Branch: refs/heads/master
Commit: dd8514fa2059a695143073f852b1abee50e522bd
Parents: d9e0919
Author: Xusen Yin 
Authored: Tue Aug 2 07:28:46 2016 -0700
Committer: Yanbo Liang 
Committed: Tue Aug 2 07:28:46 2016 -0700

--
 .../main/scala/org/apache/spark/examples/mllib/LDAExample.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dd8514fa/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
--
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala 
b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
index 7e50b12..b923e62 100644
--- a/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
@@ -24,8 +24,9 @@ import scopt.OptionParser
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.ml.Pipeline
 import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, 
RegexTokenizer, StopWordsRemover}
+import org.apache.spark.ml.linalg.{Vector => MLVector}
 import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, 
LDA, OnlineLDAOptimizer}
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{Row, SparkSession}
 
@@ -223,7 +224,7 @@ object LDAExample {
 val documents = model.transform(df)
   .select("features")
   .rdd
-  .map { case Row(features: Vector) => features }
+  .map { case Row(features: MLVector) => Vectors.fromML(features) }
   .zipWithIndex()
   .map(_.swap)
 





spark git commit: [SPARK-16851][ML] Incorrect thresholds length in 'setThresholds()' evokes Exception

2016-08-02 Thread yliang
Repository: spark
Updated Branches:
  refs/heads/master a1ff72e1c -> d9e0919d3


[SPARK-16851][ML] Incorrect thresholds length in 'setThresholds()' evokes Exception

## What changes were proposed in this pull request?
Add a length check for the thresholds array in the `setThresholds()` method of classification models.
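
A standalone sketch of the new guard (the real check lives in ProbabilisticClassificationModel and reads numClasses from the model; here both values are plain parameters for illustration):

```
def checkThresholds(numClasses: Int, thresholds: Array[Double]): Unit =
  require(thresholds.length == numClasses,
    s"setThresholds() called with non-matching numClasses and thresholds.length. " +
    s"numClasses=$numClasses, but thresholds has length ${thresholds.length}")

checkThresholds(2, Array(0.4, 0.6))          // passes
// checkThresholds(2, Array(0.1, 0.2, 0.7))  // would now throw IllegalArgumentException
```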

## How was this patch tested?
unit tests

Author: Zheng RuiFeng 

Closes #14457 from zhengruifeng/check_setThresholds.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9e0919d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9e0919d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d9e0919d

Branch: refs/heads/master
Commit: d9e0919d30e9f79a0eb1ceb8d1b5e9fc58cf085e
Parents: a1ff72e
Author: Zheng RuiFeng 
Authored: Tue Aug 2 07:22:41 2016 -0700
Committer: Yanbo Liang 
Committed: Tue Aug 2 07:22:41 2016 -0700

--
 .../spark/ml/classification/ProbabilisticClassifier.scala | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d9e0919d/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
index 88642ab..19df8f7 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
@@ -83,7 +83,12 @@ abstract class ProbabilisticClassificationModel[
   def setProbabilityCol(value: String): M = set(probabilityCol, 
value).asInstanceOf[M]
 
   /** @group setParam */
-  def setThresholds(value: Array[Double]): M = set(thresholds, 
value).asInstanceOf[M]
+  def setThresholds(value: Array[Double]): M = {
+require(value.length == numClasses, this.getClass.getSimpleName +
+  ".setThresholds() called with non-matching numClasses and 
thresholds.length." +
+  s" numClasses=$numClasses, but thresholds has length ${value.length}")
+set(thresholds, value).asInstanceOf[M]
+  }
 
   /**
* Transforms dataset by reading from [[featuresCol]], and appending new 
columns as specified by





spark git commit: [SPARK-16850][SQL] Improve type checking error message for greatest/least

2016-08-02 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 10e1c0e63 -> a1ff72e1c


[SPARK-16850][SQL] Improve type checking error message for greatest/least

## What changes were proposed in this pull request?
The greatest/least functions do not produce a friendly error message on data type mismatches. This patch improves the error message so that it no longer shows the Scala Seq type and uses human-readable data type names instead.

Before:
```
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS 
DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all 
have the same type, got GREATEST (ArrayBuffer(DecimalType(2,1), StringType)).; 
line 1 pos 7
```

After:
```
org.apache.spark.sql.AnalysisException: cannot resolve 'greatest(CAST(1.0 AS 
DECIMAL(2,1)), "1.0")' due to data type mismatch: The expressions should all 
have the same type, got GREATEST(decimal(2,1), string).; line 1 pos 7
```
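
A hedged, REPL-style sketch of the formatting change (assumes the catalyst types are on the classpath): the message now interpolates each data type's simpleString instead of the raw Scala representation of the Seq:

```
import org.apache.spark.sql.types.{DecimalType, StringType}

// What the improved GREATEST/LEAST message now prints for its argument types:
Seq(DecimalType(2, 1), StringType).map(_.simpleString).mkString(", ")
// "decimal(2,1), string"
```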

## How was this patch tested?
Manually verified the output and also added unit tests to 
ConditionalExpressionSuite.

Author: petermaxlee 

Closes #14453 from petermaxlee/SPARK-16850.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1ff72e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1ff72e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1ff72e1

Branch: refs/heads/master
Commit: a1ff72e1cce6f22249ccc4905e8cef30075beb2f
Parents: 10e1c0e
Author: petermaxlee 
Authored: Tue Aug 2 19:32:35 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 2 19:32:35 2016 +0800

--
 .../catalyst/expressions/conditionalExpressions.scala  |  4 ++--
 .../expressions/ConditionalExpressionSuite.scala   | 13 +
 2 files changed, 15 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a1ff72e1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
index e97e089..5f2585f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
@@ -299,7 +299,7 @@ case class Least(children: Seq[Expression]) extends 
Expression {
 } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) {
   TypeCheckResult.TypeCheckFailure(
 s"The expressions should all have the same type," +
-  s" got LEAST (${children.map(_.dataType)}).")
+  s" got LEAST(${children.map(_.dataType.simpleString).mkString(", 
")}).")
 } else {
   TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName)
 }
@@ -359,7 +359,7 @@ case class Greatest(children: Seq[Expression]) extends 
Expression {
 } else if (children.map(_.dataType).distinct.count(_ != NullType) > 1) {
   TypeCheckResult.TypeCheckFailure(
 s"The expressions should all have the same type," +
-  s" got GREATEST (${children.map(_.dataType)}).")
+  s" got GREATEST(${children.map(_.dataType.simpleString).mkString(", 
")}).")
 } else {
   TypeUtils.checkForOrderingExpr(dataType, "function " + prettyName)
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/a1ff72e1/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
index 3c581ec..36185b8 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
@@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp}
 
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.TypeCheckFailure
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.types._
 
@@ -181,6 +182,12 @@ class ConditionalExpressionSuite extends SparkFunSuite 
with ExpressionEvalHelper
 Literal(Timestamp.valueOf("2015-07-01 10:00:00",
   Timestamp.valueOf("2015-07-01 08:00:00"), InternalRow.empty)
 
+// Type checking error
+assert(
+  Least(Seq(Literal(1), Literal("1"))).checkInputDataTypes() ==
+TypeChec

spark git commit: [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings

2016-08-02 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 5fbf5f93e -> 9d9956e8f


[SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings

## What changes were proposed in this pull request?

This PR makes various minor updates to examples of all language bindings to 
make sure they are consistent with each other. Some typos and missing parts 
(JDBC example in Scala/Java/Python) are also fixed.
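
A hedged sketch of the kind of JDBC read example the revised docs add (the URL, table, and credentials are placeholders; assumes an existing SparkSession named spark and a JDBC driver on the classpath):

```
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
```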

## How was this patch tested?

Manually tested.

Author: Cheng Lian 

Closes #14368 from liancheng/revise-examples.

(cherry picked from commit 10e1c0e638774f5d746771b6dd251de2480f94eb)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9d9956e8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9d9956e8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9d9956e8

Branch: refs/heads/branch-2.0
Commit: 9d9956e8f8abd41a603fde2347384428b7ec715c
Parents: 5fbf5f9
Author: Cheng Lian 
Authored: Tue Aug 2 15:02:40 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 2 15:05:13 2016 +0800

--
 docs/sql-programming-guide.md   |  56 +++--
 .../examples/sql/JavaSQLDataSourceExample.java  |  23 +++-
 .../spark/examples/sql/JavaSparkSQLExample.java |   2 +-
 examples/src/main/python/sql/basic.py   |   2 +-
 examples/src/main/python/sql/datasource.py  |  32 --
 examples/src/main/python/sql/hive.py|   2 +-
 examples/src/main/r/RSparkSQLExample.R  | 113 ++-
 .../examples/sql/SQLDataSourceExample.scala |  22 +++-
 .../spark/examples/sql/SparkSQLExample.scala|   2 +-
 9 files changed, 137 insertions(+), 117 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9d9956e8/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 33b170e..82b03a2 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -132,7 +132,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_DataFrames r/RSparkSQLExample.R %}
+{% include_example create_df r/RSparkSQLExample.R %}
 
 
 
@@ -180,7 +180,7 @@ In addition to simple column references and expressions, 
DataFrames also have a
 
 
 
-{% include_example dataframe_operations r/RSparkSQLExample.R %}
+{% include_example untyped_ops r/RSparkSQLExample.R %}
 
 For a complete list of the types of operations that can be performed on a 
DataFrame refer to the [API Documentation](api/R/index.html).
 
@@ -214,7 +214,7 @@ The `sql` function on a `SparkSession` enables applications 
to run SQL queries p
 
 The `sql` function enables applications to run SQL queries programmatically 
and returns the result as a `SparkDataFrame`.
 
-{% include_example sql_query r/RSparkSQLExample.R %}
+{% include_example run_sql r/RSparkSQLExample.R %}
 
 
 
@@ -377,7 +377,7 @@ In the simplest form, the default data source (`parquet` 
unless otherwise config
 
 
 
-{% include_example source_parquet r/RSparkSQLExample.R %}
+{% include_example generic_load_save_functions r/RSparkSQLExample.R %}
 
 
 
@@ -400,13 +400,11 @@ using this syntax.
 
 
 
-
 {% include_example manual_load_options python/sql/datasource.py %}
 
-
-
-{% include_example source_json r/RSparkSQLExample.R %}
 
+
+{% include_example manual_load_options r/RSparkSQLExample.R %}
 
 
 
@@ -425,13 +423,11 @@ file directly with SQL.
 
 
 
-
 {% include_example direct_sql python/sql/datasource.py %}
 
 
 
-
-{% include_example direct_query r/RSparkSQLExample.R %}
+{% include_example direct_sql r/RSparkSQLExample.R %}
 
 
 
@@ -523,7 +519,7 @@ Using the data from the above example:
 
 
 
-{% include_example load_programmatically r/RSparkSQLExample.R %}
+{% include_example basic_parquet_example r/RSparkSQLExample.R %}
 
 
 
@@ -827,7 +823,7 @@ Note that the file that is offered as _a json file_ is not 
a typical JSON file.
 line must contain a separate, self-contained valid JSON object. As a 
consequence,
 a regular multi-line JSON file will most often fail.
 
-{% include_example load_json_file r/RSparkSQLExample.R %}
+{% include_example json_dataset r/RSparkSQLExample.R %}
 
 
 
@@ -913,7 +909,7 @@ You may need to grant write privilege to the user who 
starts the spark applicati
 When working with Hive one must instantiate `SparkSession` with Hive support. 
This
 adds support for finding tables in the MetaStore and writing queries using 
HiveQL.
 
-{% include_example hive_table r/RSparkSQLExample.R %}
+{% include_example spark_hive r/RSparkSQLExample.R %}
 
 
 
@@ -1055,43 +1051,19 @@ the Data Sources API. The following options are 
supported:
 
 
 
-
-

spark git commit: [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings

2016-08-02 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 5184df06b -> 10e1c0e63


[SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings

## What changes were proposed in this pull request?

This PR makes various minor updates to examples of all language bindings to 
make sure they are consistent with each other. Some typos and missing parts 
(JDBC example in Scala/Java/Python) are also fixed.

## How was this patch tested?

Manually tested.

Author: Cheng Lian 

Closes #14368 from liancheng/revise-examples.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/10e1c0e6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/10e1c0e6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/10e1c0e6

Branch: refs/heads/master
Commit: 10e1c0e638774f5d746771b6dd251de2480f94eb
Parents: 5184df0
Author: Cheng Lian 
Authored: Tue Aug 2 15:02:40 2016 +0800
Committer: Wenchen Fan 
Committed: Tue Aug 2 15:02:40 2016 +0800

--
 docs/sql-programming-guide.md   |  56 +++--
 .../examples/sql/JavaSQLDataSourceExample.java  |  23 +++-
 .../spark/examples/sql/JavaSparkSQLExample.java |   2 +-
 examples/src/main/python/sql/basic.py   |   2 +-
 examples/src/main/python/sql/datasource.py  |  32 --
 examples/src/main/python/sql/hive.py|   2 +-
 examples/src/main/r/RSparkSQLExample.R  | 113 ++-
 .../examples/sql/SQLDataSourceExample.scala |  22 +++-
 .../spark/examples/sql/SparkSQLExample.scala|   2 +-
 9 files changed, 137 insertions(+), 117 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/10e1c0e6/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index d8c8698..5877f2b 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -132,7 +132,7 @@ from a Hive table, or from [Spark data 
sources](#data-sources).
 
 As an example, the following creates a DataFrame based on the content of a 
JSON file:
 
-{% include_example create_DataFrames r/RSparkSQLExample.R %}
+{% include_example create_df r/RSparkSQLExample.R %}
 
 
 
@@ -180,7 +180,7 @@ In addition to simple column references and expressions, 
DataFrames also have a
 
 
 
-{% include_example dataframe_operations r/RSparkSQLExample.R %}
+{% include_example untyped_ops r/RSparkSQLExample.R %}
 
 For a complete list of the types of operations that can be performed on a 
DataFrame refer to the [API Documentation](api/R/index.html).
 
@@ -214,7 +214,7 @@ The `sql` function on a `SparkSession` enables applications 
to run SQL queries p
 
 The `sql` function enables applications to run SQL queries programmatically 
and returns the result as a `SparkDataFrame`.
 
-{% include_example sql_query r/RSparkSQLExample.R %}
+{% include_example run_sql r/RSparkSQLExample.R %}
 
 
 
@@ -377,7 +377,7 @@ In the simplest form, the default data source (`parquet` 
unless otherwise config
 
 
 
-{% include_example source_parquet r/RSparkSQLExample.R %}
+{% include_example generic_load_save_functions r/RSparkSQLExample.R %}
 
 
 
@@ -400,13 +400,11 @@ using this syntax.
 
 
 
-
 {% include_example manual_load_options python/sql/datasource.py %}
 
-
-
-{% include_example source_json r/RSparkSQLExample.R %}
 
+
+{% include_example manual_load_options r/RSparkSQLExample.R %}
 
 
 
@@ -425,13 +423,11 @@ file directly with SQL.
 
 
 
-
 {% include_example direct_sql python/sql/datasource.py %}
 
 
 
-
-{% include_example direct_query r/RSparkSQLExample.R %}
+{% include_example direct_sql r/RSparkSQLExample.R %}
 
 
 
@@ -523,7 +519,7 @@ Using the data from the above example:
 
 
 
-{% include_example load_programmatically r/RSparkSQLExample.R %}
+{% include_example basic_parquet_example r/RSparkSQLExample.R %}
 
 
 
@@ -839,7 +835,7 @@ Note that the file that is offered as _a json file_ is not 
a typical JSON file.
 line must contain a separate, self-contained valid JSON object. As a 
consequence,
 a regular multi-line JSON file will most often fail.
 
-{% include_example load_json_file r/RSparkSQLExample.R %}
+{% include_example json_dataset r/RSparkSQLExample.R %}
 
 
 
@@ -925,7 +921,7 @@ You may need to grant write privilege to the user who 
starts the spark applicati
 When working with Hive one must instantiate `SparkSession` with Hive support. 
This
 adds support for finding tables in the MetaStore and writing queries using 
HiveQL.
 
-{% include_example hive_table r/RSparkSQLExample.R %}
+{% include_example spark_hive r/RSparkSQLExample.R %}
 
 
 
@@ -1067,43 +1063,19 @@ the Data Sources API. The following options are 
supported:
 
 
 
-
-{% highlight scala %}
-val jdbcDF = spark.read.format("jdbc").options(
-  Map("url" -> "jdbc:postgresql:db