[spark] branch branch-3.0 updated (233dc12 -> 6f55ed4)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 233dc12  [SPARK-31290][R] Add back the deprecated R APIs
 add 6f55ed4  [SPARK-31318][SQL] Split Parquet/Avro configs for rebasing 
dates/timestamps in read and in write

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  3 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  3 +-
 .../org/apache/spark/sql/avro/AvroSuite.scala  | 23 +++--
 .../org/apache/spark/sql/internal/SQLConf.scala| 40 +-
 .../parquet/VectorizedColumnReader.java|  2 +-
 .../datasources/parquet/ParquetRowConverter.scala  |  2 +-
 .../datasources/parquet/ParquetWriteSupport.scala  |  3 +-
 .../benchmark/DateTimeRebaseBenchmark.scala|  4 +--
 .../datasources/parquet/ParquetIOSuite.scala   | 16 +
 9 files changed, 63 insertions(+), 33 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (dba525c -> c5323d2)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dba525c  [SPARK-31313][K8S][TEST] Add `m01` node name to support 
Minikube 1.8.x
 add c5323d2  [SPARK-31318][SQL] Split Parquet/Avro configs for rebasing 
dates/timestamps in read and in write

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  3 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  3 +-
 .../org/apache/spark/sql/avro/AvroSuite.scala  | 23 ++--
 .../org/apache/spark/sql/internal/SQLConf.scala| 42 +-
 .../parquet/VectorizedColumnReader.java|  2 +-
 .../datasources/parquet/ParquetRowConverter.scala  |  2 +-
 .../datasources/parquet/ParquetWriteSupport.scala  |  3 +-
 .../benchmark/DateTimeRebaseBenchmark.scala|  4 +--
 .../datasources/parquet/ParquetIOSuite.scala   | 16 +
 9 files changed, 65 insertions(+), 33 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fd0b228 -> dba525c)

2020-03-31 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fd0b228  [SPARK-31290][R] Add back the deprecated R APIs
 add dba525c  [SPARK-31313][K8S][TEST] Add `m01` node name to support 
Minikube 1.8.x

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/k8s/integrationtest/PVTestsSuite.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (e226f68 -> 22e0a5a)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e226f68  [SPARK-31306][DOCS] update rand() function documentation to 
indicate exclusive upper bound
 add 22e0a5a  [SPARK-31312][SQL][2.4] Cache Class instance for the UDF 
instance in HiveFunctionWrapper

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/hive/HiveShim.scala |  19 ++-
 .../src/test/noclasspath/TestUDTF-spark-26560.jar  | Bin 7462 -> 0 bytes
 sql/hive/src/test/noclasspath/hive-test-udfs.jar   | Bin 0 -> 35660 bytes
 .../spark/sql/hive/HiveUDFDynamicLoadSuite.scala   | 190 +
 .../spark/sql/hive/execution/SQLQuerySuite.scala   |  47 -
 5 files changed, 204 insertions(+), 52 deletions(-)
 delete mode 100644 sql/hive/src/test/noclasspath/TestUDTF-spark-26560.jar
 create mode 100644 sql/hive/src/test/noclasspath/hive-test-udfs.jar
 create mode 100644 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUDFDynamicLoadSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31290][R] Add back the deprecated R APIs

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 233dc12  [SPARK-31290][R] Add back the deprecated R APIs
233dc12 is described below

commit 233dc1260af6df7a8e9a689ba5c6fe3e81a5bc1f
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 10:38:03 2020 +0900

[SPARK-31290][R] Add back the deprecated R APIs

### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by 
https://github.com/apache/spark/pull/22843/ and 
https://github.com/apache/spark/pull/22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

No need to port functions such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, 
...)", x, ...)
}
```
because it existed only for backward compatibility with SQLContext (judging from https://github.com/apache/spark/pull/9192), and it seems we no longer need it since SparkR replaced SQLContext with SparkSession in https://github.com/apache/spark/pull/13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao 
Signed-off-by: HyukjinKwon 
(cherry picked from commit fd0b2281272daba590c6bb277688087d0b26053f)
Signed-off-by: HyukjinKwon 
---
 R/pkg/NAMESPACE   |  7 +++
 R/pkg/R/DataFrame.R   | 26 ++
 R/pkg/R/catalog.R | 54 +++
 R/pkg/R/generics.R|  3 ++
 R/pkg/R/sparkR.R  | 98 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 13 -
 docs/sparkr-migration-guide.md|  3 +-
 7 files changed, 200 insertions(+), 4 deletions(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 7ed2e36..9fd7bb4 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -28,6 +28,7 @@ importFrom("utils", "download.file", "object.size", 
"packageVersion", "tail", "u
 
 # S3 methods exported
 export("sparkR.session")
+export("sparkR.init")
 export("sparkR.session.stop")
 export("sparkR.stop")
 export("sparkR.conf")
@@ -41,6 +42,9 @@ export("sparkR.callJStatic")
 
 export("install.spark")
 
+export("sparkRSQL.init",
+   "sparkRHive.init")
+
 # MLlib integration
 exportMethods("glm",
   "spark.glm",
@@ -148,6 +152,7 @@ exportMethods("arrange",
   "printSchema",
   "randomSplit",
   "rbind",
+  "registerTempTable",
   "rename",
   "repartition",
   "repartitionByRange",
@@ -420,8 +425,10 @@ export("as.DataFrame",
"cacheTable",
"clearCache",
"createDataFrame",
+   "createExternalTable",
"createTable",
"currentDatabase",
+   "dropTempTable",
"dropTempView",
"listColumns",
"listDatabases",
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 593d3ca..14d2076 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -521,6 +521,32 @@ setMethod("createOrReplaceTempView",
   invisible(callJMethod(x@sdf, "createOrReplaceTempView", 
viewName))
   })
 
+#' (Deprecated) Register Temporary Table
+#'
+#' Registers a SparkDataFrame as a Temporary Table in the SparkSession
+#' @param x A SparkDataFrame
+#' @param tableName A character vector containing the name of the table
+#'
+#' @seealso \link{createOrReplaceTempView}
+#' @rdname registerTempTable-deprecated
+#' @name registerTempTable
+#' @aliases registerTempTable,SparkDataFrame,character-method
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' registerTempTable(df, "json_df")
+#' new_df <- sql("SELECT * FROM json_df")
+#'}
+#' @note registerTempTable since 1.4.0
+setMethod("registerTempTable",
+  signature(x = "SparkDataFrame", tableName = "character"),
+  function(x, tableName) {
+  .Deprecated("createOrReplaceTempView")
+  invisible(callJMethod(x@sdf, "createOrReplaceTempView", 
tableName))
+  })
+
 #' insertInto
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R
index 7641f8a..275737f 100644
--- a/R/pkg/R/catalog.R
+++ b/R/pkg/R/catalog.R
@@ -17,6 +17,35 @@
 
 # catalog.R: SparkSession catalog functions
 
+#' (Deprecated) Create an external table
+#'
+#' 

[spark] branch master updated: [SPARK-31290][R] Add back the deprecated R APIs

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fd0b228  [SPARK-31290][R] Add back the deprecated R APIs
fd0b228 is described below

commit fd0b2281272daba590c6bb277688087d0b26053f
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 10:38:03 2020 +0900

[SPARK-31290][R] Add back the deprecated R APIs

### What changes were proposed in this pull request?
Add back the deprecated R APIs removed by 
https://github.com/apache/spark/pull/22843/ and 
https://github.com/apache/spark/pull/22815.

These APIs are

- `sparkR.init`
- `sparkRSQL.init`
- `sparkRHive.init`
- `registerTempTable`
- `createExternalTable`
- `dropTempTable`

No need to port functions such as
```r
createExternalTable <- function(x, ...) {
  dispatchFunc("createExternalTable(tableName, path = NULL, source = NULL, 
...)", x, ...)
}
```
because it existed only for backward compatibility with SQLContext (judging from https://github.com/apache/spark/pull/9192), and it seems we no longer need it since SparkR replaced SQLContext with SparkSession in https://github.com/apache/spark/pull/13635.

### Why are the changes needed?
Amend Spark's Semantic Versioning Policy

### Does this PR introduce any user-facing change?
Yes
The removed R APIs are put back.

### How was this patch tested?
Add back the removed tests

Closes #28058 from huaxingao/r.

Authored-by: Huaxin Gao 
Signed-off-by: HyukjinKwon 
---
 R/pkg/NAMESPACE   |  7 +++
 R/pkg/R/DataFrame.R   | 26 ++
 R/pkg/R/catalog.R | 54 +++
 R/pkg/R/generics.R|  3 ++
 R/pkg/R/sparkR.R  | 98 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 13 -
 docs/sparkr-migration-guide.md|  3 +-
 7 files changed, 200 insertions(+), 4 deletions(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 56eceb8..fb879e4 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -28,6 +28,7 @@ importFrom("utils", "download.file", "object.size", 
"packageVersion", "tail", "u
 
 # S3 methods exported
 export("sparkR.session")
+export("sparkR.init")
 export("sparkR.session.stop")
 export("sparkR.stop")
 export("sparkR.conf")
@@ -41,6 +42,9 @@ export("sparkR.callJStatic")
 
 export("install.spark")
 
+export("sparkRSQL.init",
+   "sparkRHive.init")
+
 # MLlib integration
 exportMethods("glm",
   "spark.glm",
@@ -148,6 +152,7 @@ exportMethods("arrange",
   "printSchema",
   "randomSplit",
   "rbind",
+  "registerTempTable",
   "rename",
   "repartition",
   "repartitionByRange",
@@ -431,8 +436,10 @@ export("as.DataFrame",
"cacheTable",
"clearCache",
"createDataFrame",
+   "createExternalTable",
"createTable",
"currentDatabase",
+   "dropTempTable",
"dropTempView",
"listColumns",
"listDatabases",
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 593d3ca..14d2076 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -521,6 +521,32 @@ setMethod("createOrReplaceTempView",
   invisible(callJMethod(x@sdf, "createOrReplaceTempView", 
viewName))
   })
 
+#' (Deprecated) Register Temporary Table
+#'
+#' Registers a SparkDataFrame as a Temporary Table in the SparkSession
+#' @param x A SparkDataFrame
+#' @param tableName A character vector containing the name of the table
+#'
+#' @seealso \link{createOrReplaceTempView}
+#' @rdname registerTempTable-deprecated
+#' @name registerTempTable
+#' @aliases registerTempTable,SparkDataFrame,character-method
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' path <- "path/to/file.json"
+#' df <- read.json(path)
+#' registerTempTable(df, "json_df")
+#' new_df <- sql("SELECT * FROM json_df")
+#'}
+#' @note registerTempTable since 1.4.0
+setMethod("registerTempTable",
+  signature(x = "SparkDataFrame", tableName = "character"),
+  function(x, tableName) {
+  .Deprecated("createOrReplaceTempView")
+  invisible(callJMethod(x@sdf, "createOrReplaceTempView", 
tableName))
+  })
+
 #' insertInto
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the 
current SparkSession.
diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R
index 7641f8a..275737f 100644
--- a/R/pkg/R/catalog.R
+++ b/R/pkg/R/catalog.R
@@ -17,6 +17,35 @@
 
 # catalog.R: SparkSession catalog functions
 
+#' (Deprecated) Create an external table
+#'
+#' Creates an external table based on the dataset in a data source,
+#' Returns a SparkDataFrame associated with the 

[spark] branch master updated: [SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications

2020-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 20fc6fa  [SPARK-31308][PYSPARK] Merging pyFiles to files argument for 
Non-PySpark applications
20fc6fa is described below

commit 20fc6fa8398b9dc47b9ae7df52133a306f89b25f
Author: Liang-Chi Hsieh 
AuthorDate: Tue Mar 31 18:08:55 2020 -0700

[SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark 
applications

### What changes were proposed in this pull request?

This PR (SPARK-31308) proposes to add Python dependencies to `files` even when the application is not a Python application.

### Why are the changes needed?

Currently, we add the `pyFiles` argument to the `files` argument only for Python applications, in SparkSubmit. For the same reason as in #21420 ("for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies."), we need to add `pyFiles` to `files` even when the application is not a Python application.
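
As a rough sketch of what the merge amounts to (not the actual SparkSubmit code; the helper and file names below are made up for illustration), the `files` and `pyFiles` lists end up joined into a single comma-separated `spark.files` value:

```scala
object MergeFileListsSketch {
  // Simplified stand-in for the merge helper used by SparkSubmit: drop
  // null/empty entries and join the remaining comma-separated lists.
  def mergeFileLists(lists: String*): String =
    lists.filter(s => s != null && s.nonEmpty).mkString(",")

  def main(args: Array[String]): Unit = {
    val files = "data.csv"      // e.g. passed via --files
    val pyFiles = "a.py,b.py"   // e.g. passed via --py-files
    // After this change, non-PySpark applications in client mode (non-YARN)
    // also get their pyFiles folded into spark.files.
    println(mergeFileLists(files, pyFiles)) // prints: data.csv,a.py,b.py
  }
}
```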

### Does this PR introduce any user-facing change?

Yes. After this change, for non-PySpark applications the Python files specified by `pyFiles` are also added to `files`, just as they are for PySpark applications.

### How was this patch tested?

Manually test on jupyter notebook or do `spark-submit` with `--verbose`.

```
Spark config:
...
(spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
(spark.submit.deployMode,client)
(spark.master,local[*])
```

Closes #28077 from viirya/pyfile.

Lead-authored-by: Liang-Chi Hsieh 
Co-authored-by: Liang-Chi Hsieh 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 4d67dfa..1271a3d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -474,10 +474,12 @@ private[spark] class SparkSubmit extends Logging {
 args.mainClass = "org.apache.spark.deploy.PythonRunner"
 args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ 
args.childArgs
   }
-  if (clusterManager != YARN) {
-// The YARN backend handles python files differently, so don't merge 
the lists.
-args.files = mergeFileLists(args.files, args.pyFiles)
-  }
+}
+
+// Non-PySpark applications can need Python dependencies.
+if (deployMode == CLIENT && clusterManager != YARN) {
+  // The YARN backend handles python files differently, so don't merge the 
lists.
+  args.files = mergeFileLists(args.files, args.pyFiles)
 }
 
 if (localPyFiles != null) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (1a7f964 -> 5ec1814)

2020-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1a7f964  [SPARK-31305][SQL][DOCS] Add a page to list all commands in 
SQL Reference
 add 5ec1814  [SPARK-31248][CORE][TEST] Fix flaky 
ExecutorAllocationManagerSuite.interleaving add and remove

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala | 2 +-
 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

2020-03-31 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 01b26c4  [SPARK-31305][SQL][DOCS] Add a page to list all commands in 
SQL Reference
01b26c4 is described below

commit 01b26c49009d8136f1f962e87ce7e35db43533ab
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 08:42:15 2020 +0900

[SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

### What changes were proposed in this pull request?
Add a page to list all commands in SQL Reference...

### Why are the changes needed?
so it's easier for users to find a specific command.

### Does this PR introduce any user-facing change?
before:

![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png)

after:

![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png)


![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png)

Also move ```use database``` from the query category to the DDL category.

### How was this patch tested?
Manually build and check

Closes #28074 from huaxingao/list-all.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66)
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml   |  4 +--
 docs/sql-ref-syntax-ddl.md |  1 +
 docs/sql-ref-syntax.md | 62 +-
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 3bf4952..6534c50 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -123,6 +123,8 @@
   url: sql-ref-syntax-ddl-truncate-table.html
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
+- text: USE DATABASE
+  url: sql-ref-syntax-qry-select-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -152,8 +154,6 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
-- text: USE database
-  url: sql-ref-syntax-qry-select-usedb.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md
index 954020a..ab4e95a 100644
--- a/docs/sql-ref-syntax-ddl.md
+++ b/docs/sql-ref-syntax-ddl.md
@@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the 
structure of databas
 - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
 - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
 - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
index 2510278..3db97ac 100644
--- a/docs/sql-ref-syntax.md
+++ b/docs/sql-ref-syntax.md
@@ -19,4 +19,64 @@ license: |
   limitations under the License.
 ---
 
-Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable.
+Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable. This document provides a list of Data Definition and Data 
Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
+
+### DDL Statements
+- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html)
+- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
+- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html)
+- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html)
+- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html)
+- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
+- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html)
+- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html)
+- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html)
+- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)
+- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
+- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
+
+### DML Statements
+- [INSERT INTO](sql-ref-syntax-dml-insert-into.html)
+- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html)
+- [INSERT OVERWRITE 
DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html)
+- [INSERT OVERWRITE DIRECTORY with 

[spark] branch master updated: [SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

2020-03-31 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a7f964  [SPARK-31305][SQL][DOCS] Add a page to list all commands in 
SQL Reference
1a7f964 is described below

commit 1a7f9649b67d2108cb14e9e466855dfe52db6d66
Author: Huaxin Gao 
AuthorDate: Wed Apr 1 08:42:15 2020 +0900

[SPARK-31305][SQL][DOCS] Add a page to list all commands in SQL Reference

### What changes were proposed in this pull request?
Add a page to list all commands in SQL Reference...

### Why are the changes needed?
so it's easier for users to find a specific command.

### Does this PR introduce any user-facing change?
before:

![image](https://user-images.githubusercontent.com/13592258/77938658-ec03e700-726a-11ea-983c-7a559cc0aae2.png)

after:

![image](https://user-images.githubusercontent.com/13592258/77937899-d3df9800-7269-11ea-85db-749a9521576a.png)


![image](https://user-images.githubusercontent.com/13592258/77937924-db9f3c80-7269-11ea-9441-7603feee421c.png)

Also move ```use database``` from the query category to the DDL category.

### How was this patch tested?
Manually build and check

Closes #28074 from huaxingao/list-all.

Authored-by: Huaxin Gao 
Signed-off-by: Takeshi Yamamuro 
---
 docs/_data/menu-sql.yaml   |  4 +--
 docs/sql-ref-syntax-ddl.md |  1 +
 docs/sql-ref-syntax.md | 62 +-
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 3bf4952..6534c50 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -123,6 +123,8 @@
   url: sql-ref-syntax-ddl-truncate-table.html
 - text: REPAIR TABLE
   url: sql-ref-syntax-ddl-repair-table.html
+- text: USE DATABASE
+  url: sql-ref-syntax-qry-select-usedb.html
 - text: Data Manipulation Statements
   url: sql-ref-syntax-dml.html
   subitems:
@@ -152,8 +154,6 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
-- text: USE database
-  url: sql-ref-syntax-qry-select-usedb.html
 - text: EXPLAIN
   url: sql-ref-syntax-qry-explain.html
 - text: Auxiliary Statements
diff --git a/docs/sql-ref-syntax-ddl.md b/docs/sql-ref-syntax-ddl.md
index 954020a..ab4e95a 100644
--- a/docs/sql-ref-syntax-ddl.md
+++ b/docs/sql-ref-syntax-ddl.md
@@ -36,3 +36,4 @@ Data Definition Statements are used to create or modify the 
structure of databas
 - [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
 - [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
 - [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
index 2510278..3db97ac 100644
--- a/docs/sql-ref-syntax.md
+++ b/docs/sql-ref-syntax.md
@@ -19,4 +19,64 @@ license: |
   limitations under the License.
 ---
 
-Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable.
+Spark SQL is Apache Spark's module for working with structured data. The SQL 
Syntax section describes the SQL syntax in detail along with usage examples 
when applicable. This document provides a list of Data Definition and Data 
Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
+
+### DDL Statements
+- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html)
+- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
+- [ALTER VIEW](sql-ref-syntax-ddl-alter-view.html)
+- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html)
+- [CREATE FUNCTION](sql-ref-syntax-ddl-create-function.html)
+- [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
+- [CREATE VIEW](sql-ref-syntax-ddl-create-view.html)
+- [DROP DATABASE](sql-ref-syntax-ddl-drop-database.html)
+- [DROP FUNCTION](sql-ref-syntax-ddl-drop-function.html)
+- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)
+- [DROP VIEW](sql-ref-syntax-ddl-drop-view.html)
+- [REPAIR TABLE](sql-ref-syntax-ddl-repair-table.html)
+- [TRUNCATE TABLE](sql-ref-syntax-ddl-truncate-table.html)
+- [USE DATABASE](sql-ref-syntax-qry-select-usedb.html)
+
+### DML Statements
+- [INSERT INTO](sql-ref-syntax-dml-insert-into.html)
+- [INSERT OVERWRITE](sql-ref-syntax-dml-insert-overwrite-table.html)
+- [INSERT OVERWRITE 
DIRECTORY](sql-ref-syntax-dml-insert-overwrite-directory.html)
+- [INSERT OVERWRITE DIRECTORY with Hive 
format](sql-ref-syntax-dml-insert-overwrite-directory-hive.html)
+- [LOAD](sql-ref-syntax-dml-load.html)
+
+### 

[spark] branch master updated: [SPARK-31304][ML][EXAMPLES] Add examples for ml.stat.ANOVATest

2020-03-31 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e65c21e  [SPARK-31304][ML][EXAMPLES] Add examples for ml.stat.ANOVATest
e65c21e is described below

commit e65c21e093a643573f7ced4998dd9050557ec328
Author: Qianyang Yu 
AuthorDate: Tue Mar 31 16:33:26 2020 -0500

[SPARK-31304][ML][EXAMPLES] Add examples for ml.stat.ANOVATest

### What changes were proposed in this pull request?

Add ANOVATest example for ml.stat.ANOVATest in python/java/scala
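
The Scala example file (`ANOVATestExample.scala`) appears in the change summary but is cut off below; a sketch of what it might look like, mirroring the Java example shown further down (the object name and output formatting are assumptions):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.ANOVATest
import org.apache.spark.sql.SparkSession

object ANOVATestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ANOVATestSketch").getOrCreate()
    import spark.implicits._

    // Same toy (label, features) rows as the Java example below.
    val data = Seq(
      (3.0, Vectors.dense(1.7, 4.4, 7.6, 5.8, 9.6, 2.3)),
      (2.0, Vectors.dense(8.8, 7.3, 5.7, 7.3, 2.2, 4.1)),
      (1.0, Vectors.dense(1.2, 9.5, 2.5, 3.1, 8.7, 2.5)),
      (2.0, Vectors.dense(3.7, 9.2, 6.1, 4.1, 7.5, 3.8)),
      (4.0, Vectors.dense(8.9, 5.2, 7.8, 8.3, 5.2, 3.0)),
      (4.0, Vectors.dense(7.9, 8.5, 9.2, 4.0, 9.4, 2.1))
    )
    val df = data.toDF("label", "features")

    // ANOVATest.test returns a single-row DataFrame with
    // pValues, degreesOfFreedom and fValues columns.
    val result = ANOVATest.test(df, "features", "label").head()
    println(s"pValues: ${result.getAs[Vector](0)}")
    println(s"degreesOfFreedom: ${result.get(1)}")
    println(s"fValues: ${result.getAs[Vector](2)}")

    spark.stop()
  }
}
```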

### Why are the changes needed?

Improve ML example

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

manually run the example

Closes #28073 from kevinyu98/add-ANOVA-example.

Authored-by: Qianyang Yu 
Signed-off-by: Sean Owen 
---
 .../spark/examples/ml/JavaANOVATestExample.java| 75 ++
 examples/src/main/python/ml/anova_test_example.py  | 52 +++
 .../spark/examples/ml/ANOVATestExample.scala   | 63 ++
 3 files changed, 190 insertions(+)

diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java
new file mode 100644
index 000..3b2de1f
--- /dev/null
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaANOVATestExample.java
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.sql.SparkSession;
+
+// $example on$
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.spark.ml.linalg.Vectors;
+import org.apache.spark.ml.linalg.VectorUDT;
+import org.apache.spark.ml.stat.ANOVATest;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.types.*;
+// $example off$
+
+/**
+ * An example for ANOVA testing.
+ * Run with
+ * 
+ * bin/run-example ml.JavaANOVATestExample
+ * 
+ */
+public class JavaANOVATestExample {
+
+  public static void main(String[] args) {
+SparkSession spark = SparkSession
+  .builder()
+  .appName("JavaANOVATestExample")
+  .getOrCreate();
+
+// $example on$
+List data = Arrays.asList(
+  RowFactory.create(3.0, Vectors.dense(1.7, 4.4, 7.6, 5.8, 9.6, 2.3)),
+  RowFactory.create(2.0, Vectors.dense(8.8, 7.3, 5.7, 7.3, 2.2, 4.1)),
+  RowFactory.create(1.0, Vectors.dense(1.2, 9.5, 2.5, 3.1, 8.7, 2.5)),
+  RowFactory.create(2.0, Vectors.dense(3.7, 9.2, 6.1, 4.1, 7.5, 3.8)),
+  RowFactory.create(4.0, Vectors.dense(8.9, 5.2, 7.8, 8.3, 5.2, 3.0)),
+  RowFactory.create(4.0, Vectors.dense(7.9, 8.5, 9.2, 4.0, 9.4, 2.1))
+);
+
+StructType schema = new StructType(new StructField[]{
+  new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
+  new StructField("features", new VectorUDT(), false, Metadata.empty()),
+});
+
+Dataset df = spark.createDataFrame(data, schema);
+Row r = ANOVATest.test(df, "features", "label").head();
+System.out.println("pValues: " + r.get(0).toString());
+System.out.println("degreesOfFreedom: " + r.getList(1).toString());
+System.out.println("fValues: " + r.get(2).toString());
+
+// $example off$
+
+spark.stop();
+  }
+}
diff --git a/examples/src/main/python/ml/anova_test_example.py 
b/examples/src/main/python/ml/anova_test_example.py
new file mode 100644
index 000..3fffdbd
--- /dev/null
+++ b/examples/src/main/python/ml/anova_test_example.py
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#

[spark] branch master updated (590b9a0 -> 34c7ec8)

2020-03-31 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 590b9a0  [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in 
error message of untyped Scala UDF
 add 34c7ec8  [SPARK-31253][SQL] Add metrics to AQE shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ShuffledRowRDD.scala   |  16 ++-
 .../adaptive/CoalesceShufflePartitions.scala   |   6 +-
 .../adaptive/CustomShuffleReaderExec.scala | 114 ++---
 .../adaptive/OptimizeLocalShuffleReader.scala  |  15 ++-
 .../execution/adaptive/OptimizeSkewedJoin.scala|  82 ---
 .../sql/execution/adaptive/QueryStageExec.scala|   5 +
 .../execution/CoalesceShufflePartitionsSuite.scala |  23 +++--
 .../adaptive/AdaptiveQueryExecSuite.scala  |  74 -
 8 files changed, 229 insertions(+), 106 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2a6aa8e -> 590b9a0)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2a6aa8e  [SPARK-31312][SQL] Cache Class instance for the UDF instance 
in HiveFunctionWrapper
 add 590b9a0  [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in 
error message of untyped Scala UDF

No new revisions were added by this update.

Summary of changes:
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in error message of untyped Scala UDF

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 207344d  [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in 
error message of untyped Scala UDF
207344d is described below

commit 207344d0da86496b377c2c5f5ad613c6d02f4c33
Author: yi.wu 
AuthorDate: Tue Mar 31 17:35:26 2020 +

[SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in error message of 
untyped Scala UDF

### What changes were proposed in this pull request?

Added a Java UDF suggestion to the error message for the untyped Scala UDF.

### Why are the changes needed?

To help users migrate their use cases from the deprecated untyped Scala UDF to other supported UDF APIs.
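
For illustration, a minimal sketch of option 1 from the updated error message, a typed Scala UDF (not part of the patch; the object and column names are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object TypedUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TypedUdfSketch").getOrCreate()
    import spark.implicits._

    // Typed Scala UDF (option 1 in the new error message): the input type is
    // captured from the closure, so Spark keeps the type/null information.
    val plusOne = udf((x: Int) => x + 1)

    val df = Seq(1, 2, 3).toDF("x")
    df.select(plusOne(col("x")).as("x_plus_one")).show()

    spark.stop()
  }
}
```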

### Does this PR introduce any user-facing change?

No. It hasn't been released yet.

### How was this patch tested?

Pass Jenkins.

Closes #28070 from Ngone51/spark_31010.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 590b9a0132b68d9523e663997def957b2e46dfb1)
Signed-off-by: Wenchen Fan 
---
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index fd4e77f..782be98 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4841,9 +4841,13 @@ object functions {
 "information. Spark may blindly pass null to the Scala closure with 
primitive-type " +
 "argument, and the closure will see the default value of the Java type 
for the null " +
 "argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for 
null input. " +
-"You could use typed Scala UDF APIs (e.g. `udf((x: Int) => x)`) to 
avoid this problem, " +
-s"or set ${SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF.key} to true and use 
this API with " +
-s"caution."
+"To get rid of this error, you could:\n" +
+"1. use typed Scala UDF APIs, e.g. `udf((x: Int) => x)`\n" +
+"2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { " +
+"override def call(s: String): Integer = s.length() }, IntegerType)`, 
" +
+"if input types are all non primitive\n" +
+s"3. set ${SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF.key} to true and " +
+s"use this API with caution"
   throw new AnalysisException(errorMsg)
 }
 SparkUserDefinedFunction(f, dataType, inputEncoders = Nil)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30775][DOC] Improve the description of executor metrics in the monitoring documentation

2020-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ca3887a  [SPARK-30775][DOC] Improve the description of executor 
metrics in the monitoring documentation
ca3887a is described below

commit ca3887a0de31fa78097ca7ee92ead914a3ce050c
Author: Luca Canali 
AuthorDate: Mon Mar 30 18:00:54 2020 -0700

[SPARK-30775][DOC] Improve the description of executor metrics in the 
monitoring documentation

### What changes were proposed in this pull request?
This PR (SPARK-30775) aims to improve the description of the executor 
metrics in the monitoring documentation.

### Why are the changes needed?
Improve and clarify the monitoring documentation by:
- adding a reference to the Prometheus endpoint, as implemented in [SPARK-29064] (see the configuration sketch below)
- extending the list and description of executor metrics, following up on [SPARK-27157]
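
A minimal configuration sketch (assumptions: a standalone Scala application; the two keys shown are the ones named in the documentation change below):

```scala
import org.apache.spark.sql.SparkSession

object ExecutorMetricsEndpointsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExecutorMetricsEndpointsSketch")
      // Enable the conditional Prometheus endpoint for executor metrics
      // (default is false); the JSON endpoint is always available.
      .config("spark.ui.prometheus.enabled", "true")
      // Write aggregated per-stage peak executor metrics to the event log
      // (takes effect when event logging is enabled).
      .config("spark.eventLog.logStageExecutorMetrics", "true")
      .getOrCreate()

    // While the application runs, executor metrics are exposed at:
    //   /applications/[app-id]/executors   (JSON)
    //   /metrics/executors/prometheus      (Prometheus text format)
    spark.range(1000000L).count()
    spark.stop()
  }
}
```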

### Does this PR introduce any user-facing change?
Documentation update.

### How was this patch tested?
n.a.

Closes #27526 from LucaCanali/docPrometheusMetricsFollowupSpark29064.

Authored-by: Luca Canali 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit aa98ac52dbbe3fc2d3b152af9324a71f48439a38)
Signed-off-by: Dongjoon Hyun 
---
 docs/monitoring.md | 58 +++---
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index ba3f1dc..131cd2a 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -689,9 +689,12 @@ A list of the available metrics, with a short description:
 ### Executor Metrics
 
 Executor-level metrics are sent from each executor to the driver as part of 
the Heartbeat to describe the performance metrics of Executor itself like JVM 
heap memory, GC information.
-Executor metric values and their measured peak values per executor are exposed 
via the REST API at the end point `/applications/[app-id]/executors`.
-In addition, aggregated per-stage peak values of the executor metrics are 
written to the event log if `spark.eventLog.logStageExecutorMetrics` is true.
-Executor metrics are also exposed via the Spark metrics system based on the 
Dropwizard metrics library.
+Executor metric values and their measured memory peak values per executor are 
exposed via the REST API in JSON format and in Prometheus format.
+The JSON end point is exposed at: `/applications/[app-id]/executors`, and the 
Prometheus endpoint at: `/metrics/executors/prometheus`.
+The Prometheus endpoint is conditional to a configuration parameter: 
`spark.ui.prometheus.enabled=true` (the default is `false`).
+In addition, aggregated per-stage peak values of the executor memory metrics 
are written to the event log if
+`spark.eventLog.logStageExecutorMetrics` is true.  
+Executor memory metrics are also exposed via the Spark metrics system based on 
the Dropwizard metrics library.
 A list of the available metrics, with a short description:
 
 
@@ -699,21 +702,62 @@ A list of the available metrics, with a short description:
   Short description
   
   
+rddBlocks
+RDD blocks in the block manager of this executor.
+  
+  
+memoryUsed
+Storage memory used by this executor.
+  
+  
+diskUsed
+Disk space used for RDD storage by this executor.
+  
+  
+totalCores
+Number of cores available in this executor.
+  
+  
+maxTasks
+Maximum number of tasks that can run concurrently in this 
executor.
+  
+  
+activeTasks
+Number of tasks currently executing.
+  
+  
+failedTasks
+Number of tasks that have failed in this executor.
+  
+  
+completedTasks
+Number of tasks that have completed in this executor.
+  
+  
+totalTasks
+Total number of tasks (running, failed and completed) in this 
executor.
+  
+  
+totalDuration
+Elapsed time the JVM spent executing tasks in this executor.
+The value is expressed in milliseconds.
+  
+  
 totalGCTime
-Elapsed time the JVM spent in garbage collection summed in this 
Executor.
+Elapsed time the JVM spent in garbage collection summed in this 
executor.
 The value is expressed in milliseconds.
   
   
 totalInputBytes
-Total input bytes summed in this Executor.
+Total input bytes summed in this executor.
   
   
 totalShuffleRead
-Total shuffer read bytes summed in this Executor.
+Total shuffle read bytes summed in this executor.
   
   
 totalShuffleWrite
-Total shuffer write bytes summed in this Executor.
+Total shuffle write bytes summed in this executor.
   
   
 maxMemory


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[spark] branch branch-3.0 updated: [SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in Docker entrypoint.sh

2020-03-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5a96ee7  [SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in 
Docker entrypoint.sh
5a96ee7 is described below

commit 5a96ee7619ea07edefd030c66641e6e473a890e0
Author: Đặng Minh Dũng 
AuthorDate: Mon Mar 30 15:41:57 2020 -0700

[SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in Docker 
entrypoint.sh

A small change to fix an error in Docker `entrypoint.sh`

When Spark was running on Kubernetes, I got the following logs:
```log
+ '[' -n ']'
+ '[' -z ']'
++ /bin/hadoop classpath
/opt/entrypoint.sh: line 62: /bin/hadoop: No such file or directory
+ export SPARK_DIST_CLASSPATH=
+ SPARK_DIST_CLASSPATH=
```
This is because some quotes are missing in the bash comparisons.

No

CI

Closes #28075 from dungdm93/patch-1.

Authored-by: Đặng Minh Dũng 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1d0fc9aa85b3ad3326b878de49b748413dee1dd9)
Signed-off-by: Dongjoon Hyun 
---
 .../kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
index 6ee3523..8218c29 100755
--- 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
+++ 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -58,8 +58,8 @@ fi
 
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
-if [ -n ${HADOOP_HOME}  ] && [ -z ${SPARK_DIST_CLASSPATH}  ]; then
-  export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)  
+if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
+  export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
 fi
 
 if ! [ -z ${HADOOP_CONF_DIR+x} ]; then


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new bd2b6aa  [SPARK-31312][SQL] Cache Class instance for the UDF instance 
in HiveFunctionWrapper
bd2b6aa is described below

commit bd2b6aa42c8a5472c464bb1ee1a8f59a97f699f9
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Mar 31 16:17:26 2020 +

[SPARK-31312][SQL] Cache Class instance for the UDF instance in 
HiveFunctionWrapper

### What changes were proposed in this pull request?

This patch proposes to cache the Class instance for the UDF instance in HiveFunctionWrapper, to fix the case where a Hive simple UDF is transformed (its expression is copied) and evaluated later with another classloader (when the current thread context classloader has changed). In that case, Spark currently throws a ClassNotFoundException (CNFE).

The problem only occurs for Hive simple UDFs, because HiveFunctionWrapper caches the UDF instance for the other types but not for the `UDF` type. The comment says Spark has to create a new instance every time for `UDF`, so we cannot simply do the same. This patch caches the Class instance instead, and switches the current thread context classloader to the one that loads the Class instance.
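
A simplified sketch of that idea (not the actual HiveShim code; the class and method names are invented): resolve and cache the `Class[_]` once, then create fresh instances from it, so a later change of the context classloader cannot break instantiation.

```scala
// Illustration of caching the Class instance rather than the UDF instance:
// the class is resolved once with whichever classloader is available at that
// time, and simple UDFs still get a fresh instance per call, but from the
// cached Class, so a later context-classloader change cannot trigger a
// ClassNotFoundException.
class FunctionWrapperSketch(functionClassName: String) extends Serializable {
  @transient private var clazz: Class[_ <: AnyRef] = _

  private def loadedClass: Class[_ <: AnyRef] = {
    if (clazz == null) {
      clazz = Thread.currentThread().getContextClassLoader
        .loadClass(functionClassName)
        .asInstanceOf[Class[_ <: AnyRef]]
    }
    clazz
  }

  // A new instance per call, created from the cached Class.
  def createFunction[T <: AnyRef](): T =
    loadedClass.getConstructor().newInstance().asInstanceOf[T]
}
```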

This patch extends the test coverage as well. We only tested with GenericUDTF for SPARK-26560, and this patch actually requires only `UDF`. But to avoid regressions for the other types as well, this patch adds all available types (UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF) to the tests.

Credit to cloud-fan as he discovered the problem and proposed the solution.

### Why are the changes needed?

Above section describes why it's a bug and how it's fixed.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New UTs added.

Closes #28079 from HeartSaVioR/SPARK-31312.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 2a6aa8e87bec39f6bfec67e151ef8566b75caecd)
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/hive/HiveShim.scala |  18 +-
 .../src/test/noclasspath/TestUDTF-spark-26560.jar  | Bin 7462 -> 0 bytes
 sql/hive/src/test/noclasspath/hive-test-udfs.jar   | Bin 0 -> 35660 bytes
 .../spark/sql/hive/HiveUDFDynamicLoadSuite.scala   | 190 +
 .../spark/sql/hive/execution/SQLQuerySuite.scala   |  47 -
 5 files changed, 203 insertions(+), 52 deletions(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
index 3beef6b..04a6a8f 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
@@ -118,9 +118,12 @@ private[hive] object HiveShim {
*
* @param functionClassName UDF class name
* @param instance optional UDF instance which contains additional 
information (for macro)
+   * @param clazz optional class instance to create UDF instance
*/
-  private[hive] case class HiveFunctionWrapper(var functionClassName: String,
-private var instance: AnyRef = null) extends java.io.Externalizable {
+  private[hive] case class HiveFunctionWrapper(
+  var functionClassName: String,
+  private var instance: AnyRef = null,
+  private var clazz: Class[_ <: AnyRef] = null) extends 
java.io.Externalizable {
 
 // for Serialization
 def this() = this(null)
@@ -232,8 +235,10 @@ private[hive] object HiveShim {
 in.readFully(functionInBytes)
 
 // deserialize the function object via Hive Utilities
+clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
+  .asInstanceOf[Class[_ <: AnyRef]]
 instance = deserializePlan[AnyRef](new 
java.io.ByteArrayInputStream(functionInBytes),
-  Utils.getContextOrSparkClassLoader.loadClass(functionClassName))
+  clazz)
   }
 }
 
@@ -241,8 +246,11 @@ private[hive] object HiveShim {
   if (instance != null) {
 instance.asInstanceOf[UDFType]
   } else {
-val func = Utils.getContextOrSparkClassLoader
-  
.loadClass(functionClassName).getConstructor().newInstance().asInstanceOf[UDFType]
+if (clazz == null) {
+  clazz = 
Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
+.asInstanceOf[Class[_ <: AnyRef]]
+}
+val func = clazz.getConstructor().newInstance().asInstanceOf[UDFType]
 if (!func.isInstanceOf[UDF]) {
   // We cache the function if it's no the Simple UDF,
   // as we always have to create new instance for Simple UDF
diff --git a/sql/hive/src/test/noclasspath/TestUDTF-spark-26560.jar 

[spark] branch master updated: [SPARK-31312][SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a6aa8e  [SPARK-31312][SQL] Cache Class instance for the UDF instance 
in HiveFunctionWrapper
2a6aa8e is described below

commit 2a6aa8e87bec39f6bfec67e151ef8566b75caecd
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Mar 31 16:17:26 2020 +

[SPARK-31312][SQL] Cache Class instance for the UDF instance in 
HiveFunctionWrapper

### What changes were proposed in this pull request?

This patch proposes to cache the Class instance for the UDF instance in 
HiveFunctionWrapper, to fix the case where a Hive simple UDF is transformed 
(the expression is copied) and later evaluated with another classloader 
(for example, when the current thread context classloader has changed). In this 
case, Spark currently throws a ClassNotFoundException (CNFE).

The issue only occurs for Hive simple UDFs, because HiveFunctionWrapper caches the 
UDF instance for the other types but not for the `UDF` type. The existing comment 
says Spark has to create a new instance every time for a simple UDF, so we cannot 
simply cache the instance. This patch caches the Class instance instead, and 
switches the current thread context classloader to the one that loaded the Class instance.
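
For illustration, a minimal sketch of that caching pattern (a simplified stand-in, not 
the actual `HiveFunctionWrapper` code; the class and method names below are made up):

```scala
// Sketch only: keep the loaded Class around and create a fresh instance per call,
// so simple UDFs still get a new instance even if the context classloader changes.
class CachedFunctionFactory(functionClassName: String) extends Serializable {
  @transient private var clazz: Class[_ <: AnyRef] = _

  private def loadedClass: Class[_ <: AnyRef] = {
    if (clazz == null) {
      clazz = Thread.currentThread().getContextClassLoader
        .loadClass(functionClassName)
        .asInstanceOf[Class[_ <: AnyRef]]
    }
    clazz
  }

  // A new instance per call, built from the cached Class rather than by re-resolving
  // the class name through whatever classloader happens to be current.
  def newFunction(): AnyRef = loadedClass.getConstructor().newInstance()
}
```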

This patch extends the test coverage as well. SPARK-26560 was only tested with 
GenericUDTF, and this fix strictly requires only UDF, but to guard against 
regressions for the other types, the tests now cover all available types 
(UDF, GenericUDF, AbstractGenericUDAFResolver, UDAF, GenericUDTF).

Credit to cloud-fan, who discovered the problem and proposed the solution.

### Why are the changes needed?

The section above describes why it's a bug and how it's fixed.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New UTs added.

Closes #28079 from HeartSaVioR/SPARK-31312.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/hive/HiveShim.scala |  18 +-
 .../src/test/noclasspath/TestUDTF-spark-26560.jar  | Bin 7462 -> 0 bytes
 sql/hive/src/test/noclasspath/hive-test-udfs.jar   | Bin 0 -> 35660 bytes
 .../spark/sql/hive/HiveUDFDynamicLoadSuite.scala   | 190 +
 .../spark/sql/hive/execution/SQLQuerySuite.scala   |  47 -
 5 files changed, 203 insertions(+), 52 deletions(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
index 3beef6b..04a6a8f 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
@@ -118,9 +118,12 @@ private[hive] object HiveShim {
*
* @param functionClassName UDF class name
* @param instance optional UDF instance which contains additional 
information (for macro)
+   * @param clazz optional class instance to create UDF instance
*/
-  private[hive] case class HiveFunctionWrapper(var functionClassName: String,
-private var instance: AnyRef = null) extends java.io.Externalizable {
+  private[hive] case class HiveFunctionWrapper(
+  var functionClassName: String,
+  private var instance: AnyRef = null,
+  private var clazz: Class[_ <: AnyRef] = null) extends 
java.io.Externalizable {
 
 // for Serialization
 def this() = this(null)
@@ -232,8 +235,10 @@ private[hive] object HiveShim {
 in.readFully(functionInBytes)
 
 // deserialize the function object via Hive Utilities
+clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
+  .asInstanceOf[Class[_ <: AnyRef]]
 instance = deserializePlan[AnyRef](new 
java.io.ByteArrayInputStream(functionInBytes),
-  Utils.getContextOrSparkClassLoader.loadClass(functionClassName))
+  clazz)
   }
 }
 
@@ -241,8 +246,11 @@ private[hive] object HiveShim {
   if (instance != null) {
 instance.asInstanceOf[UDFType]
   } else {
-val func = Utils.getContextOrSparkClassLoader
-  
.loadClass(functionClassName).getConstructor().newInstance().asInstanceOf[UDFType]
+if (clazz == null) {
+  clazz = 
Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
+.asInstanceOf[Class[_ <: AnyRef]]
+}
+val func = clazz.getConstructor().newInstance().asInstanceOf[UDFType]
 if (!func.isInstanceOf[UDF]) {
   // We cache the function if it's no the Simple UDF,
   // as we always have to create new instance for Simple UDF
diff --git a/sql/hive/src/test/noclasspath/TestUDTF-spark-26560.jar 
b/sql/hive/src/test/noclasspath/TestUDTF-spark-26560.jar
deleted file mode 100644
index b73b17d..000
Binary files 

[spark] branch branch-3.0 updated: [SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 20bb334  [SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)
20bb334 is described below

commit 20bb33453f85aeb5d2448252a9dd23d3ab85d251
Author: Wenchen Fan 
AuthorDate: Tue Mar 31 23:19:46 2020 +0800

[SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)

### What changes were proposed in this pull request?

Create statement plans in `DataFrameWriter(V2)`, like the SQL API.

### Why are the changes needed?

It's better to leave all the resolution work to the analyzer.
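
For context, a user-level sketch of the DataFrameWriterV2 path affected by this change. 
The catalog name `testcat`, the provider string, and the option key are illustrative 
assumptions, not part of the patch:

```scala
// Assumes an active SparkSession `spark` and a v2 catalog registered as `testcat`
// (e.g. via spark.sql.catalog.testcat=<your TableCatalog implementation>).
import spark.implicits._

val df = Seq((1L, "a"), (2L, "b")).toDF("id", "data")

// DataFrameWriterV2 CTAS: with this change it is planned as a
// CreateTableAsSelectStatement and left to the analyzer to resolve.
df.writeTo("testcat.ns.tbl")
  .using("parquet")
  .option("write.split-size", "10")  // writer options are carried as the statement's writeOptions
  .create()
```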

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27992 from cloud-fan/statement.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 8b01473e8bffe349b1ed993b61420d7d68896cd8)
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/analysis/ResolveCatalogs.scala|  8 ++--
 .../spark/sql/catalyst/parser/AstBuilder.scala |  4 +-
 .../sql/catalyst/plans/logical/statements.scala|  2 +
 .../apache/spark/sql/connector/InMemoryTable.scala |  1 +
 .../org/apache/spark/sql/DataFrameWriter.scala | 55 --
 .../org/apache/spark/sql/DataFrameWriterV2.scala   | 43 -
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  8 ++--
 .../execution/command/PlanResolutionSuite.scala|  4 +-
 8 files changed, 66 insertions(+), 59 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
index 895dfbb..403e4e8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
@@ -134,7 +134,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 ignoreIfExists = c.ifNotExists)
 
 case c @ CreateTableAsSelectStatement(
- NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+ NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _, 
_) =>
   CreateTableAsSelect(
 catalog.asTableCatalog,
 tbl.asIdentifier,
@@ -142,7 +142,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 c.partitioning ++ c.bucketSpec.map(_.asTransform),
 c.asSelect,
 convertTableProperties(c.properties, c.options, c.location, c.comment, 
c.provider),
-writeOptions = c.options,
+writeOptions = c.writeOptions,
 ignoreIfExists = c.ifNotExists)
 
 case RefreshTableStatement(NonSessionCatalogAndTable(catalog, tbl)) =>
@@ -161,7 +161,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 orCreate = c.orCreate)
 
 case c @ ReplaceTableAsSelectStatement(
- NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+ NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _, 
_) =>
   ReplaceTableAsSelect(
 catalog.asTableCatalog,
 tbl.asIdentifier,
@@ -169,7 +169,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 c.partitioning ++ c.bucketSpec.map(_.asTransform),
 c.asSelect,
 convertTableProperties(c.properties, c.options, c.location, c.comment, 
c.provider),
-writeOptions = c.options,
+writeOptions = c.writeOptions,
 orCreate = c.orCreate)
 
 case DropTableStatement(NonSessionCatalogAndTable(catalog, tbl), ifExists, 
_) =>
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 09d316b6..cd4c895 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -2779,7 +2779,7 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   case Some(query) =>
 CreateTableAsSelectStatement(
   table, query, partitioning, bucketSpec, properties, provider, 
options, location, comment,
-  ifNotExists = ifNotExists)
+  writeOptions = Map.empty, ifNotExists = ifNotExists)
 
   case None if temp =>
 // CREATE TEMPORARY TABLE ... USING ... is not supported by the 
catalyst parser.
@@ -2834,7 +2834,7 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 
   case Some(query) =>
 ReplaceTableAsSelectStatement(table, query, partitioning, bucketSpec, 
properties,
-  provider, 

[spark] branch master updated: [SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b01473  [SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)
8b01473 is described below

commit 8b01473e8bffe349b1ed993b61420d7d68896cd8
Author: Wenchen Fan 
AuthorDate: Tue Mar 31 23:19:46 2020 +0800

[SPARK-31230][SQL] Use statement plans in DataFrameWriter(V2)

### What changes were proposed in this pull request?

Create statement plans in `DataFrameWriter(V2)`, like the SQL API.

### Why are the changes needed?

It's better to leave all the resolution work to the analyzer.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27992 from cloud-fan/statement.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/analysis/ResolveCatalogs.scala|  8 ++--
 .../spark/sql/catalyst/parser/AstBuilder.scala |  4 +-
 .../sql/catalyst/plans/logical/statements.scala|  2 +
 .../apache/spark/sql/connector/InMemoryTable.scala |  1 +
 .../org/apache/spark/sql/DataFrameWriter.scala | 55 --
 .../org/apache/spark/sql/DataFrameWriterV2.scala   | 43 -
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  8 ++--
 .../execution/command/PlanResolutionSuite.scala|  4 +-
 8 files changed, 66 insertions(+), 59 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
index 463793e..2a0a944 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala
@@ -156,7 +156,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 ignoreIfExists = c.ifNotExists)
 
 case c @ CreateTableAsSelectStatement(
- NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+ NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _, 
_) =>
   CreateTableAsSelect(
 catalog.asTableCatalog,
 tbl.asIdentifier,
@@ -164,7 +164,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 c.partitioning ++ c.bucketSpec.map(_.asTransform),
 c.asSelect,
 convertTableProperties(c.properties, c.options, c.location, c.comment, 
c.provider),
-writeOptions = c.options,
+writeOptions = c.writeOptions,
 ignoreIfExists = c.ifNotExists)
 
 case RefreshTableStatement(NonSessionCatalogAndTable(catalog, tbl)) =>
@@ -183,7 +183,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 orCreate = c.orCreate)
 
 case c @ ReplaceTableAsSelectStatement(
- NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+ NonSessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _, 
_) =>
   ReplaceTableAsSelect(
 catalog.asTableCatalog,
 tbl.asIdentifier,
@@ -191,7 +191,7 @@ class ResolveCatalogs(val catalogManager: CatalogManager)
 c.partitioning ++ c.bucketSpec.map(_.asTransform),
 c.asSelect,
 convertTableProperties(c.properties, c.options, c.location, c.comment, 
c.provider),
-writeOptions = c.options,
+writeOptions = c.writeOptions,
 orCreate = c.orCreate)
 
 case DropTableStatement(NonSessionCatalogAndTable(catalog, tbl), ifExists, 
_) =>
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 0f0ee80..cc41863 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -2779,7 +2779,7 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   case Some(query) =>
 CreateTableAsSelectStatement(
   table, query, partitioning, bucketSpec, properties, provider, 
options, location, comment,
-  ifNotExists = ifNotExists)
+  writeOptions = Map.empty, ifNotExists = ifNotExists)
 
   case None if temp =>
 // CREATE TEMPORARY TABLE ... USING ... is not supported by the 
catalyst parser.
@@ -2834,7 +2834,7 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 
   case Some(query) =>
 ReplaceTableAsSelectStatement(table, query, partitioning, bucketSpec, 
properties,
-  provider, options, location, comment, orCreate = orCreate)
+  provider, options, location, comment, writeOptions = 

svn commit: r38759 - in /dev/spark/v3.0.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 13:45:27 2020
New Revision: 38759

Log:
Apache Spark v3.0.0-rc1 docs


[This commit notification would consist of 1911 parts, 
which exceeds the limit of 50, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 08bb5f0  [SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle 
regression caused by creating temporary file eagerly
08bb5f0 is described below

commit 08bb5f0ffeb4f5e37417f15931717784db544730
Author: Yuanjian Li 
AuthorDate: Tue Mar 31 19:01:08 2020 +0800

[SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by 
creating temporary file eagerly

### What changes were proposed in this pull request?
This reverts commit 8cf76f8d61b393bb3abd9780421b978e98db8cae. #25962

### Why are the changes needed?
In SPARK-29285, we changed to create shuffle temporary files eagerly. This helps 
avoid failing the entire task when an occasional disk failure occurs, but for 
applications in which many tasks don't actually create shuffle files, it caused 
overhead. See the benchmark below:
Env: Spark local-cluster[2, 4, 19968]; each query runs 5 rounds, each round 5 times.
Data: TPC-DS scale=99, generated by spark-tpcds-datagen
Results:
| Query | Base | Revert |
|---|---|---|
| Q20 | Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579)  Median 2.722007606 | Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274)  Median 2.586498463 |
| Q33 | Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818)  Median 4.568787136 | Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024)  Median 4.082311276 |
| Q52 | Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664)  Median 3.225437871 | Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423)  Median 3.196025108 |
| Q56 | Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227)  Median 4.609965579 | Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982)  Median 4.195202502 |
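
For reference, a minimal sketch of the lazy naming scheme this revert restores (the real 
code lives in `DiskBlockManager`; the `getFile` parameter here is just a stand-in for its 
block-id-to-file lookup):

```scala
import java.io.File
import java.util.UUID

object TempShuffleBlockSketch {
  // Sketch only: reserve an unused name, but do not create the file yet. The file is
  // created later, when the task actually writes shuffle output, which is what avoids
  // the per-task overhead measured in the benchmark above.
  def createTempShuffleBlock(getFile: String => File): (String, File) = {
    var blockId = s"temp_shuffle_${UUID.randomUUID()}"
    while (getFile(blockId).exists()) {
      blockId = s"temp_shuffle_${UUID.randomUUID()}"
    }
    (blockId, getFile(blockId))
  }
}
```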

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests.

Closes #28072 from xuanyuanking/SPARK-29285-revert.

Authored-by: Yuanjian Li 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 07c50784d34e10bbfafac7498c0b70c4ec08048a)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/storage/DiskBlockManager.scala| 36 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../spark/storage/DiskBlockManagerSuite.scala  | 43 +-
 3 files changed, 10 insertions(+), 71 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala 
b/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
index ee43b76..f211394 100644
--- a/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
@@ -20,8 +20,6 @@ package org.apache.spark.storage
 import java.io.{File, IOException}
 import java.util.UUID
 
-import scala.util.control.NonFatal
-
 import org.apache.spark.SparkConf
 import org.apache.spark.executor.ExecutorExitCode
 import org.apache.spark.internal.{config, Logging}
@@ -119,38 +117,20 @@ private[spark] class DiskBlockManager(conf: SparkConf, 
deleteFilesOnStop: Boolea
 
   /** Produces a unique block id and File suitable for storing local 
intermediate results. */
   def createTempLocalBlock(): (TempLocalBlockId, File) = {
-var blockId = TempLocalBlockId(UUID.randomUUID())
-var tempLocalFile = getFile(blockId)
-var count = 0
-while (!canCreateFile(tempLocalFile) && count < 
Utils.MAX_DIR_CREATION_ATTEMPTS) {
-  blockId = TempLocalBlockId(UUID.randomUUID())
-  tempLocalFile = getFile(blockId)
-  count += 1
+var blockId = new TempLocalBlockId(UUID.randomUUID())
+while (getFile(blockId).exists()) {
+  blockId = new TempLocalBlockId(UUID.randomUUID())
 }
-(blockId, tempLocalFile)
+(blockId, getFile(blockId))
   }
 
   /** Produces a unique block id and File suitable for storing shuffled 
intermediate results. */
   def createTempShuffleBlock(): (TempShuffleBlockId, File) = {
-var blockId = TempShuffleBlockId(UUID.randomUUID())
-var tempShuffleFile = getFile(blockId)
-var count = 0
-while (!canCreateFile(tempShuffleFile) && count < 
Utils.MAX_DIR_CREATION_ATTEMPTS) {
-  blockId = TempShuffleBlockId(UUID.randomUUID())
-  tempShuffleFile = getFile(blockId)
- 

[spark] branch master updated (bb0b416 -> 07c5078)

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bb0b416  [SPARK-31297][SQL] Speed up dates rebasing
 add 07c5078  [SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle 
regression caused by creating temporary file eagerly

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/DiskBlockManager.scala| 36 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../spark/storage/DiskBlockManagerSuite.scala  | 43 +-
 3 files changed, 10 insertions(+), 71 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31297][SQL] Speed up dates rebasing

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e7885b8  [SPARK-31297][SQL] Speed up dates rebasing
e7885b8 is described below

commit e7885b8a6686bc9179f741f1394dbbf7a9e211ef
Author: Maxim Gekk 
AuthorDate: Tue Mar 31 17:38:47 2020 +0800

[SPARK-31297][SQL] Speed up dates rebasing

### What changes were proposed in this pull request?
In the PR, I propose to replace the current implementation of the 
`rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions 
in `DateTimeUtils` with a new one, based on the fact that the difference 
between the Proleptic Gregorian and the hybrid (Julian+Gregorian) calendars 
changed only 14 times over the entire supported range of valid dates `[0001-01-01, 
9999-12-31]`:

| date | Proleptic Greg. days | Hybrid (Julian+Greg) days | diff |
|---|---|---|---|
|0001-01-01|-719162|-719164|-2|
|0100-03-01|-682944|-682945|-1|
|0200-03-01|-646420|-646420|0|
|0300-03-01|-609896|-609895|1|
|0500-03-01|-536847|-536845|2|
|0600-03-01|-500323|-500320|3|
|0700-03-01|-463799|-463795|4|
|0900-03-01|-390750|-390745|5|
|1000-03-01|-354226|-354220|6|
|1100-03-01|-317702|-317695|7|
|1300-03-01|-244653|-244645|8|
|1400-03-01|-208129|-208120|9|
|1500-03-01|-171605|-171595|10|
|1582-10-15|-141427|-141427|0|

For the given days since the epoch, the proposed implementation finds the 
range of days to which the input belongs, and adds the diff in days between 
the calendars to the input. The result is the rebased days since the epoch in 
the target calendar.

For example, suppose we need to rebase -650000 days from the Proleptic Gregorian 
calendar to the hybrid calendar. The input falls into the bucket 
[-682944, -646420), and the diff associated with that range is -1. To get the 
rebased days in the Julian calendar, we add -1 to -650000, and the result is 
-650001.
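
To make the lookup concrete, here is a small standalone sketch of the idea (the switch 
days and diffs are copied from the table above for the Gregorian-to-Julian direction; the 
exact constants and structure in `DateTimeUtils` may differ):

```scala
object RebaseDaysSketch {
  // Days since 1970-01-01 in the Proleptic Gregorian calendar at which the
  // difference to the hybrid (Julian + Gregorian) calendar changes, plus the diffs.
  private val switchDays = Array(-719162, -682944, -646420, -609896, -536847,
    -500323, -463799, -390750, -354226, -317702, -244653, -208129, -171605, -141427)
  private val diffs = Array(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0)

  def rebaseGregorianToJulianDays(days: Int): Int = {
    // Linear search from the most recent switch day, so modern dates exit quickly.
    var i = switchDays.length - 1
    while (i >= 0 && days < switchDays(i)) i -= 1
    days + diffs(if (i < 0) 0 else i)
  }
}

// RebaseDaysSketch.rebaseGregorianToJulianDays(-650000) == -650001, as in the example.
```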

### Why are the changes needed?
To make dates rebasing faster.

### Does this PR introduce any user-facing change?
No, the results should be the same for the valid range of the `DATE` type 
`[0001-01-01, 9999-12-31]`.

### How was this patch tested?
- Added 2 tests to `DateTimeUtilsSuite` for the 
`rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions. 
The tests check that the results of the old and the new (optimized) implementations 
are the same for all supported dates.
- Re-run `DateTimeRebaseBenchmark` on:

| Item | Description |
|---|---|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK8/11 |

Closes #28067 from MaxGekk/optimize-rebasing.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit bb0b416f0b3a2747a420b17d1bf659891bae3274)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 79 +++---
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 58 +++-
 .../DateTimeRebaseBenchmark-jdk11-results.txt  | 64 +-
 .../benchmarks/DateTimeRebaseBenchmark-results.txt | 64 +-
 4 files changed, 174 insertions(+), 91 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 2b646cc..44cabe2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -1040,6 +1040,44 @@ object DateTimeUtils {
   }
 
   /**
+   * Rebases days since the epoch from an original to an target calendar, from 
instance
+   * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+   *
+   * It finds the latest switch day which is less than `days`, and adds the 
difference
+   * in days associated with the switch day to the given `days`. The function 
is based
+   * on linear search which starts from the most recent switch days. This 
allows to perform
+   * less comparisons for modern dates.
+   *
+   * @param switchDays The days when difference in days between original and 
target
+   *   calendar was changed.
+   * @param diffs The differences in days between calendars.
+   * @param days The number of days since the epoch 1970-01-01 to be rebased 
to the
+   * target calendar.
+   * @return The rebased day
+   */
+  private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: 
Int): Int = {
+var i = switchDays.length - 1
+

svn commit: r38754 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 09:57:10 2020
New Revision: 38754

Log:
Apache Spark v3.0.0-rc1

Added:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0sQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZtCiD/9GtNXfxGR9oh2B4k+fg38uCrloGUYo3Dx9
+eJU6G55fbKtXK24dKlxZQCVDpwLihycnLULcV+/D75vWa4tSoG6n/FTHimCnUJWQ
+UkEsxqhWuGi25rUx4VsOQeHPYIP9/2pVGVyanFzRp+yAyldATGG36u3Xv5lqox6b
+6pARVwC6FZWKuk1b47xbRfYKUoNTkObhGjcKKyigexqx/nZOp99NP+sVlEqRD/l/
+B7l3kgAVq3XlZKUCkMhWgAHT6rPNkvwBdYZFce9gJHuG75Zw5rQ2hHesEqDOVlC1
+kqJPtpmb2U93ItBF6ArlmXcm+60rLa++B8cyrEsKLIyYxRpHH1bQmLB9TTzDeFpz
+e+WWlUiDpC1Lorzvg+44MeOXSj9EhNgqsYypGKhlh6WTN8A+BRzvJRMpDMLElRz6
+lHaceqn9NC4eE5tzcyXAFL+8Y644nCTIZQuND72LvIv7rO0YXq/6yeudM+SDeANU
+vscR4LiQ7/a3oSpxoIuA0MjKz6gWUaYFgsb8OuUC4VQPJKQZG+57SOazq1VTlB6/
+Ur8pePIUxU52EmzmIp08ws8v+NOo9pMxw7lyBwpmGX0/ax6p9v1xVcCeXqH4HYvA
+9d7a7hZy9yoguAGsVkibSym8e6XITCDoXLb9/HPEhfdyxFgi87DVjKZ84HkyFw9/
+OzHhumSp/Q==
+=zl/N
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Tue Mar 31 09:57:10 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: C2D9C0A5 E71C5B56 48AC15AA 998ABD06 2FDB4D5C D2B7C344
+ B1949A7B 28508364 A9A45767 F2642F17 7EBFF4B0 55823EBD
+ BE76A2CE 5604660F 62D1654D 8271287B

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0wQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZkfTD/4zQ5FuCr+giluZHaBnaZy7PAtSkoTjAWKX
+8zObXESsoTlIIjHEpBUmUU6O0tZODFOF7Zau9HkftroGurYxpTWE5nX0e//71JuC
+smBWLCgAeOlNEdeZUd2zm7pPWJfwRpsOcEfexb+RvaFQriw559Erxb5NoWHFIkg/
+tsjtjitMqLxcMlzZW7A/89zqmrnzBu1vhh/q8STzA0Ub6Jq+JzD4e6yatYAzjRj3
++Um7+NL+g/2tmweH8f9TtYzQFcowm6DdXi53fWZX55oVc1xBRTNuSnAdCJlkgEPg
+nUxEcuXUvHn/NbNNHPBwP6xMKyKqJu8+4vNLzr2ZxaxArPYF2FqTl8sFNxwVBM1Y
+PnKun7iZiLq5JqC2OopiDa8FJP0JQkYVyBWAx3BOscsAELfdlZHlPdekcLE6YHHV
+pde79YJ0tzUFIdH/Ulw4Jag4Ixunrg+ajmLS8n9ncpX0I81Zv8IJDaBf0cBboFw8
+kTqAvNkcsoGdRn1OiQnlE2IUib/R0fk7MktOyoZpfKzbCzxBZgLTO4FKTbRCydQX
+I8UhuRhELHCI7YXJHwbk0Swp6+h36dUQtLxFfD/OZdDQABOK+nEVjNsBIHb7ULDB
+pCckj8HBHwaynvNLogS1KJHThW8LEXAmVQFCD39XTNMnhfCUePyzlAC4RPByIFR4

[spark] branch master updated: [SPARK-31297][SQL] Speed up dates rebasing

2020-03-31 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bb0b416  [SPARK-31297][SQL] Speed up dates rebasing
bb0b416 is described below

commit bb0b416f0b3a2747a420b17d1bf659891bae3274
Author: Maxim Gekk 
AuthorDate: Tue Mar 31 17:38:47 2020 +0800

[SPARK-31297][SQL] Speed up dates rebasing

### What changes were proposed in this pull request?
In the PR, I propose to replace the current implementation of the 
`rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions 
in `DateTimeUtils` with a new one, based on the fact that the difference 
between the Proleptic Gregorian and the hybrid (Julian+Gregorian) calendars 
changed only 14 times over the entire supported range of valid dates `[0001-01-01, 
9999-12-31]`:

| date | Proleptic Greg. days | Hybrid (Julian+Greg) days | diff |
|---|---|---|---|
|0001-01-01|-719162|-719164|-2|
|0100-03-01|-682944|-682945|-1|
|0200-03-01|-646420|-646420|0|
|0300-03-01|-609896|-609895|1|
|0500-03-01|-536847|-536845|2|
|0600-03-01|-500323|-500320|3|
|0700-03-01|-463799|-463795|4|
|0900-03-01|-390750|-390745|5|
|1000-03-01|-354226|-354220|6|
|1100-03-01|-317702|-317695|7|
|1300-03-01|-244653|-244645|8|
|1400-03-01|-208129|-208120|9|
|1500-03-01|-171605|-171595|10|
|1582-10-15|-141427|-141427|0|

For the given days since the epoch, the proposed implementation finds the 
range of days to which the input belongs, and adds the diff in days between 
the calendars to the input. The result is the rebased days since the epoch in 
the target calendar.

For example, suppose we need to rebase -650000 days from the Proleptic Gregorian 
calendar to the hybrid calendar. The input falls into the bucket 
[-682944, -646420), and the diff associated with that range is -1. To get the 
rebased days in the Julian calendar, we add -1 to -650000, and the result is 
-650001.

### Why are the changes needed?
To make dates rebasing faster.

### Does this PR introduce any user-facing change?
No, the results should be the same for the valid range of the `DATE` type 
`[0001-01-01, 9999-12-31]`.

### How was this patch tested?
- Added 2 tests to `DateTimeUtilsSuite` for the 
`rebaseGregorianToJulianDays()` and `rebaseJulianToGregorianDays()` functions. 
The tests check that the results of the old and the new (optimized) implementations 
are the same for all supported dates.
- Re-run `DateTimeRebaseBenchmark` on:

| Item | Description |
|---|---|
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK8/11 |

Closes #28067 from MaxGekk/optimize-rebasing.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 79 +++---
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 58 +++-
 .../DateTimeRebaseBenchmark-jdk11-results.txt  | 64 +-
 .../benchmarks/DateTimeRebaseBenchmark-results.txt | 64 +-
 4 files changed, 174 insertions(+), 91 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 268cd19..04994a1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -1034,6 +1034,44 @@ object DateTimeUtils {
   }
 
   /**
+   * Rebases days since the epoch from an original to an target calendar, from 
instance
+   * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+   *
+   * It finds the latest switch day which is less than `days`, and adds the 
difference
+   * in days associated with the switch day to the given `days`. The function 
is based
+   * on linear search which starts from the most recent switch days. This 
allows to perform
+   * less comparisons for modern dates.
+   *
+   * @param switchDays The days when difference in days between original and 
target
+   *   calendar was changed.
+   * @param diffs The differences in days between calendars.
+   * @param days The number of days since the epoch 1970-01-01 to be rebased 
to the
+   * target calendar.
+   * @return The rebased day
+   */
+  private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: 
Int): Int = {
+var i = switchDays.length - 1
+while (i >= 0 && days < switchDays(i)) {
+  i -= 1
+}
+val rebased = days + diffs(if (i < 0) 0 else i)

svn commit: r38753 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 07:25:15 2020
New Revision: 38753

Log:
retry

Removed:
dev/spark/v3.0.0-rc1-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: Revert "[SPARK-30879][DOCS] Refine workflow for building docs"

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4d4c3e7  Revert "[SPARK-30879][DOCS] Refine workflow for building docs"
4d4c3e7 is described below

commit 4d4c3e76f6d1d5ede511c3ff4036b0c458a0a4e3
Author: HyukjinKwon 
AuthorDate: Tue Mar 31 16:11:59 2020 +0900

Revert "[SPARK-30879][DOCS] Refine workflow for building docs"

This reverts commit 7892f88f84acc8c061aaa3d2987f2c8b71e41963.
---
 .gitignore  |  2 --
 dev/create-release/do-release-docker.sh |  2 +-
 dev/create-release/spark-rm/Dockerfile  | 61 -
 docs/README.md  | 44 
 4 files changed, 37 insertions(+), 72 deletions(-)

diff --git a/.gitignore b/.gitignore
index 60a12e3..198fdee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,8 +18,6 @@
 .idea_modules/
 .project
 .pydevproject
-.python-version
-.ruby-version
 .scala_dependencies
 .settings
 /lib/
diff --git a/dev/create-release/do-release-docker.sh 
b/dev/create-release/do-release-docker.sh
index cda21eb..694a87b 100755
--- a/dev/create-release/do-release-docker.sh
+++ b/dev/create-release/do-release-docker.sh
@@ -96,7 +96,7 @@ fcreate_secure "$GPG_KEY_FILE"
 $GPG --export-secret-key --armor "$GPG_KEY" > "$GPG_KEY_FILE"
 
 run_silent "Building spark-rm image with tag $IMGTAG..." "docker-build.log" \
-  docker build --no-cache -t "spark-rm:$IMGTAG" --build-arg UID=$UID 
"$SELF/spark-rm"
+  docker build -t "spark-rm:$IMGTAG" --build-arg UID=$UID "$SELF/spark-rm"
 
 # Write the release information to a file with environment variables to be 
used when running the
 # image.
diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index d310aaf..6345168 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -20,9 +20,9 @@
 # Includes:
 # * Java 8
 # * Ivy
-# * Python 3.7
-# * Ruby 2.7
+# * Python (2.7.15/3.6.7)
 # * R-base/R-base-dev (3.6.1)
+# * Ruby 2.3 build utilities
 
 FROM ubuntu:18.04
 
@@ -33,11 +33,15 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true
 # These arguments are just for reuse and not really meant to be customized.
 ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 
-ARG PIP_PKGS="sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1"
-ARG GEM_PKGS="jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0"
+ARG BASE_PIP_PKGS="setuptools wheel"
+ARG PIP_PKGS="pyopenssl numpy sphinx"
 
 # Install extra needed repos and refresh.
 # - CRAN repo
+# - Ruby repo (for doc generation)
+#
+# This is all in a single "RUN" command so that if anything changes, "apt 
update" is run to fetch
+# the most current package versions (instead of potentially using old versions 
cached by docker).
 RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' >> 
/etc/apt/sources.list && \
   gpg --keyserver keyserver.ubuntu.com --recv-key 
E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
@@ -46,43 +50,36 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg 
ca-certificates && \
   rm -rf /var/lib/apt/lists/* && \
   apt-get clean && \
   apt-get update && \
+  $APT_INSTALL software-properties-common && \
+  apt-add-repository -y ppa:brightbox/ruby-ng && \
+  apt-get update && \
   # Install openjdk 8.
   $APT_INSTALL openjdk-8-jdk && \
   update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java && \
   # Install build / source control tools
   $APT_INSTALL curl wget git maven ivy subversion make gcc lsof libffi-dev \
-pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev
-
-ENV PATH "$PATH:/root/.pyenv/bin:/root/.pyenv/shims"
-RUN curl -L 
https://github.com/pyenv/pyenv-installer/raw/dd3f7d0914c5b4a416ca71ffabdf2954f2021596/bin/pyenv-installer
 | bash
-RUN $APT_INSTALL libbz2-dev libreadline-dev libsqlite3-dev
-RUN pyenv install 3.7.6
-RUN pyenv global 3.7.6
-RUN python --version
-RUN pip install --upgrade pip
-RUN pip --version
-RUN pip install $PIP_PKGS
-
-ENV PATH "$PATH:/root/.rbenv/bin:/root/.rbenv/shims"
-RUN curl -fsSL 
https://github.com/rbenv/rbenv-installer/raw/108c12307621a0aa06f19799641848dde1987deb/bin/rbenv-installer
 | bash
-RUN rbenv install 2.7.0
-RUN rbenv global 2.7.0
-RUN ruby --version
-RUN $APT_INSTALL g++
-RUN gem --version
-RUN gem install --no-document $GEM_PKGS
-
-RUN \
+pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \
   curl -sL https://deb.nodesource.com/setup_11.x | bash && \
-  $APT_INSTALL nodejs
-
-# Install R packages and dependencies used when building.
-# R depends on pandoc*, libssl (which are installed above).
-RUN \
+  $APT_INSTALL nodejs && \
+  # Install needed python packages. Use pip for installing packages (for 

[spark] branch branch-2.4 updated: [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new e226f68  [SPARK-31306][DOCS] update rand() function documentation to 
indicate exclusive upper bound
e226f68 is described below

commit e226f687c172c63ce9ae6531772af9df124c9454
Author: Ben Ryves 
AuthorDate: Tue Mar 31 15:16:17 2020 +0900

[SPARK-31306][DOCS] update rand() function documentation to indicate 
exclusive upper bound

### What changes were proposed in this pull request?
A small documentation change to clarify that the `rand()` function produces 
values in `[0.0, 1.0)`.

### Why are the changes needed?
`rand()` uses `Rand()`, which generates values in [0, 1) ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible value returned by `rand()` (i.e., for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the returned value could include 1.0).
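
As a quick sanity check (not part of the patch), the bound can be observed directly; this 
assumes a local Spark build on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min, rand}

val spark = SparkSession.builder().master("local[*]").appName("rand-bounds").getOrCreate()

// rand(seed) draws i.i.d. samples uniformly from [0.0, 1.0): 0.0 may appear, 1.0 never does.
val samples = spark.range(1000000L).select(rand(42).as("r"))
samples.agg(min("r"), max("r")).show()  // the observed max stays strictly below 1.0

spark.stop()
```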

### Does this PR introduce any user-facing change?
Only documentation changes.

### How was this patch tested?
Documentation changes only.

Closes #28071 from Smeb/master.

Authored-by: Ben Ryves 
Signed-off-by: HyukjinKwon 
---
 R/pkg/R/functions.R  | 2 +-
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index e914dd3..09b0a21 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2614,7 +2614,7 @@ setMethod("lpad", signature(x = "Column", len = 
"numeric", pad = "character"),
 
 #' @details
 #' \code{rand}: Generates a random column with independent and identically 
distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
 #' Note: the function is non-deterministic in general case.
 #'
 #' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index b964980..c305529 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -553,7 +553,7 @@ def nanvl(col1, col2):
 @since(1.4)
 def rand(seed=None):
 """Generates a random column with independent and identically distributed 
(i.i.d.) samples
-from U[0.0, 1.0].
+uniformly distributed in [0.0, 1.0).
 
 .. note:: The function is non-deterministic in general case.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f419a38..21ad1fd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1224,7 +1224,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*
@@ -1235,7 +1235,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fa37856  [SPARK-31306][DOCS] update rand() function documentation to 
indicate exclusive upper bound
fa37856 is described below

commit fa378567105ec9d9bbe30edf4b74b09c3df27658
Author: Ben Ryves 
AuthorDate: Tue Mar 31 15:16:17 2020 +0900

[SPARK-31306][DOCS] update rand() function documentation to indicate 
exclusive upper bound

### What changes were proposed in this pull request?
A small documentation change to clarify that the `rand()` function produces 
values in `[0.0, 1.0)`.

### Why are the changes needed?
`rand()` uses `Rand()`, which generates values in [0, 1) ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible value returned by `rand()` (i.e., for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the returned value could include 1.0).

### Does this PR introduce any user-facing change?
Only documentation changes.

### How was this patch tested?
Documentation changes only.

Closes #28071 from Smeb/master.

Authored-by: Ben Ryves 
Signed-off-by: HyukjinKwon 
---
 R/pkg/R/functions.R  | 2 +-
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 3d30ce1..2baf3aa 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2975,7 +2975,7 @@ setMethod("lpad", signature(x = "Column", len = 
"numeric", pad = "character"),
 
 #' @details
 #' \code{rand}: Generates a random column with independent and identically 
distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
 #' Note: the function is non-deterministic in general case.
 #'
 #' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 4b51dc1..de0d38e 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -652,7 +652,7 @@ def percentile_approx(col, percentage, accuracy=1):
 @since(1.4)
 def rand(seed=None):
 """Generates a random column with independent and identically distributed 
(i.i.d.) samples
-from U[0.0, 1.0].
+uniformly distributed in [0.0, 1.0).
 
 .. note:: The function is non-deterministic in general case.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 1a0244f..8d8638d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1227,7 +1227,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*
@@ -1238,7 +1238,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound

2020-03-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1caca7d  [SPARK-31306][DOCS] update rand() function documentation to 
indicate exclusive upper bound
1caca7d is described below

commit 1caca7d97a03ab9ac99597e1ef9fa3890da90743
Author: Ben Ryves 
AuthorDate: Tue Mar 31 15:16:17 2020 +0900

[SPARK-31306][DOCS] update rand() function documentation to indicate 
exclusive upper bound

### What changes were proposed in this pull request?
A small documentation change to clarify that the `rand()` function produces 
values in `[0.0, 1.0)`.

### Why are the changes needed?
`rand()` uses `Rand()`, which generates values in [0, 1) ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible value returned by `rand()` (i.e., for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the returned value could include 1.0).

### Does this PR introduce any user-facing change?
Only documentation changes.

### How was this patch tested?
Documentation changes only.

Closes #28071 from Smeb/master.

Authored-by: Ben Ryves 
Signed-off-by: HyukjinKwon 
(cherry picked from commit fa378567105ec9d9bbe30edf4b74b09c3df27658)
Signed-off-by: HyukjinKwon 
---
 R/pkg/R/functions.R  | 2 +-
 python/pyspark/sql/functions.py  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index d8b0450..173dbc4 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2888,7 +2888,7 @@ setMethod("lpad", signature(x = "Column", len = 
"numeric", pad = "character"),
 
 #' @details
 #' \code{rand}: Generates a random column with independent and identically 
distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
 #' Note: the function is non-deterministic in general case.
 #'
 #' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 1ade21c..476aab4 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -599,7 +599,7 @@ def nanvl(col1, col2):
 @since(1.4)
 def rand(seed=None):
 """Generates a random column with independent and identically distributed 
(i.i.d.) samples
-from U[0.0, 1.0].
+uniformly distributed in [0.0, 1.0).
 
 .. note:: The function is non-deterministic in general case.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 8a89a3b..fd4e77f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1204,7 +1204,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*
@@ -1215,7 +1215,7 @@ object functions {
 
   /**
* Generate a random column with independent and identically distributed 
(i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org