[spark] branch branch-3.3 updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case

2022-12-23 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new aa39b06462a [MINOR][TEST][SQL] Add a CTE subquery scope test case
aa39b06462a is described below

commit aa39b06462a98f37be59e239d12edd9f09a25b88
Author: Reynold Xin 
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

[MINOR][TEST][SQL] Add a CTE subquery scope test case

### What changes were proposed in this pull request?
I noticed the SQL tests were missing a case for how CTEs are scoped inside subqueries, so I added one.

### Why are the changes needed?
To ensure a CTE defined inside a subquery is scoped to that subquery rather than being visible to the rest of the query.
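
For illustration, here is a minimal spark-shell sketch of the behavior being locked down (assuming a local SparkSession bound to `spark`; the query mirrors the test added below and is not part of the commit itself):

    // The CTE is defined inside the parenthesized subquery, so it should not be
    // visible to the second branch of the UNION. Analysis is therefore expected
    // to fail with TABLE_OR_VIEW_NOT_FOUND when the outer `cte` reference is resolved.
    val query = """
      SELECT * FROM
        (
         WITH cte AS (SELECT * FROM range(10))
         SELECT * FROM cte WHERE id = 8
        ) a
      UNION
      SELECT * FROM cte"""
    try {
      spark.sql(query).show()
    } catch {
      case e: org.apache.spark.sql.AnalysisException =>
        println(e.getMessage)  // expected: the relation `cte` cannot be found
    }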

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a test case change.

Closes #39189 from rxin/cte_test.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 
(cherry picked from commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b)
Signed-off-by: Reynold Xin 
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10 
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++
 4 files changed, 94 insertions(+)
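
The same new query is checked into three result files, presumably because the nested-CTE tests are exercised under the different CTE name-precedence modes. A hedged sketch of the assumed mapping (the config name is inferred from the file names, not stated in this push):

    // Assumed mapping of golden files to spark.sql.legacy.ctePrecedencePolicy:
    //   cte-nested.sql.out    -> default behavior (EXCEPTION)
    //   cte-legacy.sql.out    -> LEGACY
    //   cte-nonlegacy.sql.out -> CORRECTED
    spark.conf.set("spark.sql.legacy.ctePrecedencePolicy", "LEGACY")
    spark.conf.set("spark.sql.legacy.ctePrecedencePolicy", "CORRECTED")

The *.sql.out files are golden files: they are normally regenerated by re-running the SQL query test suite with SPARK_GENERATE_GOLDEN_FILES=1 rather than edited by hand (again an assumption about the usual workflow, not something stated here).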

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 264b64ffe96..ebdd64c3ac8 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index 2c622de3f36..b6e1793f7d7 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 283f5a54a42..546ab7ecb95 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException

[spark] branch master updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case

2022-12-23 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 24edf8ecb5e [MINOR][TEST][SQL] Add a CTE subquery scope test case
24edf8ecb5e is described below

commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b
Author: Reynold Xin 
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

[MINOR][TEST][SQL] Add a CTE subquery scope test case

### What changes were proposed in this pull request?
I noticed the SQL tests were missing a case for how CTEs are scoped inside subqueries, so I added one.

### Why are the changes needed?
To ensure a CTE defined inside a subquery is scoped to that subquery rather than being visible to the rest of the query.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a test case change.

Closes #39189 from rxin/cte_test.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10 
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 013c5f27b50..65000471c75 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index ed6d69b233e..2c67f2db56a 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 6a48e1bec43..154ebd20223 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+  

svn commit: r46414 - /dev/spark/v3.1.1-rc3-bin/ /release/spark/spark-3.1.1/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 11:00:12 2021
New Revision: 46414

Log:
Moving Apache Spark 3.1.1 RC3 to Apache Spark 3.1.1

Added:
release/spark/spark-3.1.1/
  - copied from r46413, dev/spark/v3.1.1-rc3-bin/
Removed:
dev/spark/v3.1.1-rc3-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r46413 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:55:39 2021
New Revision: 46413

Log:
Recover 3.1.1 RC3

Added:
dev/spark/v3.1.1-rc3-bin/
  - copied from r46410, dev/spark/v3.1.1-rc3-bin/
dev/spark/v3.1.1-rc3-docs/
  - copied from r46410, dev/spark/v3.1.1-rc3-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r46411 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:38 2021
New Revision: 46411

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc3-bin/
dev/spark/v3.1.1-rc3-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r46412 - in /dev/spark: v3.1.0-rc1-bin/ v3.1.0-rc1-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:58 2021
New Revision: 46412

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.0-rc1-bin/
dev/spark/v3.1.0-rc1-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r46410 - in /dev/spark: v3.1.1-rc2-bin/ v3.1.1-rc2-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:32 2021
New Revision: 46410

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc2-bin/
dev/spark/v3.1.1-rc2-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r46409 - in /dev/spark: v3.1.1-rc1-bin/ v3.1.1-rc1-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:25 2021
New Revision: 46409

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc1-bin/
dev/spark/v3.1.1-rc1-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r40088 - in /dev/spark: v3.0.0-rc1-bin/ v3.0.0-rc1-docs/ v3.0.0-rc2-bin/ v3.0.0-rc2-docs/ v3.0.0-rc3-docs/

2020-06-18 Thread rxin
Author: rxin
Date: Thu Jun 18 16:41:27 2020
New Revision: 40088

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-docs/
dev/spark/v3.0.0-rc2-bin/
dev/spark/v3.0.0-rc2-docs/
dev/spark/v3.0.0-rc3-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r40050 - /dev/spark/v3.0.0-rc3-bin/ /release/spark/spark-3.0.0/

2020-06-16 Thread rxin
Author: rxin
Date: Tue Jun 16 09:18:02 2020
New Revision: 40050

Log:
release 3.0.0

Added:
release/spark/spark-3.0.0/
  - copied from r40049, dev/spark/v3.0.0-rc3-bin/
Removed:
dev/spark/v3.0.0-rc3-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.0.0 created (now 3fdfce3)

2020-06-14 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 3fdfce3  (commit)
No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39960 - in /dev/spark/v3.0.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 14:03:25 2020
New Revision: 39960

Log:
Apache Spark v3.0.0-rc3 docs


[This commit notification would consist of 1920 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39959 - /dev/spark/v3.0.0-rc3-bin/

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 13:35:40 2020
New Revision: 39959

Log:
Apache Spark v3.0.0-rc3

Added:
dev/spark/v3.0.0-rc3-bin/
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Sat Jun  6 13:35:40 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZjGIEACG3gsdARN8puRHS2YL+brOmjbrS4wVY/Av
+l+ZR59moZ7QuwjYoixyqNnztIKgIyleYJq9DL5TqqMxFgGpuoDrnuWVqI+8MngVA
+gau/QDmYINabZsJxFfDn1IjxxSQBsgf6pwfqQbB+fGSjLSPnDq+u3DIWr3fRMh4X
+DrTuATNewKiiBIwQHUKAtPMAbsdDvXv0DRL7CGTiIJri43opAntQzHec3sP9hgRU
+J5J2HnjOlamgv58S7zrUw/Wo1xPLmz2PGIsP0aq9DRRw0bLnesrtEaWAKFp2HL5E
+QlbjfboaDQz/X+meruW57/sO/DDwA90/XvF44z4Gu6kbS8nRuTsU5wVfZ/1iyWZk
+PLP2nFoWl7O85k/DLB5ADYgce3e6k2qD2obKxzsEx0nr0Wu13cxCR2+IBQmv05jb
+4Kwi7iE0iKIxt3cESDH6j9GqZoTrcxt6Jb88KSQ+YM2TBNUr1ZZNmkjgYdmLvm7a
+wH6vLtdpZzUKIGd6bt1grEwoQJBMnQjkoDYxhx+ugjbs8CwwxcdUNd2Q5xz0WaSn
+p443ZlMR5lbGf6D6U4PUigaIrdD8d+ef/rRTDtXdoDqC+FdNuepyS9+2+dUZGErx
+N2IMNunKIdKw57GZGcILey1hY45SSuQFw5JAe+nWqCAzCmFX72ulkv9The7rLdlE
+YdLu6XQIBA==
+=HhHH
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Sat Jun  6 13:35:40 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 394DCFEB 4E202A8E 5C58BF94 A77548FD 79A00F92 34538535
+ B0242E1B 96068E3E 80F78188 D71831F8 4A350224 41AA14B1
+ D72AE704 F2390842 DBEAB41F 5AC9859A

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Sat Jun  6 13:35:40 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3oQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZvhPD/9Vyrywk4kYFaUo36Kg6nickTvWMm+yDfGZ
+tKrUo3dAoMta7OwUjT6W0Roo8a4BBgumaDv47Dm6mlquF2DuLuBrFsqFo8c5VNA/
+jT1tdSdHiTzjq7LfY9GQDn8Wkgp1gyIKON70XFdZifduW0gcFDkJ+FjhPYWcA6jy
+GGOGK5qboCdi9C+KowUVj4VB9bbxPbWvW7FVF3+VlcrKvkmNx+EmqmIrqsh72w8O
+EL70za2uBRUUiFcaOpY/wpmEN1raCAkMzQ+dPl7p1PFgmLFrMN9RaRXJ1stF+fXO
+rDLBLNPqb85TvvOOHpcr4PSP38GrdZvDAvljCOEbBzacF719bewu/IVRcNi9lPZE
+HDPUcZLgnocNIF6kafykrm3JhagzmPIhQ8d4DFTuH6ePxgWqdUa9lWKQL54z3mjU
+LT2CJ8gMDY0Wz5zSKc/sI/ZwL+Q6U8xiIGYSzQgT9yPztbhDd5AM2DgohJkZSD4b
+jOrEsSyNRJiwwRAHlbeOOVPb4UNYzsx1USPbPEBeXTt8X8VUb8jsU84o/RhXexk9
+EMJjxz/aChB+NefbmUjBZmXSaa/zYubprJrWnUgPw7hFxAnmtgIUdjSWSNIOJ6bp
+EV1M6xwuvrmGhOa3D0C+lYyAuYZca2FQrcAtzNiL6iOMQ6USFZvzjxGWQiV2CDGQ
+O8CNfkwOGA

svn commit: r39958 - /dev/spark/v3.0.0-rc3-bin/

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 11:18:32 2020
New Revision: 39958

Log:
remove 3.0 rc3 binary

Removed:
dev/spark/v3.0.0-rc3-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (fa608b9 -> 3ea461d)

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
 add 3fdfce3  Preparing Spark release v3.0.0-rc3
 new 3ea461d  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3ea461d61e635835c07bacb5a0c403ae2a3099a0
Author: Reynold Xin 
AuthorDate: Sat Jun 6 02:57:41 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 3bad429..21f3eaa 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --g

[spark] 01/01: Preparing Spark release v3.0.0-rc3

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3fdfce3120f307147244e5eaf46d61419a723d50
Author: Reynold Xin 
AuthorDate: Sat Jun 6 02:57:35 2020 +

Preparing Spark release v3.0.0-rc3
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 21f3eaa..3bad429 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.1
+Version: 3.0.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8bef9d8..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fc1441d..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index de2a6fb..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c0c016..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index b8df191..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8119709..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/ta

[spark] tag v3.0.0-rc3 created (now 3fdfce3)

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 3fdfce3  (commit)
This tag includes the following new commits:

 new 3fdfce3  Preparing Spark release v3.0.0-rc3

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39951 - /dev/spark/v3.0.0-rc3-bin/

2020-06-05 Thread rxin
Author: rxin
Date: Fri Jun  5 19:08:09 2020
New Revision: 39951

Log:
Apache Spark v3.0.0-rc3

Added:
dev/spark/v3.0.0-rc3-bin/
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Fri Jun  5 19:08:09 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZpBZD/9vSiD946kwdMWalYM01Zw2yjKK60eakhLY
+jxHRy1T6Yipspyh2idCrzd2MaGJFqUwRZjs1mpA/mKZUGRSzYFjlWWoaSc/T19MD
+3q/zg6glgoKquzxHcAqum/OCc1C1MJTcsMic2+LIelXRoJ2GPCeECq91JGX4xpD4
+09sDElvooqfMCLb05gaaF8Eyrpm+7WSyAEVpb1Fjpp/gtdG1YQyiW3o3WzNSJgeA
+dewZaSoI58lx3Rfs1jZN1M4Gyj1aKh4Yqw21+CDoHAhtkeOp5oGPgrWef4fZAE4D
+4xKoz1I/5C1s0wIZEhUI2IUJLeGyCR117QhIO/bQFR1XEOO22auQaPppGJKUa5bb
+bwpx6TARNP13fe2R48G+yZ9Em0uC3P1CucGYCRlY22umzkbalrVFeZ77n/FWRB7E
+nC29bso/R2VwmDRI6yWXiCPLMyQy/PukniWRJZiU7Ath1930cORAlqFC7EOBHgHu
+k3AVX/3h2qZBFuYu/wIsd89rgeiwrf4fksiuMhp8YXJh3xCLLSl4uT+q3flutJ3H
+nsOLYkuie/r4qx+M2J7rfezTzTeYr+SN8mn4CTsGRznHhb0amqlZE6yNFWVatr6D
+LEYWe9L3DK92Kj0Jtl5QyPXQlKSoBQriketgZXKxzeBScKeFd6acGxOhM5LpZRCo
+ngKbsgfcoQ==
+=bwFz
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Fri Jun  5 19:08:09 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 37496F1A C5BD0DFF 0F6B08B9 05CB55B7 DAA6397A 8C377126
+ C6887AEB CB05F172 0E4A9754 9ED4B6B4 68E9266A 6459229F
+ 48D58F7C 9C0A58B1 183CC6D0 A18ACE18

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Fri Jun  5 19:08:09 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4kQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZlwHD/9tPwfyzwQkl6qkYp27AgZexy5k15gjJ/Bi
+MWWwv3bMhJiRlZN3hCyGC0QTTkRG+AJTd3SflbUhHzw9ttFAnt3VqZ7RZBB4UBDI
+5W85jUaF5bOMu7K4hW2iZdcLLLbq7/sXNNqRhomQStL4j6TerZjgP8IytCGEmLX4
+Qt894N7+MunZxbPXKkUqZfO0cWlxY53+zNGqXKJdwDhQUrrH0i+2fs3gd97OJs42
+83l+pE27C7+aTr6fSRWIS55nw9GzKrDOr0N47wtfCs0mqIW+dI+cVjZh8W/Gf9Dl
+EifAsLIpahNRpQLu0PqiWrsJ3meertha4DLWRPS0esYyZAGFK+DjD9Zm1cOovA9v
+ywjQVWCkmaqaozvm2RTKxwvS7kkBB2dJPUJJ8YeCBr0A7wHBAIeA0vvWe9q7u0KW
+O78uGswTF4EKz85ZMhuo8IjdjKjzTumzdFws4akeTzv60t+439zFdyhUghfQ71om
+biS1Fgopz1QLqCb3eaqhMBM0ZB4JVMTtMKb2/gqH/8qaQq91CEkLTpOOsRK+xdeg
+A8XoFCWEsBbHzLT3Y3FKsHC7ipo2FYXCcn/n/67bRuFFBwhLZzOyEISH72nKIk4k
+YOU5wZnsykG2oiV3ZysRlYewtU0mIIuUINrMVRZB69CUk9Q2fnDyuT02OEGIoNZC
+LohvgOFbqQ

svn commit: r39657 - in /dev/spark/v3.0.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-05-18 Thread rxin
Author: rxin
Date: Mon May 18 16:11:38 2020
New Revision: 39657

Log:
Apache Spark v3.0.0-rc2 docs


[This commit notification would consist of 1921 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39656 - /dev/spark/v3.0.0-rc2-bin/

2020-05-18 Thread rxin
Author: rxin
Date: Mon May 18 15:42:56 2020
New Revision: 39656

Log:
Apache Spark v3.0.0-rc2

Added:
dev/spark/v3.0.0-rc2-bin/
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc Mon May 18 15:42:56 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHgQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZllrEACaCgpeO1qK4uJLQC00J1iU2970iVn9Aqh/
+gZnikK7mBClXekg2Q8+poAhueXS1XfGoJfOCwTeOp8iMvD0BcLhIxftKBg7CxmOa
+yKrtL/dehNyYMTWofxluZzolPR4O0DDNva2W6ExKPhrUAAOTPjPkMx9ty0C57IqO
+Pwblsr6iI3BWrmRdN2Dpfo+enxJ1rd6H/0kYCmXEFgyW8lBbGiN23KrjkriZOJxo
+6Ad8zFIEI+rSmmgvy6lkXdlJFduCmRFFZguRtWq48rYEY3pu6geIUetPMsosBnDW
+mb5ywNMuqZomeEes1JoWp96E65K3HUO8LxPrP3wJY9TfUGduAAwwBX8nGsa0r+mz
+JJq2f4zwvINM2eQGXIfcpg21K3ijqdkqylAKuBGiil5QcHABGQIQ6N1M+1ruKjKp
+zHeXh6tac2IM3dvpyh12mC7ZhKPBAC1sUZD8qzvB6sjaHgvv3uSUc2xTW7kzs8l2
+mwNT8SmCscR6+PAm29dY6CoRtVtDEygt+oOMhRkturaDQ9vtYgduKo+p6PiqffUE
+7SUKwk7a3Cqe46uxHabHdi+6NedFuX7/bPSAX51Q4MpeHC8l4HpgHDPodtfRcEQm
+VDSeLBfhs3WHi+OrqZ2et/EYaGFxiZTTi2PfpeMBPmC4d4k+yymZEenJcXVps7+G
+fFFeOvCfyQ==
+=2zdl
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 Mon May 18 15:42:56 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: B50B8062 8C2158C5 5931EB47 275FB32D 52EFF715 F3B39524
+ 29C03A21 583459D5 32EC2135 D27AB970 0F345B7A 620E4281
+ 950CC383 58231D1D BB08817C 4EDC6A05

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc Mon May 18 15:42:56 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHoQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9Zn7ED/9Ujdr6jmTAFbtJtJiaDCevVGDhoND+9wca
+4MEaUYecgrYWSx12YBZe+d4nIbTuVWK6X29C76E/wbwREWFqG1fA17P7ZpBh8x3W
+xHSfzyYAP6G63I6IC+7jiHkOIOYBScGKj9h6z5j39eqt05HGAv088YEeTMpAC32B
+GbACEglWGgrE3JsrKXf77hIU8AizcE6rhS5OapqWdxFoqTHbxgjg3uJjsxVKsMXG
+wchOtedVfcDZihoqrPoO+pwjP8LIt+iv53luaUJowosC8K62OcjL1ay9Gw4a8KMQ
+9pEr9HgjAj9abel0q+ic4reLcCh+bjFSBzXR8/uJHjmSsWHNlwyXJq5Ymff7T2xJ
+s75vYuHI9bcOqqb2X1r5TY6v34p13PzKuzL7Y5la1ZCPo0nXjCne5NcSTxu9sQY5
+jl9BsVwWONGSZHsNlW6dy3XeXRaAFAPDCHJvqEsP8cgxMd9ryLG2niITVBGrs3jV
+Q3ylNTsM5G7/As6PR5hYYmTqCBBXJWizJmENMJq0zXinNe83ycWmKikACUXtBDlO
+qfRr3op3DAxdcNWbfCG7l9Ifoyr6w7HYDHEA6mMSsZ0MSSaiWcnhBc4ul5P4JUN8
+1p9/4o2WV6lfT2c6VmCfx4W4d5w3pgEVRHakvGzXE59datTZs1AQREG9G87jEd7R
+wv/RT1q+dA

[spark] branch branch-3.0 updated (740da34 -> f6053b9)

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 740da34  [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
 add 29853ec  Preparing Spark release v3.0.0-rc2
 new f6053b9  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.0.0-rc2

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 29853eca69bceefd227cbe8421a09c116b7b753a
Author: Reynold Xin 
AuthorDate: Mon May 18 13:21:37 2020 +

Preparing Spark release v3.0.0-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 21f3eaa..3bad429 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.1
+Version: 3.0.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8bef9d8..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fc1441d..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index de2a6fb..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c0c016..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index b8df191..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8119709..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/ta

[spark] tag v3.0.0-rc2 created (now 29853ec)

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 29853ec  (commit)
This tag includes the following new commits:

 new 29853ec  Preparing Spark release v3.0.0-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit f6053b94f874c62856baa7bfa35df14c78bebc9f
Author: Reynold Xin 
AuthorDate: Mon May 18 13:21:43 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 3bad429..21f3eaa 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --g

svn commit: r38759 - in /dev/spark/v3.0.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 13:45:27 2020
New Revision: 38759

Log:
Apache Spark v3.0.0-rc1 docs


[This commit notification would consist of 1911 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r38754 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 09:57:10 2020
New Revision: 38754

Log:
Apache Spark v3.0.0-rc1

Added:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0sQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZtCiD/9GtNXfxGR9oh2B4k+fg38uCrloGUYo3Dx9
+eJU6G55fbKtXK24dKlxZQCVDpwLihycnLULcV+/D75vWa4tSoG6n/FTHimCnUJWQ
+UkEsxqhWuGi25rUx4VsOQeHPYIP9/2pVGVyanFzRp+yAyldATGG36u3Xv5lqox6b
+6pARVwC6FZWKuk1b47xbRfYKUoNTkObhGjcKKyigexqx/nZOp99NP+sVlEqRD/l/
+B7l3kgAVq3XlZKUCkMhWgAHT6rPNkvwBdYZFce9gJHuG75Zw5rQ2hHesEqDOVlC1
+kqJPtpmb2U93ItBF6ArlmXcm+60rLa++B8cyrEsKLIyYxRpHH1bQmLB9TTzDeFpz
+e+WWlUiDpC1Lorzvg+44MeOXSj9EhNgqsYypGKhlh6WTN8A+BRzvJRMpDMLElRz6
+lHaceqn9NC4eE5tzcyXAFL+8Y644nCTIZQuND72LvIv7rO0YXq/6yeudM+SDeANU
+vscR4LiQ7/a3oSpxoIuA0MjKz6gWUaYFgsb8OuUC4VQPJKQZG+57SOazq1VTlB6/
+Ur8pePIUxU52EmzmIp08ws8v+NOo9pMxw7lyBwpmGX0/ax6p9v1xVcCeXqH4HYvA
+9d7a7hZy9yoguAGsVkibSym8e6XITCDoXLb9/HPEhfdyxFgi87DVjKZ84HkyFw9/
+OzHhumSp/Q==
+=zl/N
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Tue Mar 31 09:57:10 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: C2D9C0A5 E71C5B56 48AC15AA 998ABD06 2FDB4D5C D2B7C344
+ B1949A7B 28508364 A9A45767 F2642F17 7EBFF4B0 55823EBD
+ BE76A2CE 5604660F 62D1654D 8271287B

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0wQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZkfTD/4zQ5FuCr+giluZHaBnaZy7PAtSkoTjAWKX
+8zObXESsoTlIIjHEpBUmUU6O0tZODFOF7Zau9HkftroGurYxpTWE5nX0e//71JuC
+smBWLCgAeOlNEdeZUd2zm7pPWJfwRpsOcEfexb+RvaFQriw559Erxb5NoWHFIkg/
+tsjtjitMqLxcMlzZW7A/89zqmrnzBu1vhh/q8STzA0Ub6Jq+JzD4e6yatYAzjRj3
++Um7+NL+g/2tmweH8f9TtYzQFcowm6DdXi53fWZX55oVc1xBRTNuSnAdCJlkgEPg
+nUxEcuXUvHn/NbNNHPBwP6xMKyKqJu8+4vNLzr2ZxaxArPYF2FqTl8sFNxwVBM1Y
+PnKun7iZiLq5JqC2OopiDa8FJP0JQkYVyBWAx3BOscsAELfdlZHlPdekcLE6YHHV
+pde79YJ0tzUFIdH/Ulw4Jag4Ixunrg+ajmLS8n9ncpX0I81Zv8IJDaBf0cBboFw8
+kTqAvNkcsoGdRn1OiQnlE2IUib/R0fk7MktOyoZpfKzbCzxBZgLTO4FKTbRCydQX
+I8UhuRhELHCI7YXJHwbk0Swp6+h36dUQtLxFfD/OZdDQABOK+nEVjNsBIHb7ULDB
+pCckj8HBHwaynvNLogS1KJHThW8LEXAmVQFCD39XTNMnhfCUePyzlAC4RPByIFR4
+yD6VQ7bJDA

svn commit: r38753 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 07:25:15 2020
New Revision: 38753

Log:
retry

Removed:
dev/spark/v3.0.0-rc1-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r38740 - /dev/spark/v3.0.0-rc1-bin/

2020-03-30 Thread rxin
Author: rxin
Date: Mon Mar 30 16:00:46 2020
New Revision: 38740

Log:
Apache Spark v3.0.0-rc1

Added:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with 
props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPMQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9Zr8LD/9WOO4mDufkmhhXk78zWAyhRjJpG0Kjuvla
+KEnx8MK4MUtr77cQsmVLgj+FXFwmUvtZTZXHJX704Jk6xAAFXzii4EwIfk46wka0
+CY0arEleHJ6MBohLbOVW3sp86LduQBBd+dmBbIh7spJjd054RRqsAe8sVx0uqezD
+y4Fv+LM0B7kQhHdhsYymVClAwgwKOwecdks0l9PonE9YwyJixMEOZwxxk4aaRNwR
+VUH6X4mHlpWiQ+zHWTAmE7aOvjOwxQqciqtmgzLLRlDjuTtz160XLthUneoOVoDw
+spphs7pMpj8r4T9BZQCeIiuRvE5VeT6037Uz03X56xhzEvna9+0/frHR/Vb88gW8
+U5YJio4p8h286vLwb0X48K7lyfd60VM0kyfh31xl1ZppdAFXhV9qA7435wn6R4NU
+1zi/oXnHOgAWW037C+QFXpPnKzCY3BpmLw3uAGMgYRA+2NqrAT2HE8vmnlxJkrBS
+JT3OlJCCkIw2yitPN5zZaWZLpbvT07wFEH8KFoh7Wgs4FBl1mDeyGT53RhbSHjy1
++i85E6g9366CZNoD3bSUlPlY9iOtP4QK4Qp+VOn1j13Bu3BE9Fpuprani1ESsGME
+16qzwf5It3TVWK9czXqa8HBJvlrjaEInloWThmSysYFweKIRT+8CEu9+KyakTKVL
+fnGKXfbXzQ==
+=0ZBt
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Mon Mar 30 16:00:46 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: A4828C8D BA3BA1AA 116EEA62 D7028B85 85FF87AE 8AE9F0B5
+ 421F1A3E E5E04F19 F1D4F0A6 144CEF29 8D690FC8 D9836830
+ 4518FF9E 96004114 1083326B 84B5C0EC

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPUQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZmRGD/9UkePDo4IawkYALJoaqpwnjp1Md3RP5dbK
+l/x1VLfHzAkbYQo+tKe692koHo45tE0izt+99humvZT7SjP4sVPHuR16Ik0gE6h0
+Yn8CG4Qsof30Se9feg6EllACBDEvueGlcchHN+aPyYJoLjajAzfH/5P6fC9rHe5Z
+d3aYd93cqYtIKbDtQ6fxnI387wTmWkVKAXWNB7K5iEB8KFjzCjGeyac5JbnYBC6G
+Y9uWcxqQ+3XV2SIfDQuxFuj421RBx2IIu56qJLgVEzcs8yLh4APM29DfYv7YcRGg
+ILex3j8SWjgqG1rdDhc2U/SeakR/rErJ+oebxD9dTC19wMTnp37cgS0HgtWLHaU2
+RvxaMdAvF3GjN2LFhSRht/uZV350O3EI+L6ye9WauXzaK4iD7Mi5x7BIBN1csNWn
+MW0B+goqTpzvC78h5R2ETCw1xmAarjKmdLKf3AUuqGeobv/7+4sLuwq+PSyrTgUi
+BHPIgkYYk+EhHryB6wLkKYRXWKKmMyGCl+5HLYPuY4GyZm4rwc2et8v1pX3RvcCF
+NoOcg/TZgn6+Tz0OjUm4TARs9RkbJEhKk1EWKCFvPalhenLbHHOvDJJPoqp3LNVT
+/HQ1f1JRWqXWfc/O1BR9CRFNbZTxKorPxMXIEYn583lufZyvWiyAnYKD6ev0UAdB
+/iwwQeeM/Q

[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fc5079841907443369af98b17c20f1ac24b3727d
Author: Reynold Xin 
AuthorDate: Mon Mar 30 08:42:27 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index c8cb1c3..3eff30b 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --g

[spark] branch branch-3.0 updated (5687b31 -> fc50798)

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5687b31  [SPARK-30532] DataFrameStatFunctions to work with 
TABLE.COLUMN syntax
 add 6550d0d  Preparing Spark release v3.0.0-rc1
 new fc50798  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.0.0-rc1 created (now 6550d0d)

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 6550d0d  (commit)
This tag includes the following new commits:

 new 6550d0d  Preparing Spark release v3.0.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.0.0-rc1

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1
Author: Reynold Xin 
AuthorDate: Mon Mar 30 08:42:10 2020 +

Preparing Spark release v3.0.0-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 38 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 193ad3d..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index a1c8a8e..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 163c250..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index a6d9981..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 76a402b..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 3c3c0d2..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 883b73a..dedc7df 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 93a4f67..ebb0525 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git

svn commit: r38725 - /dev/spark/KEYS

2020-03-30 Thread rxin
Author: rxin
Date: Mon Mar 30 07:26:00 2020
New Revision: 38725

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Mar 30 07:26:00 2020
@@ -1167,3 +1167,61 @@ rMA+YcuC9o2K7dKjVv3KinQ2Tiv4TVxyTjcyZurg
 0TbepIdiQlc=
 =wdlY
 -----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2020-03-30 [SC]
+  4A8BDA48E6E212A734632502DEA963E2E9347D66
+uid   [ultimate] Reynold Xin (CODE SIGNING KEY) 
+sub   rsa4096 2020-03-30 [E]
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBF6BkJkBEACmRKcV6c575E6jOyZBwLteV7hJsETNYx9jMkENiyeyTFJ3A8Hg
++gPAmoU6jvzugR98qgVSH0uj/HZH1zEkJx049+OHwBcZ48mGJakIaKcg3k1CPRTL
+VDRWg7M4P7nQisMHsPHrdGPJFVBE7Mn6pafuRZ46gtnXf2Ec1EsvMBOYjRNt6nSg
+GvoQdiv5SjUuwxfrw7CICj1agxwLarBcWpIF6PMU7yG+XjTIrSM63KuuV+fOZvKM
+AdjwwUNNj2aOkprPHfmFIgSnEMsxvoJQNqYTaWzwT8WAyW1qTd0LhYYDTnb4J+j2
+BxgG5ASHYpsLQ1Moy+lYsTxWsoZMvqTqv/h+Mlb8fiUTiYppeMnLzxtI/t8Trvt8
+rXNGSkNd8dM5uqJ9Ba2MS6UB6EZUd5e7aPy8z5ThlhygRjLk0527O4BYAWlZw5F8
+egq/X0liCeRHoFUsyNnuQYSqo2spdTIV2ExKo/hEF1FgbXF6s1v/TcfzS0PkSYEH
+5yhKYoEkYOXIneIjUasy8xM9O2578NsVu1GH0n+E29KDA0w+QKwpbjgb9VWKCjk1
+CPvK7oi3DKA4A28w/h5jI9Xzb343L0gb+IhdgL5lNWp2HoSy+y7Smnbz6IchjAP7
+zCtQ9ZJCLdXgCtDlXUeF+TXzEfKUYwa0jnha/fArM3PVGvQlWdpVhe/oLQARAQAB
+tDBSZXlub2xkIFhpbiAoQ09ERSBTSUdOSU5HIEtFWSkgPHJ4aW5AYXBhY2hlLm9y
+Zz6JAk4EEwEIADgWIQRKi9pI5uISpzRjJQLeqWPi6TR9ZgUCXoGQmQIbAwULCQgH
+AgYVCgkICwIEFgIDAQIeAQIXgAAKCRDeqWPi6TR9ZrBJEACW92VdruNL+dYYH0Cu
+9oxZx0thCE1twc/6rvgvIj//0kZ4ZA6RoDId8vSmKSkB0GwMT7daIoeIvRTiEdMQ
+Wai7zqvNEdT1qdNn7MfN1rveN1tBNVndzbZ8S8Nz4sqZ/8R3wG90c2XLwno3joXA
+FhFRfVa+TWI1Ux84/ZXuzD14f54dorVo0CT51CnU67ERBAijl7UugPM3Fs7ApU/o
+SWCMq7ScPde81jmgMqBDLcj/hueCOTU5m8irOGGY439qEF+H41I+IB60yzAS4Gez
+xZl55Mv7ZKdwWtCcwtUYIm4R8NNu4alTxUpxw4ttRW3Kzue78TOIMTWTwRKrP5t2
+yq9bMT1fSO7h/Ntn8dXUL0EM/h+6k5py5Kr0+mrV/s0Z530Fit6AC/ReWV6hSGdk
+F1Z1ECa4AoUHqtoQKL+CNgO2qlJn/sKj3g10NiSwqUdUuxCSOpsY72udRLG9tfkB
+OwW3lTKLp66gYYE3nYaHzJKGdRs7aJ8RRALMQkadsyqpdVMp+Yvbj/3Hn3uB3jTt
+S+RolH545toeuhXaiIWlm2434oHW6QjzpPwaNp5AiWm+vMfPkhhCX6WT0jv9nEtM
+kJJVgwlWNKYEW9nLaIRMWWONSy9aJapZfLW0XDiKidibPHqNFih9z49eDVLobi5e
+mzmOFkKFxs9D4sg9oVmId6Y9SbkCDQRegZCZARAA5ZMv1ki5mKJVpASRGfTHVH5o
+9HixwJOinkHjSK3zFpuvh0bs+rKZL2+TUXci9Em64xXuYbiGH3YgH061H9tgAMaN
+iSIFGPlbBPbduJjdiUALqauOjjCIoWJLyuAC25zSGCeAwzQiRXN6VJUYwjQnDMDG
+8iUyL+IdXjq2T6vFVZGR/uVteRqqvEcg9km6IrFmXefqfry4hZ5a7SbmThCHqGxx
+5Oy+VkWw1IP7fHIUdC9ie45X6n08yC2BfWI4+RBny8906pSXEN/ag0Yw7vWkiyuK
+wZsoe0pRczV8mx6QF2+oJjRMtziKYW72jKE9a/DXXzQ3Luq5gyZeq0cluYNGHVdj
+ijA2ORNLloAfGjVGRKVznUFN8LMkcxm4jiiHKRkZEcjgm+1tRzGPufFidyhQIYO2
+YCOpnPQh5IXznb3RZ0JqJcXdne+7Nge85URTEMmMyx5kXvD03ZmUObshDL12YoM3
+bGzObo6jYg+h38Xlx9+9QAwGkf+gApIPI8KqPAVyP6s60AR4iR6iehEOciz7h6/b
+T9bKMw0w9cvyJzY1IJsy2sQYFwNyHYWQkyDciRAmIwriHhBDfXdBodF95V3uGbIp
+DZw3jVxcgJWKZ3y65N1aCguEI1fyy9JU12++GMBa+wuv9kdhSoj2qgInFB1VXGC7
+bBlRnHB44tsFTBEqqOcAEQEAAYkCNgQYAQgAIBYhBEqL2kjm4hKnNGMlAt6pY+Lp
+NH1mBQJegZCZAhsMAAoJEN6pY+LpNH1mwIYQAIRqbhEjL6uMxM19OMPDydbhiWoI
+8BmoqzsvRNF9VidjPRicYJ5JL5FFvvTyT6g87L8aRhiAdX/la92PdJ9DTS3sfIKF
+pIcUDFybKgk4pmGWl0fNIwEjHewf6HlndCFmVuPe32V/ZkCwb58dro15xzxblckB
+kgsqb0Xbfz/3Iwlqr5eTKH5iPrDFcYKy1ODcFmXS+udMm5uwn+d/RNmj8B3kgwrw
+brs53264qdWbfsxGPC1ZkDNNSRyIy6wGvc/diRm4TSV/Lmd5OoDX4UkPJ++JhGoO
+cYKxc2KzrEZxzMgJ3xFRs3zeymOwtgXUU1GBCuD7uxr1vacFwUV+9ymTeyUdTxB3
++/DzxYOJGQL/3IXlyQ2azoCWUpCjW0MFM1OolragOFJeQ+V0xrlOiXXAFfHo0KPG
+y0QdK810Ok+XYR6U9Y7yb6tYDgi+w9r46XjurdiZnUxxLUpFG++tSgBQ5X4y2UGw
+C4n0T8/jn6KIUZ0kx51ZZ6CEChjBt+AU+HCnw2sZfgq8Nlos95tw2MT6kn8BrY68
+n297ev/1T6B0OasQaw3Itw29+T+FdzdU4c6XW/rC6VAlBikWIS5zCT//vAeBacxL
+HYoqwKL52HzG121lfWXhx5vNF4bg/fKrFEOy2Wp1fMG6nRcuUUROvieD6ZU4ZrLA
+NjpTIP+lOkfxRwUi
+=rggH
+-----END PGP PUBLIC KEY BLOCK-----



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch test-branch deleted (was 0f8b07e)

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git.


 was 0f8b07e  test

This change permanently discards the following revisions:

 discard 0f8b07e  test


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch test-branch created (now 0f8b07e)

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 0f8b07e  test

This branch includes the following new commits:

 new 0f8b07e  test

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: test

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 0f8b07e5034af2819b75b53aadffda82ae0c31b8
Author: Reynold Xin 
AuthorDate: Fri Feb 1 13:28:18 2019 -0800

test
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 271f2f5..2c1e02a 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ For general development tips, including info on developing 
Spark using an IDE, s
 
 The easiest way to start using Spark is through the Scala shell:
 
-./bin/spark-shell
+./bin/spark-shella
 
 Try the following command, which should return 1000:
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23207
  
```
var writer: ShuffleWriter[Any, Any] = null
try {
  val manager = SparkEnv.get.shuffleManager
  writer = manager.getWriter[Any, Any](
    dep.shuffleHandle, partitionId, context, context.taskMetrics().shuffleWriteMetrics)
  writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
  writer.stop(success = true).get
} catch {
  case e: Exception =>
    try {
      if (writer != null) {
        writer.stop(success = false)
      }
    } catch {
      case e: Exception =>
        log.debug("Could not stop writer", e)
    }
    throw e
}
```

Can we put the above in a closure and pass it into the shuffle dependency? Then 
in SQL we can reuse the same closure with custom metrics, as sketched below.
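
A minimal, self-contained sketch of that closure idea. The names here (`WriteMetricsReporter`, `WriteFunc`, `withSqlMetric`) are illustrative stand-ins, not Spark's real API; the point is only the shape: the dependency carries a write function, and SQL wraps the reporter before calling it.

```scala
object ShuffleWriteClosureSketch {
  // Hypothetical stand-in for Spark's ShuffleWriteMetricsReporter.
  trait WriteMetricsReporter {
    def incRecordsWritten(v: Long): Unit
  }

  // The closure a shuffle dependency could carry: given a metrics reporter and the
  // records of one partition, perform the write and report metrics along the way.
  type WriteFunc = (WriteMetricsReporter, Iterator[(Any, Any)]) => Unit

  // Default behaviour: write each record and bump the counter.
  val defaultWrite: WriteFunc = (reporter, records) =>
    records.foreach(_ => reporter.incRecordsWritten(1L))

  // SQL side: reuse the same closure, but hand it a reporter that also updates a SQL metric.
  def withSqlMetric(
      base: WriteFunc,
      sqlRecordsWritten: java.util.concurrent.atomic.AtomicLong): WriteFunc =
    (reporter, records) => {
      val wrapped = new WriteMetricsReporter {
        override def incRecordsWritten(v: Long): Unit = {
          reporter.incRecordsWritten(v)   // keep the core metrics up to date
          sqlRecordsWritten.addAndGet(v)  // ...and the SQL-specific one
        }
      }
      base(wrapped, records)
    }
}
```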


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239308829
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
 val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions
 testSparkPlanMetrics(df, 1, Map(
   2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))),
+  1L -> (("Exchange", Map(
+"shuffle records written" -> 2L,
+"records read" -> 2L,
+"local blocks fetched" -> 2L,
--- End diff --

yea i'd just change the display text here, and not change the api


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239308706
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter {
 FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait 
time"),
 RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators. Different 
with
+ * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => 
reporter) set in
+ * shuffle dependency, so the local SQLMetric should transient and create 
on executor.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ * @param metricsReporter Other reporter need to be updated in this 
SQLShuffleWriteMetricsReporter.
+ */
+private[spark] case class SQLShuffleWriteMetricsReporter(
+metrics: Map[String, SQLMetric])(metricsReporter: 
ShuffleWriteMetricsReporter)
+  extends ShuffleWriteMetricsReporter with Serializable {
+  @transient private[this] lazy val _bytesWritten =
+metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_BYTES_WRITTEN)
+  @transient private[this] lazy val _recordsWritten =
+metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN)
+  @transient private[this] lazy val _writeTime =
+metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_WRITE_TIME)
+
+  override private[spark] def incBytesWritten(v: Long): Unit = {
+metricsReporter.incBytesWritten(v)
+_bytesWritten.add(v)
+  }
+  override private[spark] def decRecordsWritten(v: Long): Unit = {
+metricsReporter.decBytesWritten(v)
+_recordsWritten.set(_recordsWritten.value - v)
+  }
+  override private[spark] def incRecordsWritten(v: Long): Unit = {
+metricsReporter.incRecordsWritten(v)
+_recordsWritten.add(v)
+  }
+  override private[spark] def incWriteTime(v: Long): Unit = {
+metricsReporter.incWriteTime(v)
+_writeTime.add(v)
+  }
+  override private[spark] def decBytesWritten(v: Long): Unit = {
+metricsReporter.decBytesWritten(v)
+_bytesWritten.set(_bytesWritten.value - v)
+  }
+}
+
+private[spark] object SQLShuffleWriteMetricsReporter {
+  val SHUFFLE_BYTES_WRITTEN = "shuffleBytesWritten"
+  val SHUFFLE_RECORDS_WRITTEN = "shuffleRecordsWritten"
+  val SHUFFLE_WRITE_TIME = "shuffleWriteTime"
--- End diff --

yea i think we can just report ms level granularity. no point reporting ns 
(although we might want to measure based on ns)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239308197
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter {
 FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait 
time"),
 RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators. Different 
with
+ * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => 
reporter) set in
+ * shuffle dependency, so the local SQLMetric should transient and create 
on executor.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ * @param metricsReporter Other reporter need to be updated in this 
SQLShuffleWriteMetricsReporter.
+ */
+private[spark] case class SQLShuffleWriteMetricsReporter(
+metrics: Map[String, SQLMetric])(metricsReporter: 
ShuffleWriteMetricsReporter)
--- End diff --

why are there two parameter lists here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239308082
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: 
SparkPlan) extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = 
child.executeTake(limit)
   private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = 
SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
+  private val writeMetrics = 
SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
--- End diff --

why is `metrics` a lazy val and this one a plain val?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r239308007
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: 
SparkPlan) extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = 
child.executeTake(limit)
   private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = 
SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
+  private val writeMetrics = 
SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
+  override lazy val metrics =
--- End diff --

this is somewhat confusing. I'd create a variable for the read metrics so 
you can pass just that into the ShuffledRDD.
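
A rough illustration of that suggestion, with hypothetical names and simplified `Map[String, Long]` stand-ins for SQLMetric maps rather than the actual CollectLimitExec code: keep the read-side metrics in their own val so that only that map is handed to the shuffled RDD, while `metrics` simply concatenates read and write maps.

```scala
object MetricsLayoutSketch {
  // Simplified stand-ins for the SQLMetric factory methods.
  def createShuffleReadMetrics(): Map[String, Long] = Map("records read" -> 0L)
  def createShuffleWriteMetrics(): Map[String, Long] = Map("shuffle records written" -> 0L)

  // Read metrics get their own val so they can be passed to the shuffled RDD alone...
  lazy val readMetrics: Map[String, Long] = createShuffleReadMetrics()
  private lazy val writeMetrics: Map[String, Long] = createShuffleWriteMetrics()

  // ...while the operator's metrics view is both maps together.
  lazy val metrics: Map[String, Long] = readMetrics ++ writeMetrics
}
```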


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23207
  
@xuanyuanking can you separate the prs to rename read side metric and the 
write side change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238845399
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -299,12 +312,25 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
   val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
   val df2 = (1 to 10).map(i => (i, i.toString)).toSeq.toDF("key", 
"value")
   // Assume the execution plan is
-  // ... -> ShuffledHashJoin(nodeId = 1) -> Project(nodeId = 0)
+  // Project(nodeId = 0)
+  // +- ShuffledHashJoin(nodeId = 1)
+  // :- Exchange(nodeId = 2)
+  // :  +- Project(nodeId = 3)
+  // : +- LocalTableScan(nodeId = 4)
+  // +- Exchange(nodeId = 5)
+  // +- Project(nodeId = 6)
+  // +- LocalTableScan(nodeId = 7)
   val df = df1.join(df2, "key")
   testSparkPlanMetrics(df, 1, Map(
 1L -> (("ShuffledHashJoin", Map(
   "number of output rows" -> 2L,
-  "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"
+  "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"))),
+2L -> (("Exchange", Map(
+  "shuffle records written" -> 2L,
+  "records read" -> 2L))),
--- End diff --

is this always going to be the same as "shuffle records written" ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238845029
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
 val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions
 testSparkPlanMetrics(df, 1, Map(
   2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))),
+  1L -> (("Exchange", Map(
+"shuffle records written" -> 2L,
+"records read" -> 2L,
+"local blocks fetched" -> 2L,
--- End diff --

i think we should be consistent and name these "read", rather than "fetch".



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238843017
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -163,6 +171,8 @@ object SQLMetrics {
 Utils.bytesToString
   } else if (metricsType == TIMING_METRIC) {
 Utils.msDurationToString
+  } else if (metricsType == NANO_TIMING_METRIC) {
+duration => Utils.msDurationToString(duration / 10)
--- End diff --

is this the right conversion from nanosecs to millisecs?
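
For reference, a nanosecond count converts to milliseconds by dividing by 1,000,000 (or via `TimeUnit`), so a fixed divisor of 10 would not be the right conversion. A small sanity check:

```scala
import java.util.concurrent.TimeUnit

val durationNs = 3500000L                                    // 3.5 ms expressed in nanoseconds
val viaTimeUnit = TimeUnit.NANOSECONDS.toMillis(durationNs)  // 3
val viaDivision = durationNs / 1000000L                      // 3 as well
```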


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238842276
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -78,6 +78,7 @@ object SQLMetrics {
   private val SUM_METRIC = "sum"
   private val SIZE_METRIC = "size"
   private val TIMING_METRIC = "timing"
+  private val NANO_TIMING_METRIC = "nanosecond"
--- End diff --

ns


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238837000
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
   private[spark] def decBytesWritten(v: Long): Unit
   private[spark] def decRecordsWritten(v: Long): Unit
 }
+
+
+/**
+ * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics 
updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleWriteMetricsReporter(
+reporters: Seq[ShuffleWriteMetricsReporter]) extends 
ShuffleWriteMetricsReporter {
+  override private[spark] def incBytesWritten(v: Long): Unit = {
+reporters.foreach(_.incBytesWritten(v))
+  }
+  override private[spark] def decRecordsWritten(v: Long): Unit = {
+reporters.foreach(_.decRecordsWritten(v))
+  }
+  override private[spark] def incRecordsWritten(v: Long): Unit = {
+reporters.foreach(_.incRecordsWritten(v))
+  }
+  override private[spark] def incWriteTime(v: Long): Unit = {
+reporters.foreach(_.incWriteTime(v))
+  }
+  override private[spark] def decBytesWritten(v: Long): Unit = {
+reporters.foreach(_.decBytesWritten(v))
+  }
+}
+
+
+/**
+ * A proxy class of ShuffleReadMetricsReporter which proxy all metrics 
updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleReadMetricsReporter(
--- End diff --

Again - I think your old approach is much better. No point creating a 
general util when there is only one implementation without any known future 
needs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238836448
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
   private[spark] def decBytesWritten(v: Long): Unit
   private[spark] def decRecordsWritten(v: Long): Unit
 }
+
+
+/**
+ * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics 
updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleWriteMetricsReporter(
--- End diff --

I'd not create a general API here. Just put one in SQL similar to the read 
side that also calls the default one.

It can be expensive to go through a seq for each record and bytes.
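
A compact sketch of the trade-off being pointed out; the trait and class names are illustrative, not Spark's. The Seq-based proxy pays for a traversal on every single update, whereas a dedicated wrapper delegates to exactly one underlying reporter while keeping its own metric.

```scala
object ReporterSketch {
  trait Reporter { def incBytesWritten(v: Long): Unit }

  // General-purpose proxy: loops over a Seq for each metric update.
  class GroupedReporter(reporters: Seq[Reporter]) extends Reporter {
    override def incBytesWritten(v: Long): Unit = reporters.foreach(_.incBytesWritten(v))
  }

  // SQL-specific wrapper in the spirit of the read side: update its own metric and
  // call straight through to the one default reporter, with no per-record loop.
  class SqlWriteReporter(underlying: Reporter) extends Reporter {
    private var bytesWritten = 0L
    override def incBytesWritten(v: Long): Unit = {
      underlying.incBytesWritten(v)
      bytesWritten += v
    }
  }
}
```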


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23171
  
Basically logically there are only two expressions: In which handles 
arbitrary expressions, and InSet which handles expressions with literals. Both 
could work: (1) we provide two separate expressions for InSet, one using 
switch, and one using hashset, or (2) we just provide one InSet and internally 
in InSet have two implementations ... 

The downside with creating different expressions for the same logical 
expression is that potentially the downstream optimization rules would need to 
match more.
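
A self-contained sketch of option (2): a single literal-only expression that switches strategy internally based on the size of the literal list. The class name and threshold are assumptions for illustration, and the "switch" branch is approximated here by a linear scan over the literals, standing in for the generated Java switch discussed in the thread.

```scala
// Hypothetical, simplified InSet; not Spark's actual expression.
class InSetSketch(values: Array[Int], switchThreshold: Int = 10) {
  private val useSwitchStyle = values.length <= switchThreshold
  private val valueSet: Set[Int] = values.toSet

  def eval(v: Int): Boolean =
    if (useSwitchStyle) {
      // Small literal list: codegen could emit a Java switch; a linear scan stands in here.
      var i = 0
      while (i < values.length) {
        if (values(i) == v) return true
        i += 1
      }
      false
    } else {
      // Large literal list: fall back to a hash set lookup.
      valueSet.contains(v)
    }
}
```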

On Mon, Dec 03, 2018 at 10:52 PM, DB Tsai < notificati...@github.com > 
wrote:

> 
> 
    > 
> @ rxin ( https://github.com/rxin ) switch in Java is still significantly
> faster than hash set even without boxing / unboxing problems when the
> number of elements are small. We were thinking about to have two
> implementations in InSet , and pick up switch if the number of elements 
are
> small, or otherwise pick up hash set one. But this is the same complexity
> as having two implements in In as this PR.
> 
> 
> 
> @ cloud-fan ( https://github.com/cloud-fan ) do you suggest to create an 
OptimizeIn
> which has switch and hash set implementations based on the length of the
> elements and remove InSet ? Basically, what we were thinking above.
> 
> 
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub (
> https://github.com/apache/spark/pull/23171#issuecomment-443991336 ) , or 
mute
> the thread (
> 
https://github.com/notifications/unsubscribe-auth/AATvPKtGyx5jWxgtO1y5WsiXYDAQqRQ4ks5u1hvJgaJpZM4Y4P4J
> ).
> 
> 
>


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23171
  
I thought InSwitch logically is the same as InSet, in which all the child 
expressions are literals?

On Mon, Dec 03, 2018 at 8:38 PM, Wenchen Fan < notificati...@github.com > 
wrote:

> 
> 
> 
> I think InSet is not an optimized version of In , but just a way to
> separate the implementation for different conditions (the length of the
> list). Maybe we should do the same thing here, create a InSwitch and
> convert In to it when meeting some conditions. One problem is, In and 
InSwitch
> is same in the interpreted version, maybe we should create a base class
> for them.
> 
> 
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub (
> https://github.com/apache/spark/pull/23171#issuecomment-443968486 ) , or 
mute
> the thread (
> 
https://github.com/notifications/unsubscribe-auth/AATvPDTQic0Ii5UD40m_Uj5kMVy4pNExks5u1fxPgaJpZM4Y4P4J
> ).
> 
> 
>


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23171
  
That probably means we should just optimize InSet to have the switch 
version though? Rather than do it in In?

On Mon, Dec 03, 2018 at 8:20 PM, Wenchen Fan < notificati...@github.com > 
wrote:

> 
> 
    > 
> @ rxin ( https://github.com/rxin ) I proposed the same thing before, but
> one problem is that, we only convert In to InSet when the length of list
> reaches the threshold. If the switch way is faster than hash set when the
> list is small, it seems still worth to optimize In using switch.
> 
> 
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub (
> https://github.com/apache/spark/pull/23171#issuecomment-443965616 ) , or 
mute
> the thread (
> 
https://github.com/notifications/unsubscribe-auth/AATvPEkrUFJuT4FI167cCI9b0nfv16V4ks5u1fgNgaJpZM4Y4P4J
> ).
> 
> 
>


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23171
  
I'm not a big fan of making the physical implementation of an expression 
very different depending on the situation. Why can't we just make InSet 
efficient and convert these cases to that?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23192: [SPARK-26241][SQL] Add queryId to IncrementalExecution

2018-12-01 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23192
  
Thanks @HyukjinKwon. Fixed it.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23193: [SPARK-26226][SQL] Track optimization phase for s...

2018-11-30 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23193

[SPARK-26226][SQL] Track optimization phase for streaming queries

## What changes were proposed in this pull request?
In an earlier PR, we missed measuring the optimization phase time for 
streaming queries. This patch adds it.

## How was this patch tested?
Given this is a debugging feature, and it is very convoluted to add tests 
to verify the phase is set properly, I am not introducing a streaming specific 
test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26226-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23193


commit 70c319bdaaac4fc4b8b988a96be6f976a63b41bf
Author: Reynold Xin 
Date:   2018-12-01T04:33:21Z

SPARK-26226




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23193: [SPARK-26226][SQL] Track optimization phase for streamin...

2018-11-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23193
  
cc @gatorsmile @jose-torres 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23192: [SPARK-26221][SQL] Add queryId to IncrementalExecution

2018-11-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23192
  
cc @zsxwing @jose-torres 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23192: [SPARK-26221][SQL] Add queryId to IncrementalExec...

2018-11-30 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23192

[SPARK-26221][SQL] Add queryId to IncrementalExecution

## What changes were proposed in this pull request?
This is a small change for better debugging: pass the query uuid into 
IncrementalExecution so that, when we look at a QueryExecution in isolation, 
we can trace it back to the originating query.

## How was this patch tested?
N/A - just add some field for better debugging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26241

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23192


commit c037f4d2fa2c2844ac992d976b492e14ab9bed11
Author: Reynold Xin 
Date:   2018-12-01T04:27:00Z

[SPARK-26221]




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...

2018-11-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23183#discussion_r238019351
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
 ---
@@ -51,6 +58,18 @@ object QueryPlanningTracker {
 }
   }
 
+  /**
+   * Summary of a phase, with start time and end time so we can construct 
a timeline.
+   */
+  class PhaseSummary(val startTimeMs: Long, val endTimeMs: Long) {
+
+def durationMs: Long = endTimeMs - startTimeMs
+
+override def toString: String = {
+  s"PhaseSummary($startTimeMs, $endTimeMs)"
--- End diff --

so for actual debugging this is not needed right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...

2018-11-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23183
  
cc @hvanhovell @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...

2018-11-29 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23183

[SPARK-26226][SQL] Update query tracker to report timeline for phases

## What changes were proposed in this pull request?
This patch changes the query plan tracker added earlier to report phase 
timeline, rather than just a duration for each phase. This way, we can easily 
find time that's unaccounted for.

## How was this patch tested?
Updated test cases to reflect that.
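
As a small illustration of what the timeline buys (hypothetical names, shaped after the `PhaseSummary(startTimeMs, endTimeMs)` class quoted elsewhere in this thread): once each phase carries a start and end time, the time no phase accounts for falls out directly.

```scala
final case class PhaseSummary(startTimeMs: Long, endTimeMs: Long) {
  def durationMs: Long = endTimeMs - startTimeMs
}

// Time inside `total` that none of the recorded phases covers (assumes phases do not overlap).
def unaccountedMs(total: PhaseSummary, phases: Seq[PhaseSummary]): Long =
  total.durationMs - phases.map(_.durationMs).sum

// e.g. analysis at 0-40 ms and optimization at 55-90 ms within a 0-100 ms query
// leave 25 ms unaccounted for.
val gap = unaccountedMs(PhaseSummary(0, 100), Seq(PhaseSummary(0, 40), PhaseSummary(55, 90)))
```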


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26226

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23183


commit d200be22afd83472c03a612a22e5b1fb4d4d80ab
Author: Reynold Xin 
Date:   2018-11-29T23:00:49Z

[SPARK-26226][SQL] Update query tracker to report timeline for phases




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter

2018-11-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 9fdc7a840 -> cb368f2c2


[SPARK-26142] followup: Move sql shuffle read metrics relatives to 
SQLShuffleMetricsReporter

## What changes were proposed in this pull request?

Follow-up for https://github.com/apache/spark/pull/23128: move the SQL shuffle read 
metrics definitions into `SQLShuffleMetricsReporter`, so that the related code stays 
together and anyone adding a new metric later is less likely to forget to update 
SQLShuffleMetricsReporter.

## How was this patch tested?

Existing tests.

Closes #23175 from xuanyuanking/SPARK-26142-follow.

Authored-by: Yuanjian Li 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb368f2c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb368f2c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb368f2c

Branch: refs/heads/master
Commit: cb368f2c2964797d7313d3a4151e2352ff7847a9
Parents: 9fdc7a8
Author: Yuanjian Li 
Authored: Thu Nov 29 12:09:30 2018 -0800
Committer: Reynold Xin 
Committed: Thu Nov 29 12:09:30 2018 -0800

--
 .../exchange/ShuffleExchangeExec.scala  |  4 +-
 .../org/apache/spark/sql/execution/limit.scala  |  6 +--
 .../spark/sql/execution/metric/SQLMetrics.scala | 20 
 .../metric/SQLShuffleMetricsReporter.scala  | 50 
 .../execution/UnsafeRowSerializerSuite.scala|  4 +-
 5 files changed, 47 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
index 8938d93..c9ca395 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
@@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, 
BoundReference, Uns
 import 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.sql.execution.metric.{SQLMetrics, 
SQLShuffleMetricsReporter}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.util.MutablePair
@@ -49,7 +49,7 @@ case class ShuffleExchangeExec(
 
   override lazy val metrics = Map(
 "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size")
-  ) ++ SQLMetrics.getShuffleReadMetrics(sparkContext)
+  ) ++ SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
 
   override def nodeName: String = {
 val extraInfo = coordinator match {

http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
index ea845da..e9ab7cd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
@@ -25,7 +25,7 @@ import 
org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.util.truncatedString
 import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
-import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.sql.execution.metric.SQLShuffleMetricsReporter
 
 /**
  * Take the first `limit` elements and collect them to a single partition.
@@ -38,7 +38,7 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) 
extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
   private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = SQLMetrics.getShuffleReadMetrics(sparkContext)
+  override lazy val metrics = 
SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
   protected override def doExecute(): RDD[InternalRow] = {
 val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
 val shuffled = new ShuffledRowRDD(
@@ -154,7 +154,7 @@ case class TakeOrderedAndProjectExec(
 

[GitHub] spark issue #23175: [SPARK-26142]followup: Move sql shuffle read metrics rel...

2018-11-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23175
  
LGTM - merged in master.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...

2018-11-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23178
  
Good idea to have it sealed!

> On Nov 29, 2018, at 7:04 AM, Sean Owen  wrote:
> 
> @srowen commented on this pull request.
> 
> In 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala:
> 
> >  if (inputTypes.isDefined) {
>assert(inputTypes.get.length == nullableTypes.get.length)
>  }
>  
> +val inputsNullSafe = if (nullableTypes.isEmpty) {
> You can use getOrElse here and even inline this into the call below, but 
I don't really care.
> 
> In 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala:
> 
> > @@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType
>   * @since 1.3.0
>   */
>  @Stable
> -case class UserDefinedFunction protected[sql] (
> -f: AnyRef,
> -dataType: DataType,
> -inputTypes: Option[Seq[DataType]]) {
> -
> -  private var _nameOption: Option[String] = None
> -  private var _nullable: Boolean = true
> -  private var _deterministic: Boolean = true
> -
> -  // This is a `var` instead of in the constructor for backward 
compatibility of this case class.
> -  // TODO: revisit this case class in Spark 3.0, and narrow down the 
public surface.
> -  private[sql] var nullableTypes: Option[Seq[Boolean]] = None
> +trait UserDefinedFunction {
> Should we make this sealed? I'm not sure. Would any user ever extend this 
meaningfully? I kind of worry someone will start doing so; maybe they already 
subclass it in some cases though. Elsewhere it might help the compiler 
understand in match statements that there is only ever one type of UDF class to 
match on.
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL

2018-11-28 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23128
  
@xuanyuanking @cloud-fan when you think about where to put each code block, 
make sure you also think about future evolution of the codebase. In general put 
relevant things closer to each other (e.g. in one class, one file, or one 
method).



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r237129249
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -82,6 +82,14 @@ object SQLMetrics {
 
   private val baseForAvgMetric: Int = 10
 
+  val REMOTE_BLOCKS_FETCHED = "remoteBlocksFetched"
--- End diff --

rather than putting this list and the getShuffleReadMetrics function here, 
we should move it into SQLShuffleMetricsReporter. Otherwise in the future when 
one adds another metric, he/she is likely to forget to update 
SQLShuffleMetricsReporter.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r237128247
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.metric
+
+import org.apache.spark.executor.TempShuffleReadMetrics
+
+/**
+ * A shuffle metrics reporter for SQL exchange operators.
+ * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext.
+ * @param metrics All metrics in current SparkPlan. This param should not 
empty and
+ *   contains all shuffle metrics defined in 
[[SQLMetrics.getShuffleReadMetrics]].
+ */
+private[spark] class SQLShuffleMetricsReporter(
+  tempMetrics: TempShuffleReadMetrics,
--- End diff --

4 space indent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...

2018-11-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r237128189
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -194,4 +202,16 @@ object SQLMetrics {
 SparkListenerDriverAccumUpdates(executionId.toLong, metrics.map(m 
=> m.id -> m.value)))
 }
   }
+
+  /**
+   * Create all shuffle read relative metrics and return the Map.
+   */
+  def getShuffleReadMetrics(sc: SparkContext): Map[String, SQLMetric] = 
Map(
--- End diff --

I'd prefer to name this create, rather than get, to imply we are creating a 
new set rather than just returning some existing sets.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r236845375
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -38,7 +38,7 @@ import org.apache.spark.sql.execution.datasources.jdbc._
 import 
org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource
 import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
 import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils
-import org.apache.spark.sql.sources.v2.{BatchReadSupportProvider, 
DataSourceOptions, DataSourceV2}
+import org.apache.spark.sql.sources.v2._
--- End diff --

I do think this one is too nitpicking. If this gets long it should be 
wildcard. Use an IDE for large reviews like this if needed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...

2018-11-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23106
  
Merging in master. Thanks @squito.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-26141] Enable custom metrics implementation in shuffle write

2018-11-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 85383d29e -> 6a064ba8f


[SPARK-26141] Enable custom metrics implementation in shuffle write

## What changes were proposed in this pull request?
This is the write side counterpart to https://github.com/apache/spark/pull/23105

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. Updated 
all existing test cases.

Closes #23106 from rxin/SPARK-26141.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a064ba8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a064ba8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a064ba8

Branch: refs/heads/master
Commit: 6a064ba8f271d5f9d04acd41d0eea50a5b0f5018
Parents: 85383d2
Author: Reynold Xin 
Authored: Mon Nov 26 22:35:52 2018 -0800
Committer: Reynold Xin 
Committed: Mon Nov 26 22:35:52 2018 -0800

--
 .../sort/BypassMergeSortShuffleWriter.java| 11 +--
 .../spark/shuffle/sort/ShuffleExternalSorter.java | 18 --
 .../spark/shuffle/sort/UnsafeShuffleWriter.java   |  9 +
 .../spark/storage/TimeTrackingOutputStream.java   |  7 ---
 .../spark/executor/ShuffleWriteMetrics.scala  | 13 +++--
 .../apache/spark/scheduler/ShuffleMapTask.scala   |  3 ++-
 .../org/apache/spark/shuffle/ShuffleManager.scala |  6 +-
 .../spark/shuffle/sort/SortShuffleManager.scala   | 10 ++
 .../org/apache/spark/storage/BlockManager.scala   |  7 +++
 .../spark/storage/DiskBlockObjectWriter.scala |  4 ++--
 .../spark/util/collection/ExternalSorter.scala|  4 ++--
 .../shuffle/sort/UnsafeShuffleWriterSuite.java|  6 --
 .../scala/org/apache/spark/ShuffleSuite.scala | 12 
 .../sort/BypassMergeSortShuffleWriterSuite.scala  | 16 
 project/MimaExcludes.scala|  7 ++-
 15 files changed, 79 insertions(+), 54 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6a064ba8/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
index b020a6d..fda33cd 100644
--- 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
+++ 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
@@ -37,12 +37,11 @@ import org.slf4j.LoggerFactory;
 import org.apache.spark.Partitioner;
 import org.apache.spark.ShuffleDependency;
 import org.apache.spark.SparkConf;
-import org.apache.spark.TaskContext;
-import org.apache.spark.executor.ShuffleWriteMetrics;
 import org.apache.spark.scheduler.MapStatus;
 import org.apache.spark.scheduler.MapStatus$;
 import org.apache.spark.serializer.Serializer;
 import org.apache.spark.serializer.SerializerInstance;
+import org.apache.spark.shuffle.ShuffleWriteMetricsReporter;
 import org.apache.spark.shuffle.IndexShuffleBlockResolver;
 import org.apache.spark.shuffle.ShuffleWriter;
 import org.apache.spark.storage.*;
@@ -79,7 +78,7 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
   private final int numPartitions;
   private final BlockManager blockManager;
   private final Partitioner partitioner;
-  private final ShuffleWriteMetrics writeMetrics;
+  private final ShuffleWriteMetricsReporter writeMetrics;
   private final int shuffleId;
   private final int mapId;
   private final Serializer serializer;
@@ -103,8 +102,8 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
   IndexShuffleBlockResolver shuffleBlockResolver,
   BypassMergeSortShuffleHandle handle,
   int mapId,
-  TaskContext taskContext,
-  SparkConf conf) {
+  SparkConf conf,
+  ShuffleWriteMetricsReporter writeMetrics) {
 // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no 
units are provided
 this.fileBufferSize = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", 
"32k") * 1024;
 this.transferToEnabled = conf.getBoolean("spark.file.transferTo", true);
@@ -114,7 +113,7 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
 this.shuffleId = dep.shuffleId();
 this.partitioner = dep.partitioner();
 this.numPartitions = partitioner.numPartitions();
-this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics();
+this.writeMetrics = writeMetrics;
 this.serializer = dep.serializer();
 this.shuffleBlockResolver = shuffleBlockResolver;
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/6a064

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-26 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23086#discussion_r236492408
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java 
---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
--- End diff --

Everything in catalyst is considered private (although it has public visibility 
for debugging), and it's best to keep it that way.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23106: [SPARK-26141] Enable custom metrics implementatio...

2018-11-26 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23106#discussion_r236432889
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -242,8 +243,13 @@ private void writeSortedFile(boolean isLastFile) {
   // Note that we intentionally ignore the value of 
`writeMetricsToUse.shuffleWriteTime()`.
   // Consistent with ExternalSorter, we do not count this IO towards 
shuffle write time.
   // This means that this IO time is not accounted for anywhere; 
SPARK-3577 will fix this.
-  writeMetrics.incRecordsWritten(writeMetricsToUse.recordsWritten());
-  
taskContext.taskMetrics().incDiskBytesSpilled(writeMetricsToUse.bytesWritten());
+
+  // This is guaranteed to be a ShuffleWriteMetrics based on the if 
check in the beginning
+  // of this file.
--- End diff --

ah yes. nice catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23147: [SPARK-26140] followup: rename ShuffleMetricsReporter

2018-11-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23147
  
cc @gatorsmile @xuanyuanking 

@cloud-fan I misunderstood your comment. Finally saw it today when I was 
looking at my other PR.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23147: [SPARK-26140] followup: rename ShuffleMetricsRepo...

2018-11-26 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23147

[SPARK-26140] followup: rename ShuffleMetricsReporter

## What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/23105, due to working on two 
parallel PRs at once, I made the mistake of committing the copy of the PR that 
used the name ShuffleMetricsReporter for the interface, rather than the 
appropriate name, ShuffleReadMetricsReporter. This patch fixes that.

## How was this patch tested?
This should be fine as long as compilation passes.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark ShuffleReadMetricsReporter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23147


commit 1d28d879572aa958b169acc5e1a48e52cced4c26
Author: Reynold Xin 
Date:   2018-11-26T18:56:18Z

ShuffleReadMetricsReporter




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23135#discussion_r236089467
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -575,6 +575,19 @@ case class Range(
   }
 }
 
+/**
+ * This is a Group by operator with the aggregate functions and 
projections.
+ *
+ * @param groupingExpressions expressions for grouping keys
+ * @param aggregateExpressions expressions for a project list, which could 
contain
+ *   
[[org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction]]s.
+ *
+ * Note: Currently, aggregateExpressions correspond to both 
[[AggregateExpression]] and the output
--- End diff --

It is not clear what “resultExpressions” means.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23131#discussion_r236052557
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1852,6 +1852,19 @@ class Dataset[T] private[sql](
 CombineUnions(Union(logicalPlan, other.logicalPlan))
   }
 
+  /**
+   * Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
--- End diff --

Say that this is an alias of union.
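
For example, something along these lines (a sketch only — the exact doc wording 
and whether unionAll simply delegates to union are assumptions on my part):

  /**
   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
   * This is an alias for `union`.
   */
  def unionAll(other: Dataset[T]): Dataset[T] = union(other)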



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23129: [MINOR] Update all DOI links to preferred resolver

2018-11-24 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23129
  
Jenkins, test this please.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r236025838
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.metric
+
+import org.apache.spark.executor.TempShuffleReadMetrics
+
+/**
+ * A shuffle metrics reporter for SQL exchange operators.
+ * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext.
+ * @param metrics All metrics in current SparkPlan.
+ */
+class SQLShuffleMetricsReporter(
+  tempMetrics: TempShuffleReadMetrics,
+  metrics: Map[String, SQLMetric]) extends TempShuffleReadMetrics {
+
+  override def incRemoteBlocksFetched(v: Long): Unit = {
+metrics(SQLMetrics.REMOTE_BLOCKS_FETCHED).add(v)
--- End diff --

(I’m not referring to just this function, but to this pattern in general, 
especially for per-row updates).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23128#discussion_r236025817
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.metric
+
+import org.apache.spark.executor.TempShuffleReadMetrics
+
+/**
+ * A shuffle metrics reporter for SQL exchange operators.
+ * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext.
+ * @param metrics All metrics in current SparkPlan.
+ */
+class SQLShuffleMetricsReporter(
+  tempMetrics: TempShuffleReadMetrics,
+  metrics: Map[String, SQLMetric]) extends TempShuffleReadMetrics {
+
+  override def incRemoteBlocksFetched(v: Long): Unit = {
+metrics(SQLMetrics.REMOTE_BLOCKS_FETCHED).add(v)
--- End diff --

Doing a hashmap lookup here could introduce serious performance regressions.
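
To make the concern concrete, here is a minimal sketch of one way to avoid the 
per-call lookup: resolve each SQLMetric once at construction time. This is only 
an illustration based on the types visible in the diff above, not the change 
that ended up in the PR:

  package org.apache.spark.sql.execution.metric

  import org.apache.spark.executor.TempShuffleReadMetrics

  class SQLShuffleMetricsReporter(
      tempMetrics: TempShuffleReadMetrics,
      metrics: Map[String, SQLMetric]) extends TempShuffleReadMetrics {

    // Looked up once here, so the per-record override below never touches the map.
    private[this] val remoteBlocksFetched: SQLMetric =
      metrics(SQLMetrics.REMOTE_BLOCKS_FETCHED)

    override def incRemoteBlocksFetched(v: Long): Unit =
      remoteBlocksFetched.add(v)
  }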


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23105#discussion_r236020103
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+/**
+ * An interface for reporting shuffle read metrics, for each shuffle. This 
interface assumes
+ * all the methods are called on a single-threaded, i.e. concrete 
implementations would not need
+ * to synchronize.
+ *
+ * All methods have additional Spark visibility modifier to allow public, 
concrete implementations
+ * that still have these methods marked as private[spark].
+ */
+private[spark] trait ShuffleReadMetricsReporter {
--- End diff --

@xuanyuanking just submitted a PR on how to use it :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

2018-11-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23105#discussion_r235950427
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala 
---
@@ -48,7 +48,8 @@ private[spark] trait ShuffleManager {
   handle: ShuffleHandle,
   startPartition: Int,
   endPartition: Int,
-  context: TaskContext): ShuffleReader[K, C]
+  context: TaskContext,
+  metrics: ShuffleMetricsReporter): ShuffleReader[K, C]
--- End diff --

It is actually a read metrics reporter here. In the write PR this is renamed 
to ShuffleReadMetricsReporter.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23110: [SPARK-26129] Followup - edge behavior for QueryPlanning...

2018-11-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23110
  
cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23110: [SPARK-26129] Followup - edge behavior for QueryP...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23110

[SPARK-26129] Followup - edge behavior for 
QueryPlanningTracker.topRulesByTime

## What changes were proposed in this pull request?
This is an addendum patch for SPARK-26129 that defines the edge case 
behavior for QueryPlanningTracker.topRulesByTime.

## How was this patch tested?
Added unit tests for each behavior.
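
For reference, a hypothetical usage sketch of the API this addendum touches. 
Only recordRuleInvocation and topRulesByTime are taken from this thread; the 
exact signatures and the class being directly instantiable are assumptions:

  import org.apache.spark.sql.catalyst.QueryPlanningTracker

  val tracker = new QueryPlanningTracker
  // (ruleName, runTimeNs, effective) — same shape as the RuleExecutor call site.
  tracker.recordRuleInvocation("ResolveRelations", 1200000L, true)
  tracker.recordRuleInvocation("ResolveRelations", 800000L, false)
  // Ask for the k slowest rules by cumulative time; the edge cases (e.g. k larger
  // than the number of recorded rules) are what this addendum pins down.
  val topRules = tracker.topRulesByTime(5)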


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26129-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23110


commit 683630ac3fbf054534e2589258793c9baaebfbf5
Author: Reynold Xin 
Date:   2018-11-21T22:25:09Z

[SPARK-26129]




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23106: [SPARK-26141] Enable custom shuffle metrics imple...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23106

[SPARK-26141] Enable custom shuffle metrics implementation in shuffle write

## What changes were proposed in this pull request?
This is the write side counterpart to 
https://github.com/apache/spark/pull/23105

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. 
Updated all existing test cases.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26141

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23106


commit 115bd8bfa49674a2fcfa05517373146e90ec3bf7
Author: Reynold Xin 
Date:   2018-11-21T15:55:56Z

[SPARK-26141] Enable custom shuffle metrics implementation in shuffle write




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

2018-11-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23105
  
cc @jiangxb1987 @squito 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-26129][SQL] Instrumentation for per-query planning time

2018-11-21 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 6bbdf34ba -> 07a700b37


[SPARK-26129][SQL] Instrumentation for per-query planning time

## What changes were proposed in this pull request?
We currently don't have good visibility into query planning time (analysis vs 
optimization vs physical planning). This patch adds a simple utility to track 
the runtime of various rules and various planning phases.

## How was this patch tested?
Added unit tests and end-to-end integration tests.

Closes #23096 from rxin/SPARK-26129.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07a700b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07a700b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07a700b3

Branch: refs/heads/master
Commit: 07a700b3711057553dfbb7b047216565726509c7
Parents: 6bbdf34
Author: Reynold Xin 
Authored: Wed Nov 21 16:41:12 2018 +0100
Committer: Reynold Xin 
Committed: Wed Nov 21 16:41:12 2018 +0100

--
 .../sql/catalyst/QueryPlanningTracker.scala | 127 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  22 ++--
 .../spark/sql/catalyst/rules/RuleExecutor.scala |  19 ++-
 .../catalyst/QueryPlanningTrackerSuite.scala|  78 
 .../sql/catalyst/analysis/AnalysisTest.scala|   3 +-
 .../ResolveGroupingAnalyticsSuite.scala |   3 +-
 .../analysis/ResolvedUuidExpressionsSuite.scala |  10 +-
 .../scala/org/apache/spark/sql/Dataset.scala|   9 ++
 .../org/apache/spark/sql/SparkSession.scala |   6 +-
 .../spark/sql/execution/QueryExecution.scala|  21 ++-
 .../QueryPlanningTrackerEndToEndSuite.scala |  52 
 .../apache/spark/sql/hive/test/TestHive.scala   |  16 ++-
 12 files changed, 338 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/07a700b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
new file mode 100644
index 000..420f2a1
--- /dev/null
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.BoundedPriorityQueue
+
+
+/**
+ * A simple utility for tracking runtime and associated stats in query 
planning.
+ *
+ * There are two separate concepts we track:
+ *
+ * 1. Phases: These are broad scope phases in query planning, as listed below, 
i.e. analysis,
+ * optimizationm and physical planning (just planning).
+ *
+ * 2. Rules: These are the individual Catalyst rules that we track. In 
addition to time, we also
+ * track the number of invocations and effective invocations.
+ */
+object QueryPlanningTracker {
+
+  // Define a list of common phases here.
+  val PARSING = "parsing"
+  val ANALYSIS = "analysis"
+  val OPTIMIZATION = "optimization"
+  val PLANNING = "planning"
+
+  class RuleSummary(
+var totalTimeNs: Long, var numInvocations: Long, var 
numEffectiveInvocations: Long) {
+
+def this() = this(totalTimeNs = 0, numInvocations = 0, 
numEffectiveInvocations = 0)
+
+override def toString: String = {
+  s"RuleSummary($totalTimeNs, $numInvocations, $numEffectiveInvocations)"
+}
+  }
+
+  /**
+   * A thread local variable to implicitly pass the tracker around. This 
assumes the query planner
+   * is single-threaded, and avoids passing the same tracker context in every 
function call.
+   */
+  private val localTracker = new ThreadLocal[QueryPlanningTracker]() {
+override def initialValue: QueryPlanningTracker = null
+  }
+
+  /** Returns the current tra

[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

2018-11-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23096
  
Merging this. Feel free to leave more comments. I'm hoping we can wire this 
into the UI eventually.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23105: [SPARK-26140] Enable passing in a custom shuffle ...

2018-11-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23105#discussion_r235420647
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala ---
@@ -122,34 +123,3 @@ class ShuffleReadMetrics private[spark] () extends 
Serializable {
 }
   }
 }
-
-/**
- * A temporary shuffle read metrics holder that is used to collect shuffle 
read metrics for each
- * shuffle dependency, and all temporary metrics will be merged into the 
[[ShuffleReadMetrics]] at
- * last.
- */
-private[spark] class TempShuffleReadMetrics {
--- End diff --

this was moved to TempShuffleReadMetrics


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23105: [SPARK-26140] Pull TempShuffleReadMetrics creatio...

2018-11-21 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23105

[SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader

## What changes were proposed in this pull request?
This patch defines an internal Spark interface for reporting shuffle 
metrics and uses it in the shuffle reader. Before this patch, shuffle metrics were 
tied to a specific implementation (using a thread-local temporary data 
structure and accumulators). After this patch, callers that define their own 
shuffle RDDs can create a custom metrics implementation.

With this patch, we would be able to create better metrics for the SQL 
layer, e.g. reporting shuffle metrics in the SQL UI for each exchange operator.
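
As a rough illustration of the extension point (not code from this PR), a 
caller-defined reporter could look like the sketch below; it assumes the reporter 
surface exposes incrementer methods such as incRemoteBlocksFetched, as the 
SQL-side reporter in PR #23128 overrides:

  import org.apache.spark.executor.TempShuffleReadMetrics

  // Hypothetical reporter for a custom shuffle RDD: it keeps its own counter
  // instead of writing into TaskMetrics accumulators. The interface is documented
  // as being called from a single thread, so no synchronization is needed.
  class CountingShuffleReadMetrics extends TempShuffleReadMetrics {
    private var remoteBlocks: Long = 0L
    override def incRemoteBlocksFetched(v: Long): Unit = { remoteBlocks += v }
    def remoteBlocksFetched: Long = remoteBlocks
  }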

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. 
Updated all existing test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26140

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23105


commit da253b57c14bc0174f0330ae6fa5d3a61647269b
Author: Reynold Xin 
Date:   2018-11-21T14:56:23Z

[SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23096#discussion_r235309483
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -648,7 +648,11 @@ class SparkSession private(
* @since 2.0.0
*/
   def sql(sqlText: String): DataFrame = {
-Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+val tracker = new QueryPlanningTracker
--- End diff --

I don't think it makes sense to add random flags for everything. If the 
argument is that this change has a decent chance of introducing regressions 
(e.g. due to higher memory usage, or cpu overhead), then it would make a lot of 
sense to put it behind a flag so it can be disabled in production if that 
happens.

That said, the overhead on the hot code path here is substantially smaller 
than even transforming the simplest Catalyst plan (a hash map lookup is orders 
of magnitude cheaper than calling a partial function to transform a Scala 
collection for TreeNode), so I think the risk is low enough that it does not 
warrant adding a config.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23100: [WIP][SPARK-26133][ML] Remove deprecated OneHotEncoder a...

2018-11-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23100
  
A change of this type can really piss some people off. Was there consensus on 
this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23096#discussion_r235182105
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -88,15 +101,20 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 val startTime = System.nanoTime()
 val result = rule(plan)
 val runTime = System.nanoTime() - startTime
+val effective = !result.fastEquals(plan)
 
-if (!result.fastEquals(plan)) {
+if (effective) {
   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
   
queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
   planChangeLogger.log(rule.ruleName, plan, result)
 }
 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, 
runTime)
 queryExecutionMetrics.incNumExecution(rule.ruleName)
 
+if (tracker ne null) {
--- End diff --

If one calls execute directly, the tracker would be null.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23096#discussion_r235162047
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 val startTime = System.nanoTime()
 val result = rule(plan)
 val runTime = System.nanoTime() - startTime
+val effective = !result.fastEquals(plan)
 
-if (!result.fastEquals(plan)) {
+if (effective) {
   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
   
queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
   planChangeLogger.log(rule.ruleName, plan, result)
 }
 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, 
runTime)
 queryExecutionMetrics.incNumExecution(rule.ruleName)
 
+tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, 
effective))
--- End diff --

yes! (not great -- but I'd probably remove the global tracker at some point)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23096#discussion_r235161825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -696,7 +701,7 @@ class Analyzer(
   s"avoid errors. Increase the value of 
${SQLConf.MAX_NESTED_VIEW_DEPTH.key} to work " +
   "around this.")
   }
-  executeSameContext(child)
+  executeSameContext(child, None)
--- End diff --

No great reason. I just used None for everything, except the top level, 
because it is very difficult to wire the tracker here without refactoring a lot 
of code.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

2018-11-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23096#discussion_r235161336
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.BoundedPriorityQueue
+
+
+/**
+ * A simple utility for tracking runtime and associated stats in query 
planning.
+ *
+ * There are two separate concepts we track:
+ *
+ * 1. Phases: These are broad scope phases in query planning, as listed 
below, i.e. analysis,
+ * optimizationm and physical planning (just planning).
+ *
+ * 2. Rules: These are the individual Catalyst rules that we track. In 
addition to time, we also
+ * track the number of invocations and effective invocations.
+ */
+object QueryPlanningTracker {
+
+  // Define a list of common phases here.
+  val PARSING = "parsing"
--- End diff --

Mostly because Scala enums are not great, and I was thinking about making 
this a generic, extensible thing.
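
To spell out the trade-off (illustrative only, not code from the PR): plain 
String constants leave the set of phases open, so a caller can introduce its own 
phase name, while a Scala Enumeration would close it:

  // Open set of phases: callers can pass any new phase name without editing this object.
  object Phases {
    val PARSING = "parsing"
    val ANALYSIS = "analysis"
  }

  // Closed set: adding a phase means editing the enumeration itself.
  object Phase extends Enumeration {
    val Parsing, Analysis, Optimization, Planning = Value
  }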



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

2018-11-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23096
  
cc @hvanhovell @gatorsmile 

This is different from the existing metrics for rules as it is query 
specific. We might want to replace that one with this in the future.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for query plan...

2018-11-20 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/23096

[SPARK-26129][SQL] Instrumentation for query planning time

## What changes were proposed in this pull request?
We currently don't have good visibility into query planning time (analysis 
vs optimization vs physical planning). This patch adds a simple utility to 
track the runtime of various rules and various planning phases.

## How was this patch tested?
Added unit tests and end-to-end integration tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-26129

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23096.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23096


commit b6a3d02f2c2b0eff71f92c3ede854edc3b5bf9f8
Author: Reynold Xin 
Date:   2018-11-20T16:22:35Z

[SPARK-26129][SQL] Instrumentation for query planning time




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of non-struct ty...

2018-11-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234569150
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1594,6 +1594,15 @@ object SQLConf {
 "WHERE, which does not follow SQL standard.")
   .booleanConf
   .createWithDefault(false)
+
+  val LEGACY_ALIAS_NON_STRUCT_GROUPING_KEY =
+buildConf("spark.sql.legacy.dataset.aliasNonStructGroupingKey")
--- End diff --

Maybe aliasNonStructGroupingKeyAsValue, and default to true.

Then we can remove this in the future.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


