[spark] branch branch-3.3 updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new aa39b06462a  [MINOR][TEST][SQL] Add a CTE subquery scope test case

aa39b06462a is described below

commit aa39b06462a98f37be59e239d12edd9f09a25b88
Author: Reynold Xin
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

    [MINOR][TEST][SQL] Add a CTE subquery scope test case

    ### What changes were proposed in this pull request?
    I noticed we were missing a test case for this in SQL tests, so I added one.

    ### Why are the changes needed?
    To ensure we scope CTEs properly in subqueries.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    This is a test case change.

    Closes #39189 from rxin/cte_test.

    Authored-by: Reynold Xin
    Signed-off-by: Reynold Xin
    (cherry picked from commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b)
    Signed-off-by: Reynold Xin
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        | 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );

+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 264b64ffe96..ebdd64c3ac8 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -36,6 +36,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index 2c622de3f36..b6e1793f7d7 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -36,6 +36,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 283f5a54a42..546ab7ecb95 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -36,6 +36,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisE
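The scoping rule the new test pins down is not Spark-specific: standard SQL engines confine a CTE declared inside a derived table to that derived table. As a rough sketch of the same query shape (using Python's bundled SQLite rather than Spark, and a small two-row CTE in place of `range(10)` since SQLite has no `range` table function), the outer reference fails the same way Spark's `TABLE_OR_VIEW_NOT_FOUND` does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CTE is declared inside the derived table, so only the
# subquery's own SELECT can see it.
ok = conn.execute(
    "SELECT * FROM "
    "  (WITH cte AS (SELECT 1 AS id UNION SELECT 8) "
    "   SELECT * FROM cte WHERE id = 8) a"
).fetchall()
print(ok)  # the inner reference resolves normally: [(8,)]

# Referencing `cte` from the outer UNION branch must fail, because
# the CTE's scope ended with the subquery that defined it.
try:
    conn.execute(
        "SELECT * FROM "
        "  (WITH cte AS (SELECT 1 AS id) SELECT * FROM cte) a "
        "UNION SELECT * FROM cte"
    )
    leaked = True
except sqlite3.OperationalError as err:
    leaked = False
    print(err)  # no such table: cte
print("cte leaked out of subquery scope:", leaked)
```

If the second statement had succeeded, the CTE would have been globally visible, which is exactly the regression the added test guards against.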
[spark] branch master updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 24edf8ecb5e  [MINOR][TEST][SQL] Add a CTE subquery scope test case

24edf8ecb5e is described below

commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b
Author: Reynold Xin
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

    [MINOR][TEST][SQL] Add a CTE subquery scope test case

    ### What changes were proposed in this pull request?
    I noticed we were missing a test case for this in SQL tests, so I added one.

    ### Why are the changes needed?
    To ensure we scope CTEs properly in subqueries.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    This is a test case change.

    Closes #39189 from rxin/cte_test.

    Authored-by: Reynold Xin
    Signed-off-by: Reynold Xin
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        | 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );

+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 013c5f27b50..65000471c75 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -33,6 +33,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index ed6d69b233e..2c67f2db56a 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -33,6 +33,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
 t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 6a48e1bec43..154ebd20223 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -33,6 +33,34 @@ struct
 1

+-- !query
+SELECT * FROM
+  (
+    WITH cte AS (SELECT * FROM range(10))
+    SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
svn commit: r46414 - /dev/spark/v3.1.1-rc3-bin/ /release/spark/spark-3.1.1/
Author: rxin
Date: Tue Mar 2 11:00:12 2021
New Revision: 46414

Log:
Moving Apache Spark 3.1.1 RC3 to Apache Spark 3.1.1

Added:
    release/spark/spark-3.1.1/
      - copied from r46413, dev/spark/v3.1.1-rc3-bin/
Removed:
    dev/spark/v3.1.1-rc3-bin/

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r46413 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/
Author: rxin
Date: Tue Mar 2 10:55:39 2021
New Revision: 46413

Log:
Recover 3.1.1 RC3

Added:
    dev/spark/v3.1.1-rc3-bin/
      - copied from r46410, dev/spark/v3.1.1-rc3-bin/
    dev/spark/v3.1.1-rc3-docs/
      - copied from r46410, dev/spark/v3.1.1-rc3-docs/
svn commit: r46411 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/
Author: rxin
Date: Tue Mar 2 10:39:38 2021
New Revision: 46411

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc3-bin/
    dev/spark/v3.1.1-rc3-docs/
svn commit: r46412 - in /dev/spark: v3.1.0-rc1-bin/ v3.1.0-rc1-docs/
Author: rxin
Date: Tue Mar 2 10:39:58 2021
New Revision: 46412

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.0-rc1-bin/
    dev/spark/v3.1.0-rc1-docs/
svn commit: r46410 - in /dev/spark: v3.1.1-rc2-bin/ v3.1.1-rc2-docs/
Author: rxin
Date: Tue Mar 2 10:39:32 2021
New Revision: 46410

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc2-bin/
    dev/spark/v3.1.1-rc2-docs/
svn commit: r46409 - in /dev/spark: v3.1.1-rc1-bin/ v3.1.1-rc1-docs/
Author: rxin
Date: Tue Mar 2 10:39:25 2021
New Revision: 46409

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc1-bin/
    dev/spark/v3.1.1-rc1-docs/
svn commit: r40088 - in /dev/spark: v3.0.0-rc1-bin/ v3.0.0-rc1-docs/ v3.0.0-rc2-bin/ v3.0.0-rc2-docs/ v3.0.0-rc3-docs/
Author: rxin
Date: Thu Jun 18 16:41:27 2020
New Revision: 40088

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.0.0-rc1-bin/
    dev/spark/v3.0.0-rc1-docs/
    dev/spark/v3.0.0-rc2-bin/
    dev/spark/v3.0.0-rc2-docs/
    dev/spark/v3.0.0-rc3-docs/
svn commit: r40050 - /dev/spark/v3.0.0-rc3-bin/ /release/spark/spark-3.0.0/
Author: rxin
Date: Tue Jun 16 09:18:02 2020
New Revision: 40050

Log:
release 3.0.0

Added:
    release/spark/spark-3.0.0/
      - copied from r40049, dev/spark/v3.0.0-rc3-bin/
Removed:
    dev/spark/v3.0.0-rc3-bin/
[spark] tag v3.0.0 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

        at 3fdfce3 (commit)

No new revisions were added by this update.
svn commit: r39960 - in /dev/spark/v3.0.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin
Date: Sat Jun 6 14:03:25 2020
New Revision: 39960

Log:
Apache Spark v3.0.0-rc3 docs

[This commit notification would consist of 1920 parts, which exceeds the
limit of 50 ones, so it was shortened to the summary.]
svn commit: r39959 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin
Date: Sat Jun 6 13:35:40 2020
New Revision: 39959

Log:
Apache Spark v3.0.0-rc3

Added:
    dev/spark/v3.0.0-rc3-bin/
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Sat Jun 6 13:35:40 2020
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZjGIEACG3gsdARN8puRHS2YL+brOmjbrS4wVY/Av
+l+ZR59moZ7QuwjYoixyqNnztIKgIyleYJq9DL5TqqMxFgGpuoDrnuWVqI+8MngVA
+gau/QDmYINabZsJxFfDn1IjxxSQBsgf6pwfqQbB+fGSjLSPnDq+u3DIWr3fRMh4X
+DrTuATNewKiiBIwQHUKAtPMAbsdDvXv0DRL7CGTiIJri43opAntQzHec3sP9hgRU
+J5J2HnjOlamgv58S7zrUw/Wo1xPLmz2PGIsP0aq9DRRw0bLnesrtEaWAKFp2HL5E
+QlbjfboaDQz/X+meruW57/sO/DDwA90/XvF44z4Gu6kbS8nRuTsU5wVfZ/1iyWZk
+PLP2nFoWl7O85k/DLB5ADYgce3e6k2qD2obKxzsEx0nr0Wu13cxCR2+IBQmv05jb
+4Kwi7iE0iKIxt3cESDH6j9GqZoTrcxt6Jb88KSQ+YM2TBNUr1ZZNmkjgYdmLvm7a
+wH6vLtdpZzUKIGd6bt1grEwoQJBMnQjkoDYxhx+ugjbs8CwwxcdUNd2Q5xz0WaSn
+p443ZlMR5lbGf6D6U4PUigaIrdD8d+ef/rRTDtXdoDqC+FdNuepyS9+2+dUZGErx
+N2IMNunKIdKw57GZGcILey1hY45SSuQFw5JAe+nWqCAzCmFX72ulkv9The7rLdlE
+YdLu6XQIBA==
+=HhHH
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Sat Jun 6 13:35:40 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 394DCFEB 4E202A8E 5C58BF94 A77548FD 79A00F92 34538535
+                     B0242E1B 96068E3E 80F78188 D71831F8 4A350224 41AA14B1
+                     D72AE704 F2390842 DBEAB41F 5AC9859A

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Sat Jun 6 13:35:40 2020
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3oQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZvhPD/9Vyrywk4kYFaUo36Kg6nickTvWMm+yDfGZ
+tKrUo3dAoMta7OwUjT6W0Roo8a4BBgumaDv47Dm6mlquF2DuLuBrFsqFo8c5VNA/
+jT1tdSdHiTzjq7LfY9GQDn8Wkgp1gyIKON70XFdZifduW0gcFDkJ+FjhPYWcA6jy
+GGOGK5qboCdi9C+KowUVj4VB9bbxPbWvW7FVF3+VlcrKvkmNx+EmqmIrqsh72w8O
+EL70za2uBRUUiFcaOpY/wpmEN1raCAkMzQ+dPl7p1PFgmLFrMN9RaRXJ1stF+fXO
+rDLBLNPqb85TvvOOHpcr4PSP38GrdZvDAvljCOEbBzacF719bewu/IVRcNi9lPZE
+HDPUcZLgnocNIF6kafykrm3JhagzmPIhQ8d4DFTuH6ePxgWqdUa9lWKQL54z3mjU
+LT2CJ8gMDY0Wz5zSKc/sI/ZwL+Q6U8xiIGYSzQgT9yPztbhDd5AM2DgohJkZSD4b
+jOrEsSyNRJiwwRAHlbeOOVPb4UNYzsx1USPbPEBeXTt8X8VUb8jsU84o/RhXexk9
+EMJjxz/aChB+NefbmUjBZmXSaa/zYubprJrWnUgPw7hFxAnmtgIUdjSWSNIOJ6bp
+EV1M6xwuvrmGhOa3D0C+lYyAuYZca2FQrcAtzNiL6iOMQ6USFZvzjxGWQiV2CDGQ
+O8CNfkwOGA
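Each artifact uploaded above ships with an `.asc` detached PGP signature and an `.sha512` checksum file; a downloader is expected to check both before trusting the binary. Below is a minimal sketch of the checksum half of that flow using a made-up file, not a real release artifact (note the grouped-uppercase digests shown above appear to be `gpg --print-md SHA512` output, which `sha512sum -c` does not parse directly, so the comparison here is done by hand):

```python
import hashlib
import pathlib
import tempfile

# Stand-in for a downloaded release artifact (contents are made up).
workdir = pathlib.Path(tempfile.mkdtemp())
artifact = workdir / "SparkR_3.0.0.tar.gz"
artifact.write_bytes(b"demo artifact contents\n")

# Release side: publish the SHA-512 digest next to the artifact.
sha_file = workdir / (artifact.name + ".sha512")
digest = hashlib.sha512(artifact.read_bytes()).hexdigest()
sha_file.write_text(f"{artifact.name}: {digest}\n")

# Download side: recompute the digest and compare before trusting
# the file. A mismatch means corruption or tampering.
_name, _, published = sha_file.read_text().partition(": ")
recomputed = hashlib.sha512(artifact.read_bytes()).hexdigest()
verified = published.strip() == recomputed
print("checksum ok:", verified)
```

The `.asc` signature is the stronger check: it is verified separately with `gpg --verify <artifact>.asc <artifact>` after importing the release manager's public key from the project's KEYS file.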
svn commit: r39958 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin
Date: Sat Jun 6 11:18:32 2020
New Revision: 39958

Log:
remove 3.0 rc3 binary

Removed:
    dev/spark/v3.0.0-rc3-bin/
[spark] branch branch-3.0 updated (fa608b9 -> 3ea461d)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
     add 3fdfce3  Preparing Spark release v3.0.0-rc3
     new 3ea461d  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3ea461d61e635835c07bacb5a0c403ae2a3099a0
Author: Reynold Xin
AuthorDate: Sat Jun 6 02:57:41 2020 +0000

    Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION                                      | 2 +-
 assembly/pom.xml                                       | 2 +-
 common/kvstore/pom.xml                                 | 2 +-
 common/network-common/pom.xml                          | 2 +-
 common/network-shuffle/pom.xml                         | 2 +-
 common/network-yarn/pom.xml                            | 2 +-
 common/sketch/pom.xml                                  | 2 +-
 common/tags/pom.xml                                    | 2 +-
 common/unsafe/pom.xml                                  | 2 +-
 core/pom.xml                                           | 2 +-
 docs/_config.yml                                       | 4 ++--
 examples/pom.xml                                       | 2 +-
 external/avro/pom.xml                                  | 2 +-
 external/docker-integration-tests/pom.xml              | 2 +-
 external/kafka-0-10-assembly/pom.xml                   | 2 +-
 external/kafka-0-10-sql/pom.xml                        | 2 +-
 external/kafka-0-10-token-provider/pom.xml             | 2 +-
 external/kafka-0-10/pom.xml                            | 2 +-
 external/kinesis-asl-assembly/pom.xml                  | 2 +-
 external/kinesis-asl/pom.xml                           | 2 +-
 external/spark-ganglia-lgpl/pom.xml                    | 2 +-
 graphx/pom.xml                                         | 2 +-
 hadoop-cloud/pom.xml                                   | 2 +-
 launcher/pom.xml                                       | 2 +-
 mllib-local/pom.xml                                    | 2 +-
 mllib/pom.xml                                          | 2 +-
 pom.xml                                                | 2 +-
 python/pyspark/version.py                              | 2 +-
 repl/pom.xml                                           | 2 +-
 resource-managers/kubernetes/core/pom.xml              | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml                        | 2 +-
 resource-managers/yarn/pom.xml                         | 2 +-
 sql/catalyst/pom.xml                                   | 2 +-
 sql/core/pom.xml                                       | 2 +-
 sql/hive-thriftserver/pom.xml                          | 2 +-
 sql/hive/pom.xml                                       | 2 +-
 streaming/pom.xml                                      | 2 +-
 tools/pom.xml                                          | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 3bad429..21f3eaa 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>

diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.0</version>
+    <version>3.0.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>

diff --g
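The commit above is purely mechanical: the same parent version string is rewritten in every module POM (plus a few non-POM files). A tiny sketch of that pattern follows, over a made-up miniature tree rather than the real 39-file Spark repo; an actual release script might instead drive the Maven versions plugin (`mvn versions:set`):

```python
import pathlib
import tempfile

# Build a made-up miniature module tree, each POM pinned to the
# released version (file contents are illustrative, not real POMs).
root = pathlib.Path(tempfile.mkdtemp())
for module in ("assembly", "common/kvstore", "sql/core"):
    pom = root / module / "pom.xml"
    pom.parent.mkdir(parents=True)
    pom.write_text("<version>3.0.0</version>\n")

# Rewrite the version everywhere in one pass, mirroring the
# "Preparing development version 3.0.1-SNAPSHOT" commit.
for pom in root.rglob("pom.xml"):
    pom.write_text(pom.read_text().replace("3.0.0", "3.0.1-SNAPSHOT"))

bumped = sorted(p.read_text().strip() for p in root.rglob("pom.xml"))
print(bumped)  # every POM now carries 3.0.1-SNAPSHOT
```

Doing the bump as one sweep keeps every module's parent reference consistent, which is why the release-prep and back-to-SNAPSHOT commits each touch all POMs at once.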
[spark] 01/01: Preparing Spark release v3.0.0-rc3
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3fdfce3120f307147244e5eaf46d61419a723d50
Author: Reynold Xin
AuthorDate: Sat Jun 6 02:57:35 2020 +0000

    Preparing Spark release v3.0.0-rc3
---
 R/pkg/DESCRIPTION                                      | 2 +-
 assembly/pom.xml                                       | 2 +-
 common/kvstore/pom.xml                                 | 2 +-
 common/network-common/pom.xml                          | 2 +-
 common/network-shuffle/pom.xml                         | 2 +-
 common/network-yarn/pom.xml                            | 2 +-
 common/sketch/pom.xml                                  | 2 +-
 common/tags/pom.xml                                    | 2 +-
 common/unsafe/pom.xml                                  | 2 +-
 core/pom.xml                                           | 2 +-
 docs/_config.yml                                       | 4 ++--
 examples/pom.xml                                       | 2 +-
 external/avro/pom.xml                                  | 2 +-
 external/docker-integration-tests/pom.xml              | 2 +-
 external/kafka-0-10-assembly/pom.xml                   | 2 +-
 external/kafka-0-10-sql/pom.xml                        | 2 +-
 external/kafka-0-10-token-provider/pom.xml             | 2 +-
 external/kafka-0-10/pom.xml                            | 2 +-
 external/kinesis-asl-assembly/pom.xml                  | 2 +-
 external/kinesis-asl/pom.xml                           | 2 +-
 external/spark-ganglia-lgpl/pom.xml                    | 2 +-
 graphx/pom.xml                                         | 2 +-
 hadoop-cloud/pom.xml                                   | 2 +-
 launcher/pom.xml                                       | 2 +-
 mllib-local/pom.xml                                    | 2 +-
 mllib/pom.xml                                          | 2 +-
 pom.xml                                                | 2 +-
 python/pyspark/version.py                              | 2 +-
 repl/pom.xml                                           | 2 +-
 resource-managers/kubernetes/core/pom.xml              | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml                        | 2 +-
 resource-managers/yarn/pom.xml                         | 2 +-
 sql/catalyst/pom.xml                                   | 2 +-
 sql/core/pom.xml                                       | 2 +-
 sql/hive-thriftserver/pom.xml                          | 2 +-
 sql/hive/pom.xml                                       | 2 +-
 streaming/pom.xml                                      | 2 +-
 tools/pom.xml                                          | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 21f3eaa..3bad429 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.1
+Version: 3.0.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8bef9d8..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../pom.xml</relativePath>

diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fc1441d..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index de2a6fb..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c0c016..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index b8df191..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8119709..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.0.1-SNAPSHOT</version>
+    <version>3.0.0</version>
     <relativePath>../../pom.xml</relativePath>

diff --git a/common/ta
[spark] tag v3.0.0-rc3 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git.

        at 3fdfce3 (commit)

This tag includes the following new commits:

     new 3fdfce3  Preparing Spark release v3.0.0-rc3

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.
svn commit: r39951 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin
Date: Fri Jun 5 19:08:09 2020
New Revision: 39951

Log:
Apache Spark v3.0.0-rc3

Added:
    dev/spark/v3.0.0-rc3-bin/
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
    dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
    dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
    dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Fri Jun 5 19:08:09 2020
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZpBZD/9vSiD946kwdMWalYM01Zw2yjKK60eakhLY
+jxHRy1T6Yipspyh2idCrzd2MaGJFqUwRZjs1mpA/mKZUGRSzYFjlWWoaSc/T19MD
+3q/zg6glgoKquzxHcAqum/OCc1C1MJTcsMic2+LIelXRoJ2GPCeECq91JGX4xpD4
+09sDElvooqfMCLb05gaaF8Eyrpm+7WSyAEVpb1Fjpp/gtdG1YQyiW3o3WzNSJgeA
+dewZaSoI58lx3Rfs1jZN1M4Gyj1aKh4Yqw21+CDoHAhtkeOp5oGPgrWef4fZAE4D
+4xKoz1I/5C1s0wIZEhUI2IUJLeGyCR117QhIO/bQFR1XEOO22auQaPppGJKUa5bb
+bwpx6TARNP13fe2R48G+yZ9Em0uC3P1CucGYCRlY22umzkbalrVFeZ77n/FWRB7E
+nC29bso/R2VwmDRI6yWXiCPLMyQy/PukniWRJZiU7Ath1930cORAlqFC7EOBHgHu
+k3AVX/3h2qZBFuYu/wIsd89rgeiwrf4fksiuMhp8YXJh3xCLLSl4uT+q3flutJ3H
+nsOLYkuie/r4qx+M2J7rfezTzTeYr+SN8mn4CTsGRznHhb0amqlZE6yNFWVatr6D
+LEYWe9L3DK92Kj0Jtl5QyPXQlKSoBQriketgZXKxzeBScKeFd6acGxOhM5LpZRCo
+ngKbsgfcoQ==
+=bwFz
+-----END PGP SIGNATURE-----

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Fri Jun 5 19:08:09 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 37496F1A C5BD0DFF 0F6B08B9 05CB55B7 DAA6397A 8C377126
+                     C6887AEB CB05F172 0E4A9754 9ED4B6B4 68E9266A 6459229F
+                     48D58F7C 9C0A58B1 183CC6D0 A18ACE18

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==============================================================================
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Fri Jun 5 19:08:09 2020
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4kQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZlwHD/9tPwfyzwQkl6qkYp27AgZexy5k15gjJ/Bi
+MWWwv3bMhJiRlZN3hCyGC0QTTkRG+AJTd3SflbUhHzw9ttFAnt3VqZ7RZBB4UBDI
+5W85jUaF5bOMu7K4hW2iZdcLLLbq7/sXNNqRhomQStL4j6TerZjgP8IytCGEmLX4
+Qt894N7+MunZxbPXKkUqZfO0cWlxY53+zNGqXKJdwDhQUrrH0i+2fs3gd97OJs42
+83l+pE27C7+aTr6fSRWIS55nw9GzKrDOr0N47wtfCs0mqIW+dI+cVjZh8W/Gf9Dl
+EifAsLIpahNRpQLu0PqiWrsJ3meertha4DLWRPS0esYyZAGFK+DjD9Zm1cOovA9v
+ywjQVWCkmaqaozvm2RTKxwvS7kkBB2dJPUJJ8YeCBr0A7wHBAIeA0vvWe9q7u0KW
+O78uGswTF4EKz85ZMhuo8IjdjKjzTumzdFws4akeTzv60t+439zFdyhUghfQ71om
+biS1Fgopz1QLqCb3eaqhMBM0ZB4JVMTtMKb2/gqH/8qaQq91CEkLTpOOsRK+xdeg
+A8XoFCWEsBbHzLT3Y3FKsHC7ipo2FYXCcn/n/67bRuFFBwhLZzOyEISH72nKIk4k
+YOU5wZnsykG2oiV3ZysRlYewtU0mIIuUINrMVRZB69CUk9Q2fnDyuT02OEGIoNZC
+LohvgOFbqQ
svn commit: r39657 - in /dev/spark/v3.0.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin
Date: Mon May 18 16:11:38 2020
New Revision: 39657

Log:
Apache Spark v3.0.0-rc2 docs

[This commit notification would consist of 1921 parts, which exceeds the
limit of 50 ones, so it was shortened to the summary.]
svn commit: r39656 - /dev/spark/v3.0.0-rc2-bin/
Author: rxin Date: Mon May 18 15:42:56 2020 New Revision: 39656 Log: Apache Spark v3.0.0-rc2 Added: dev/spark/v3.0.0-rc2-bin/ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc (17-line detached PGP signature block omitted)

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512
SparkR_3.0.0.tar.gz: B50B8062 8C2158C5 5931EB47 275FB32D 52EFF715 F3B39524
                     29C03A21 583459D5 32EC2135 D27AB970 0F345B7A 620E4281
                     950CC383 58231D1D BB08817C 4EDC6A05

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc (detached PGP signature block omitted; truncated in the original notification)
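The staging commits above publish each release artifact together with a detached `.asc` signature and a `.sha512` checksum file. As a minimal sketch of how a voter checks the checksum half (the file names below are stand-ins, not the real release artifacts):

```shell
# Create a stand-in artifact and record its checksum, the same way a
# release vote checks spark-3.0.0.tgz against spark-3.0.0.tgz.sha512.
printf 'stand-in artifact bytes\n' > spark-example.tgz
sha512sum spark-example.tgz > spark-example.tgz.sha512

# sha512sum -c recomputes the digest and compares it to the recorded one;
# it prints "spark-example.tgz: OK" on a match and exits non-zero otherwise.
sha512sum -c spark-example.tgz.sha512
```

The `.asc` files serve the same role for authenticity: after importing the project KEYS file with `gpg --import KEYS`, the detached signature is checked with `gpg --verify spark-3.0.0.tgz.asc spark-3.0.0.tgz`.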
[spark] branch branch-3.0 updated (740da34 -> f6053b9)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 740da34 [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters add 29853ec Preparing Spark release v3.0.0-rc2 new f6053b9 Preparing development version 3.0.1-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes:
[spark] 01/01: Preparing Spark release v3.0.0-rc2
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to tag v3.0.0-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 29853eca69bceefd227cbe8421a09c116b7b753a Author: Reynold Xin AuthorDate: Mon May 18 13:21:37 2020 + Preparing Spark release v3.0.0-rc2 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 21f3eaa..3bad429 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.1 +Version: 3.0.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 8bef9d8..0a52a00 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fc1441d..fa4fcb1f 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index de2a6fb..14a1b7d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 6c0c016..e75a843 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index b8df191..004af0a 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 8119709..a35156a 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/ta
[spark] tag v3.0.0-rc2 created (now 29853ec)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to tag v3.0.0-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git. at 29853ec (commit) This tag includes the following new commits: new 29853ec Preparing Spark release v3.0.0-rc2 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit f6053b94f874c62856baa7bfa35df14c78bebc9f Author: Reynold Xin AuthorDate: Mon May 18 13:21:43 2020 + Preparing development version 3.0.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 3bad429..21f3eaa 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.0 +Version: 3.0.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 0a52a00..8bef9d8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fa4fcb1f..fc1441d 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 14a1b7d..de2a6fb 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index e75a843..6c0c016 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 004af0a..b8df191 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a35156a..8119709 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --g
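The paired "Preparing Spark release v3.0.0-rc2" and "Preparing development version 3.0.1-SNAPSHOT" commits above are mechanical: the same `<version>` string is rewritten in every module's pom.xml plus R/pkg/DESCRIPTION, docs/_config.yml, and python/pyspark/version.py. A minimal sketch of that rewrite on one file (illustrative only; the actual release tooling that produced these commits may work differently):

```shell
# Write a minimal pom.xml fragment carrying the development version.
cat > pom.xml <<'EOF'
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.12</artifactId>
  <version>3.0.1-SNAPSHOT</version>
</parent>
EOF

# Rewrite the version string, as the release-cut commit does across ~39 files.
sed 's|<version>3.0.1-SNAPSHOT</version>|<version>3.0.0</version>|' pom.xml > pom.xml.new
grep '<version>' pom.xml.new
```

Cutting the RC applies this in one direction (snapshot to release) for the tag, then immediately back (release to next snapshot) on the branch, which is why the two commits show mirror-image diffs.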
svn commit: r38759 - in /dev/spark/v3.0.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin Date: Tue Mar 31 13:45:27 2020 New Revision: 38759 Log: Apache Spark v3.0.0-rc1 docs [This commit notification would consist of 1911 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r38754 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin Date: Tue Mar 31 09:57:10 2020 New Revision: 38754 Log: Apache Spark v3.0.0-rc1 Added: dev/spark/v3.0.0-rc1-bin/ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (17-line detached PGP signature block omitted)

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
SparkR_3.0.0.tar.gz: C2D9C0A5 E71C5B56 48AC15AA 998ABD06 2FDB4D5C D2B7C344
                     B1949A7B 28508364 A9A45767 F2642F17 7EBFF4B0 55823EBD
                     BE76A2CE 5604660F 62D1654D 8271287B

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (detached PGP signature block omitted; truncated in the original notification)
svn commit: r38753 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin Date: Tue Mar 31 07:25:15 2020 New Revision: 38753 Log: retry Removed: dev/spark/v3.0.0-rc1-bin/
svn commit: r38740 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin Date: Mon Mar 30 16:00:46 2020 New Revision: 38740 Log: Apache Spark v3.0.0-rc1 Added: dev/spark/v3.0.0-rc1-bin/ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (17-line detached PGP signature block omitted)

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
SparkR_3.0.0.tar.gz: A4828C8D BA3BA1AA 116EEA62 D7028B85 85FF87AE 8AE9F0B5
                     421F1A3E E5E04F19 F1D4F0A6 144CEF29 8D690FC8 D9836830
                     4518FF9E 96004114 1083326B 84B5C0EC

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (detached PGP signature block omitted; truncated in the original notification)
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit fc5079841907443369af98b17c20f1ac24b3727d Author: Reynold Xin AuthorDate: Mon Mar 30 08:42:27 2020 + Preparing development version 3.0.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index c8cb1c3..3eff30b 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.0 +Version: 3.0.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 0a52a00..8bef9d8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fa4fcb1f..fc1441d 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 14a1b7d..de2a6fb 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index e75a843..6c0c016 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 004af0a..b8df191 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a35156a..8119709 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --g
[spark] branch branch-3.0 updated (5687b31 -> fc50798)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 5687b31 [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax add 6550d0d Preparing Spark release v3.0.0-rc1 new fc50798 Preparing development version 3.0.1-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-)
[spark] tag v3.0.0-rc1 created (now 6550d0d)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to tag v3.0.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. at 6550d0d (commit) This tag includes the following new commits: new 6550d0d Preparing Spark release v3.0.0-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: Preparing Spark release v3.0.0-rc1
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to tag v3.0.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1 Author: Reynold Xin AuthorDate: Mon Mar 30 08:42:10 2020 + Preparing Spark release v3.0.0-rc1 --- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 38 files changed, 38 insertions(+), 38 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 193ad3d..0a52a00 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index a1c8a8e..fa4fcb1f 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 
-3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 163c250..14a1b7d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index a6d9981..e75a843 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 76a402b..004af0a 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 3c3c0d2..a35156a 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 883b73a..dedc7df 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 93a4f67..ebb0525 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git
svn commit: r38725 - /dev/spark/KEYS
Author: rxin Date: Mon Mar 30 07:26:00 2020 New Revision: 38725 Log: Update KEYS Modified: dev/spark/KEYS

Modified: dev/spark/KEYS
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Mar 30 07:26:00 2020
@@ -1167,3 +1167,61 @@ (context: tail of the previous PGP public key block)

The change appends a new code signing key:

+pub   rsa4096 2020-03-30 [SC]
+      4A8BDA48E6E212A734632502DEA963E2E9347D66
+uid          [ultimate] Reynold Xin (CODE SIGNING KEY)
+sub   rsa4096 2020-03-30 [E]
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+(base64 key material omitted)
+-----END PGP PUBLIC KEY BLOCK-----
[spark] branch test-branch deleted (was 0f8b07e)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git. was 0f8b07e test This change permanently discards the following revisions: discard 0f8b07e test
[spark] branch test-branch created (now 0f8b07e)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git. at 0f8b07e test This branch includes the following new commits: new 0f8b07e test The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: test
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git commit 0f8b07e5034af2819b75b53aadffda82ae0c31b8 Author: Reynold Xin AuthorDate: Fri Feb 1 13:28:18 2019 -0800 test --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 271f2f5..2c1e02a 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ For general development tips, including info on developing Spark using an IDE, s The easiest way to start using Spark is through the Scala shell: -./bin/spark-shell +./bin/spark-shella Try the following command, which should return 1000:
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23207

```scala
var writer: ShuffleWriter[Any, Any] = null
try {
  val manager = SparkEnv.get.shuffleManager
  writer = manager.getWriter[Any, Any](
    dep.shuffleHandle, partitionId, context, context.taskMetrics().shuffleWriteMetrics)
  writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
  writer.stop(success = true).get
} catch {
  case e: Exception =>
    try {
      if (writer != null) {
        writer.stop(success = false)
      }
    } catch {
      case e: Exception => log.debug("Could not stop writer", e)
    }
    throw e
}
```

Can we put the above in a closure and pass it into the shuffle dependency? Then in SQL we just invoke the same closure with custom metrics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
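The write-stop-cleanup logic above could be factored into a standalone function, which is roughly what passing it into the shuffle dependency as a closure would amount to. The sketch below is illustrative only: `Writer` and `writeWithCleanup` are hypothetical stand-ins, not Spark's real `ShuffleWriter` API.

```scala
// Hypothetical sketch: factor the write/stop/cleanup pattern into a reusable
// function so a shuffle dependency could carry it as a closure and a SQL
// exchange could wire in its own metrics-reporting writer.
trait Writer[K, V] {
  def write(records: Iterator[(K, V)]): Unit
  def stop(success: Boolean): Option[Long]
}

def writeWithCleanup[K, V](writer: Writer[K, V], records: Iterator[(K, V)]): Long = {
  try {
    writer.write(records)
    writer.stop(success = true).get
  } catch {
    case e: Exception =>
      // best-effort cleanup; the original exception still propagates
      try writer.stop(success = false)
      catch { case _: Exception => () }
      throw e
  }
}

// A caller (e.g. an exchange operator with custom metrics) supplies its own writer:
var recordsWritten = 0L
val countingWriter = new Writer[Int, Int] {
  def write(records: Iterator[(Int, Int)]): Unit = recordsWritten = records.length.toLong
  def stop(success: Boolean): Option[Long] = Some(recordsWritten)
}
println(writeWithCleanup(countingWriter, Iterator((1, 1), (2, 2), (3, 3)))) // prints 3
```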
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions testSparkPlanMetrics(df, 1, Map( 2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))), + 1L -> (("Exchange", Map( +"shuffle records written" -> 2L, +"records read" -> 2L, +"local blocks fetched" -> 2L, --- End diff -- yea i'd just change the display text here, and not change the api --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) + extends ShuffleWriteMetricsReporter with Serializable { + @transient private[this] lazy val _bytesWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_BYTES_WRITTEN) + @transient private[this] lazy val _recordsWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN) + @transient private[this] lazy val _writeTime = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_WRITE_TIME) + + override private[spark] def incBytesWritten(v: Long): Unit = { +metricsReporter.incBytesWritten(v) +_bytesWritten.add(v) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_recordsWritten.set(_recordsWritten.value - v) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +metricsReporter.incRecordsWritten(v) +_recordsWritten.add(v) + } + override private[spark] def incWriteTime(v: Long): Unit = { +metricsReporter.incWriteTime(v) +_writeTime.add(v) + } + override private[spark] def decBytesWritten(v: 
Long): Unit = { +metricsReporter.decBytesWritten(v) +_bytesWritten.set(_bytesWritten.value - v) + } +} + +private[spark] object SQLShuffleWriteMetricsReporter { + val SHUFFLE_BYTES_WRITTEN = "shuffleBytesWritten" + val SHUFFLE_RECORDS_WRITTEN = "shuffleRecordsWritten" + val SHUFFLE_WRITE_TIME = "shuffleWriteTime" --- End diff -- yea i think we can just report ms level granularity. no point reporting ns (although we might want to measure based on ns) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308197 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) --- End diff -- why are there two parameter list here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
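For context on the two-parameter-list question: in Scala, only a case class's first parameter list participates in the generated equals/hashCode/toString/copy/unapply, so a second list is sometimes used deliberately to keep a field out of equality and pattern matching. A minimal illustration (the `Reporter` class below is hypothetical, not the Spark code under review):

```scala
// Only the first parameter list feeds the generated equals/hashCode;
// the second list is ignored by them (note `val` is needed there to
// expose the parameter as a field at all).
case class Reporter(metrics: Map[String, Long])(val underlying: String)

val a = Reporter(Map("bytes" -> 1L))("reporterA")
val b = Reporter(Map("bytes" -> 1L))("reporterB")
assert(a == b)                        // equal: second list is not compared
assert(a.underlying != b.underlying)  // yet the second-list fields differ
```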
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode override def outputPartitioning: Partitioning = SinglePartition override def executeCollect(): Array[InternalRow] = child.executeTake(limit) private val serializer: Serializer = new UnsafeRowSerializer(child.output.size) - override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) + private val writeMetrics = SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext) --- End diff -- why is metrics lazy val and this one val? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
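For context on the `val` vs `lazy val` question: a `val` is initialized when the object is constructed, while a `lazy val` is initialized on first access, which matters when a field depends on state that is not ready at construction time. A minimal illustration (not Spark code):

```scala
// Track initialization order to show the difference between val and lazy val.
var initOrder = List.empty[String]

class Metrics {
  val eager = { initOrder :+= "eager"; 1 }            // runs at construction
  lazy val deferred = { initOrder :+= "deferred"; 2 } // runs on first access
}

val m = new Metrics
assert(initOrder == List("eager"))  // the lazy val has not been evaluated yet
m.deferred
assert(initOrder == List("eager", "deferred"))
```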
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239308007 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode override def outputPartitioning: Partitioning = SinglePartition override def executeCollect(): Array[InternalRow] = child.executeTake(limit) private val serializer: Serializer = new UnsafeRowSerializer(child.output.size) - override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) + private val writeMetrics = SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext) + override lazy val metrics = --- End diff -- this is somewhat confusing. I'd create a variable for the read metrics so you can pass just that into the ShuffledRDD. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23207 @xuanyuanking can you separate the PRs: one to rename the read-side metric, and one for the write-side change?
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845399 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -299,12 +312,25 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") val df2 = (1 to 10).map(i => (i, i.toString)).toSeq.toDF("key", "value") // Assume the execution plan is - // ... -> ShuffledHashJoin(nodeId = 1) -> Project(nodeId = 0) + // Project(nodeId = 0) + // +- ShuffledHashJoin(nodeId = 1) + // :- Exchange(nodeId = 2) + // : +- Project(nodeId = 3) + // : +- LocalTableScan(nodeId = 4) + // +- Exchange(nodeId = 5) + // +- Project(nodeId = 6) + // +- LocalTableScan(nodeId = 7) val df = df1.join(df2, "key") testSparkPlanMetrics(df, 1, Map( 1L -> (("ShuffledHashJoin", Map( "number of output rows" -> 2L, - "avg hash probe (min, med, max)" -> "\n(1, 1, 1)" + "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"))), +2L -> (("Exchange", Map( + "shuffle records written" -> 2L, + "records read" -> 2L))), --- End diff -- is this always going to be the same as "shuffle records written" ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845029 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions testSparkPlanMetrics(df, 1, Map( 2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))), + 1L -> (("Exchange", Map( +"shuffle records written" -> 2L, +"records read" -> 2L, +"local blocks fetched" -> 2L, --- End diff -- i think we should be consistent and name these "read", rather than "fetch". --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238843017 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics { Utils.bytesToString } else if (metricsType == TIMING_METRIC) { Utils.msDurationToString + } else if (metricsType == NANO_TIMING_METRIC) { +duration => Utils.msDurationToString(duration / 10) --- End diff -- is this the right conversion from nanosecs to millisecs? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
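For reference on the conversion being questioned: one millisecond is 1,000,000 nanoseconds, so dividing by 10 is not a nanoseconds-to-milliseconds conversion. A quick sanity check:

```scala
import java.util.concurrent.TimeUnit

val durationNs = 1500000000L // 1.5 seconds expressed in nanoseconds
// TimeUnit makes the intent explicit; equivalent to durationNs / 1000000
assert(TimeUnit.NANOSECONDS.toMillis(durationNs) == 1500L)
assert(durationNs / 1000000 == 1500L)
assert(durationNs / 10 != 1500L) // the divisor in the diff under review
```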
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238842276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +78,7 @@ object SQLMetrics { private val SUM_METRIC = "sum" private val SIZE_METRIC = "size" private val TIMING_METRIC = "timing" + private val NANO_TIMING_METRIC = "nanosecond" --- End diff -- ns --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238837000 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private[spark] def decBytesWritten(v: Long): Unit private[spark] def decRecordsWritten(v: Long): Unit } + + +/** + * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleWriteMetricsReporter( +reporters: Seq[ShuffleWriteMetricsReporter]) extends ShuffleWriteMetricsReporter { + override private[spark] def incBytesWritten(v: Long): Unit = { +reporters.foreach(_.incBytesWritten(v)) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +reporters.foreach(_.decRecordsWritten(v)) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +reporters.foreach(_.incRecordsWritten(v)) + } + override private[spark] def incWriteTime(v: Long): Unit = { +reporters.foreach(_.incWriteTime(v)) + } + override private[spark] def decBytesWritten(v: Long): Unit = { +reporters.foreach(_.decBytesWritten(v)) + } +} + + +/** + * A proxy class of ShuffleReadMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleReadMetricsReporter( --- End diff -- Again - I think your old approach is much better. No point creating a general util when there is only one implementation without any known future needs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238836448 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private[spark] def decBytesWritten(v: Long): Unit private[spark] def decRecordsWritten(v: Long): Unit } + + +/** + * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleWriteMetricsReporter( --- End diff -- I'd not create a general API here. Just put one in SQL similar to the read side that also calls the default one. It can be expensive to go through a seq for each record and bytes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 Basically there are logically only two expressions: In, which handles arbitrary expressions, and InSet, which handles expressions with literals. Both could work: (1) we provide two separate expressions for InSet, one using switch and one using hash set, or (2) we provide just one InSet and internally have two implementations. The downside of creating different expressions for the same logical expression is that downstream optimization rules would potentially need to match more. On Mon, Dec 03, 2018 at 10:52 PM, DB Tsai < notificati...@github.com > wrote: > @rxin ( https://github.com/rxin ) switch in Java is still significantly faster than hash set even without boxing/unboxing problems when the number of elements is small. We were thinking about having two implementations in InSet, picking switch if the number of elements is small, and hash set otherwise. But this is the same complexity as having two implementations in In, as in this PR. > @cloud-fan ( https://github.com/cloud-fan ) do you suggest creating an OptimizeIn which has switch and hash set implementations based on the length of the elements and removing InSet? Basically, what we were thinking above.
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 I thought InSwitch is logically the same as InSet, in which all the child expressions are literals? On Mon, Dec 03, 2018 at 8:38 PM, Wenchen Fan < notificati...@github.com > wrote: > I think InSet is not an optimized version of In, but just a way to separate the implementation for different conditions (the length of the list). Maybe we should do the same thing here: create an InSwitch and convert In to it when meeting certain conditions. One problem is that In and InSwitch are the same in the interpreted version; maybe we should create a base class for them.
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 That probably means we should just optimize InSet to have the switch version though? Rather than do it in In? On Mon, Dec 03, 2018 at 8:20 PM, Wenchen Fan < notificati...@github.com > wrote: > @rxin ( https://github.com/rxin ) I proposed the same thing before, but one problem is that we only convert In to InSet when the length of the list reaches the threshold. If the switch way is faster than hash set when the list is small, it seems still worth optimizing In using switch.
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 I'm not a big fan of making the physical implementation of an expression very different depending on the situation. Why can't we just make InSet efficient and convert these cases to that?
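To illustrate the two strategies being compared in this thread (a behavioral sketch only, not Spark's generated code): a pattern match on small lists of integer literals compiles to a JVM tableswitch/lookupswitch and never boxes the value, while a `Set` lookup boxes the primitive on each `contains` call.

```scala
// "switch" style: a match on integer literals compiles to a JVM switch;
// the Int is compared directly, without boxing.
def inSwitch(v: Int): Boolean = v match {
  case 1 | 5 | 9 => true
  case _ => false
}

// "hash set" style: a single shared set, but each contains() call boxes the Int.
val inSetValues: Set[Int] = Set(1, 5, 9)
def inSet(v: Int): Boolean = inSetValues.contains(v)

// Both are logically equivalent; the debate is which physical form to generate.
assert(inSwitch(5) && inSet(5))
assert(!inSwitch(7) && !inSet(7))
```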
[GitHub] spark issue #23192: [SPARK-26241][SQL] Add queryId to IncrementalExecution
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23192 Thanks @HyukjinKwon. Fixed it.
[GitHub] spark pull request #23193: [SPARK-26226][SQL] Track optimization phase for s...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23193 [SPARK-26226][SQL] Track optimization phase for streaming queries ## What changes were proposed in this pull request? In an earlier PR, we missed measuring the optimization phase time for streaming queries. This patch adds it. ## How was this patch tested? Given this is a debugging feature, and it is very convoluted to add tests to verify the phase is set properly, I am not introducing a streaming specific test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-26226-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23193.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23193 commit 70c319bdaaac4fc4b8b988a96be6f976a63b41bf Author: Reynold Xin Date: 2018-12-01T04:33:21Z SPARK-26226 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23193: [SPARK-26226][SQL] Track optimization phase for streamin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23193 cc @gatorsmile @jose-torres
[GitHub] spark issue #23192: [SPARK-26221][SQL] Add queryId to IncrementalExecution
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23192 cc @zsxwing @jose-torres
[GitHub] spark pull request #23192: [SPARK-26221][SQL] Add queryId to IncrementalExec...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23192 [SPARK-26221][SQL] Add queryId to IncrementalExecution ## What changes were proposed in this pull request? This is a small change for better debugging: pass the query uuid into IncrementalExecution, so that when we look at a QueryExecution in isolation we can trace it back to the originating query. ## How was this patch tested? N/A - this just adds a field for better debugging. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-26241 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23192.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23192 commit c037f4d2fa2c2844ac992d976b492e14ab9bed11 Author: Reynold Xin Date: 2018-12-01T04:27:00Z [SPARK-26221]
[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23183#discussion_r238019351 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala --- @@ -51,6 +58,18 @@ object QueryPlanningTracker { } } + /** + * Summary of a phase, with start time and end time so we can construct a timeline. + */ + class PhaseSummary(val startTimeMs: Long, val endTimeMs: Long) { + +def durationMs: Long = endTimeMs - startTimeMs + +override def toString: String = { + s"PhaseSummary($startTimeMs, $endTimeMs)" --- End diff -- so for actual debugging this is not needed right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23183: [SPARK-26226][SQL] Update query tracker to report timeli...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23183 cc @hvanhovell @gatorsmile
[GitHub] spark pull request #23183: [SPARK-26226][SQL] Update query tracker to report...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23183 [SPARK-26226][SQL] Update query tracker to report timeline for phases ## What changes were proposed in this pull request? This patch changes the query plan tracker added earlier to report phase timeline, rather than just a duration for each phase. This way, we can easily find time that's unaccounted for. ## How was this patch tested? Updated test cases to reflect that. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-26226 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23183.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23183 commit d200be22afd83472c03a612a22e5b1fb4d4d80ab Author: Reynold Xin Date: 2018-11-29T23:00:49Z [SPARK-26226][SQL] Update query tracker to report timeline for phases --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter
Repository: spark Updated Branches: refs/heads/master 9fdc7a840 -> cb368f2c2 [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter ## What changes were proposed in this pull request? Follow up for https://github.com/apache/spark/pull/23128, move sql read metrics relatives to `SQLShuffleMetricsReporter`, in order to put sql shuffle read metrics relatives closer and avoid possible problem about forgetting update SQLShuffleMetricsReporter while new metrics added by others. ## How was this patch tested? Existing tests. Closes #23175 from xuanyuanking/SPARK-26142-follow. Authored-by: Yuanjian Li Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb368f2c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb368f2c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb368f2c Branch: refs/heads/master Commit: cb368f2c2964797d7313d3a4151e2352ff7847a9 Parents: 9fdc7a8 Author: Yuanjian Li Authored: Thu Nov 29 12:09:30 2018 -0800 Committer: Reynold Xin Committed: Thu Nov 29 12:09:30 2018 -0800 -- .../exchange/ShuffleExchangeExec.scala | 4 +- .../org/apache/spark/sql/execution/limit.scala | 6 +-- .../spark/sql/execution/metric/SQLMetrics.scala | 20 .../metric/SQLShuffleMetricsReporter.scala | 50 .../execution/UnsafeRowSerializerSuite.scala| 4 +- 5 files changed, 47 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala index 8938d93..c9ca395 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala +++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala @@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, BoundReference, Uns import org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering import org.apache.spark.sql.catalyst.plans.physical._ import org.apache.spark.sql.execution._ -import org.apache.spark.sql.execution.metric.SQLMetrics +import org.apache.spark.sql.execution.metric.{SQLMetrics, SQLShuffleMetricsReporter} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.StructType import org.apache.spark.util.MutablePair @@ -49,7 +49,7 @@ case class ShuffleExchangeExec( override lazy val metrics = Map( "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size") - ) ++ SQLMetrics.getShuffleReadMetrics(sparkContext) + ) ++ SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) override def nodeName: String = { val extraInfo = coordinator match { http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala index ea845da..e9ab7cd 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe import org.apache.spark.sql.catalyst.plans.physical._ import org.apache.spark.sql.catalyst.util.truncatedString import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec -import org.apache.spark.sql.execution.metric.SQLMetrics +import org.apache.spark.sql.execution.metric.SQLShuffleMetricsReporter /** * Take the first `limit` elements and collect them to a single partition. 
@@ -38,7 +38,7 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode override def outputPartitioning: Partitioning = SinglePartition override def executeCollect(): Array[InternalRow] = child.executeTake(limit) private val serializer: Serializer = new UnsafeRowSerializer(child.output.size) - override lazy val metrics = SQLMetrics.getShuffleReadMetrics(sparkContext) + override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) protected override def doExecute(): RDD[InternalRow] = { val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit)) val shuffled = new ShuffledRowRDD( @@ -154,7 +154,7 @@ case class TakeOrderedAndProjectExec(
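The consolidation in this commit follows a pattern worth spelling out: the metric key constants and the factory that builds the complete metric set live in one class, so adding a metric means editing exactly one place. A minimal stand-alone sketch of that idea in Java (the class and method names here are illustrative stand-ins, not Spark's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for Spark's SQLMetric: a named mutable counter.
class Metric {
    long value = 0;
    void add(long v) { value += v; }
}

// All shuffle-read metric knowledge in one place: the key constants and
// the factory that creates the complete set. Someone adding a metric
// edits this class only, and cannot forget a second location.
class SQLShuffleMetrics {
    static final String REMOTE_BLOCKS_FETCHED = "remoteBlocksFetched";
    static final String LOCAL_BLOCKS_FETCHED = "localBlocksFetched";

    // Rough counterpart of SQLShuffleMetricsReporter.createShuffleReadMetrics.
    static Map<String, Metric> createShuffleReadMetrics() {
        Map<String, Metric> m = new LinkedHashMap<>();
        m.put(REMOTE_BLOCKS_FETCHED, new Metric());
        m.put(LOCAL_BLOCKS_FETCHED, new Metric());
        return m;
    }
}

public class Main {
    public static void main(String[] args) {
        Map<String, Metric> metrics = SQLShuffleMetrics.createShuffleReadMetrics();
        metrics.get(SQLShuffleMetrics.REMOTE_BLOCKS_FETCHED).add(7);
        System.out.println(metrics.size() + " metrics created");
    }
}
```

The alternative that the commit removes, constants in `SQLMetrics` and the reporter elsewhere, splits this knowledge across two files and invites the two from drifting apart.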
[GitHub] spark issue #23175: [SPARK-26142]followup: Move sql shuffle read metrics rel...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23175 LGTM - merged in master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23178 Good idea to have it sealed! > On Nov 29, 2018, at 7:04 AM, Sean Owen wrote: > > @srowen commented on this pull request. > > In sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala: > > > if (inputTypes.isDefined) { >assert(inputTypes.get.length == nullableTypes.get.length) > } > > +val inputsNullSafe = if (nullableTypes.isEmpty) { > You can use getOrElse here and even inline this into the call below, but I don't really care. > > In sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala: > > > @@ -38,114 +38,108 @@ import org.apache.spark.sql.types.DataType > * @since 1.3.0 > */ > @Stable > -case class UserDefinedFunction protected[sql] ( > -f: AnyRef, > -dataType: DataType, > -inputTypes: Option[Seq[DataType]]) { > - > - private var _nameOption: Option[String] = None > - private var _nullable: Boolean = true > - private var _deterministic: Boolean = true > - > - // This is a `var` instead of in the constructor for backward compatibility of this case class. > - // TODO: revisit this case class in Spark 3.0, and narrow down the public surface. > - private[sql] var nullableTypes: Option[Seq[Boolean]] = None > +trait UserDefinedFunction { > Should we make this sealed? I'm not sure. Would any user ever extend this meaningfully? I kind of worry someone will start doing so; maybe they already subclass it in some cases though. Elsewhere it might help the compiler understand in match statements that there is only ever one type of UDF class to match on. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub, or mute the thread.
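The sealed-type question debated above can be illustrated with Java's sealed interfaces (Scala's `sealed trait`, which the PR would actually use, behaves analogously). The types below are hypothetical stand-ins, not Spark's `UserDefinedFunction` hierarchy:

```java
// Sealed: only the classes listed in `permits` may implement the
// interface, so external users cannot subclass it, and the compiler
// knows the full set of implementations when matching on the type.
sealed interface Udf permits SparkUdf {
    String name();
}

// The single, library-provided implementation.
final class SparkUdf implements Udf {
    private final String name;
    SparkUdf(String name) { this.name = name; }
    public String name() { return name; }
}

public class Main {
    static String describe(Udf udf) {
        // With a sealed hierarchy this instanceof chain is known to be
        // exhaustive: SparkUdf is the only possible implementation.
        if (udf instanceof SparkUdf s) {
            return "spark udf: " + s.name();
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        System.out.println(describe(new SparkUdf("plusOne")));
    }
}
```

This is exactly the trade-off in the thread: sealing blocks third-party subclasses (which may already exist in the wild) but gives the compiler exhaustiveness guarantees in match/instanceof chains.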
[GitHub] spark issue #23128: [SPARK-26142][SQL] Implement shuffle read metrics in SQL
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23128 @xuanyuanking @cloud-fan when you think about where to put each code block, make sure you also think about future evolution of the codebase. In general put relevant things closer to each other (e.g. in one class, one file, or one method).
[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237129249 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -82,6 +82,14 @@ object SQLMetrics { private val baseForAvgMetric: Int = 10 + val REMOTE_BLOCKS_FETCHED = "remoteBlocksFetched" --- End diff -- rather than putting this list and the getShuffleReadMetrics function here, we should move it into SQLShuffleMetricsReporter. Otherwise in the future when one adds another metric, he/she is likely to forget to update SQLShuffleMetricsReporter.
[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237128247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.metric + +import org.apache.spark.executor.TempShuffleReadMetrics + +/** + * A shuffle metrics reporter for SQL exchange operators. + * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext. + * @param metrics All metrics in current SparkPlan. This param should not empty and + * contains all shuffle metrics defined in [[SQLMetrics.getShuffleReadMetrics]]. + */ +private[spark] class SQLShuffleMetricsReporter( + tempMetrics: TempShuffleReadMetrics, --- End diff -- 4 space indent
[GitHub] spark pull request #23128: [SPARK-26142][SQL] Implement shuffle read metrics...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r237128189 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -194,4 +202,16 @@ object SQLMetrics { SparkListenerDriverAccumUpdates(executionId.toLong, metrics.map(m => m.id -> m.value))) } } + + /** + * Create all shuffle read relative metrics and return the Map. + */ + def getShuffleReadMetrics(sc: SparkContext): Map[String, SQLMetric] = Map( --- End diff -- I'd prefer to name this create, rather than get, to imply we are creating a new set rather than just returning some existing sets.
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236845375 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -38,7 +38,7 @@ import org.apache.spark.sql.execution.datasources.jdbc._ import org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils -import org.apache.spark.sql.sources.v2.{BatchReadSupportProvider, DataSourceOptions, DataSourceV2} +import org.apache.spark.sql.sources.v2._ --- End diff -- I do think this one is too nitpicking. If this gets long it should be wildcard. Use an IDE for large reviews like this if needed.
[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23106 Merging in master. Thanks @squito.
spark git commit: [SPARK-26141] Enable custom metrics implementation in shuffle write
Repository: spark Updated Branches: refs/heads/master 85383d29e -> 6a064ba8f [SPARK-26141] Enable custom metrics implementation in shuffle write ## What changes were proposed in this pull request? This is the write side counterpart to https://github.com/apache/spark/pull/23105 ## How was this patch tested? No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases. Closes #23106 from rxin/SPARK-26141. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a064ba8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a064ba8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a064ba8 Branch: refs/heads/master Commit: 6a064ba8f271d5f9d04acd41d0eea50a5b0f5018 Parents: 85383d2 Author: Reynold Xin Authored: Mon Nov 26 22:35:52 2018 -0800 Committer: Reynold Xin Committed: Mon Nov 26 22:35:52 2018 -0800 -- .../sort/BypassMergeSortShuffleWriter.java| 11 +-- .../spark/shuffle/sort/ShuffleExternalSorter.java | 18 -- .../spark/shuffle/sort/UnsafeShuffleWriter.java | 9 + .../spark/storage/TimeTrackingOutputStream.java | 7 --- .../spark/executor/ShuffleWriteMetrics.scala | 13 +++-- .../apache/spark/scheduler/ShuffleMapTask.scala | 3 ++- .../org/apache/spark/shuffle/ShuffleManager.scala | 6 +- .../spark/shuffle/sort/SortShuffleManager.scala | 10 ++ .../org/apache/spark/storage/BlockManager.scala | 7 +++ .../spark/storage/DiskBlockObjectWriter.scala | 4 ++-- .../spark/util/collection/ExternalSorter.scala| 4 ++-- .../shuffle/sort/UnsafeShuffleWriterSuite.java| 6 -- .../scala/org/apache/spark/ShuffleSuite.scala | 12 .../sort/BypassMergeSortShuffleWriterSuite.scala | 16 project/MimaExcludes.scala| 7 ++- 15 files changed, 79 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a064ba8/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java -- diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java index b020a6d..fda33cd 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java @@ -37,12 +37,11 @@ import org.slf4j.LoggerFactory; import org.apache.spark.Partitioner; import org.apache.spark.ShuffleDependency; import org.apache.spark.SparkConf; -import org.apache.spark.TaskContext; -import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.scheduler.MapStatus; import org.apache.spark.scheduler.MapStatus$; import org.apache.spark.serializer.Serializer; import org.apache.spark.serializer.SerializerInstance; +import org.apache.spark.shuffle.ShuffleWriteMetricsReporter; import org.apache.spark.shuffle.IndexShuffleBlockResolver; import org.apache.spark.shuffle.ShuffleWriter; import org.apache.spark.storage.*; @@ -79,7 +78,7 @@ final class BypassMergeSortShuffleWriter extends ShuffleWriter { private final int numPartitions; private final BlockManager blockManager; private final Partitioner partitioner; - private final ShuffleWriteMetrics writeMetrics; + private final ShuffleWriteMetricsReporter writeMetrics; private final int shuffleId; private final int mapId; private final Serializer serializer; @@ -103,8 +102,8 @@ final class BypassMergeSortShuffleWriter extends ShuffleWriter { IndexShuffleBlockResolver shuffleBlockResolver, BypassMergeSortShuffleHandle handle, int mapId, - TaskContext taskContext, - SparkConf conf) { + SparkConf conf, + ShuffleWriteMetricsReporter writeMetrics) { // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no units are provided this.fileBufferSize = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024; this.transferToEnabled = conf.getBoolean("spark.file.transferTo", true); @@ -114,7 +113,7 @@ 
final class BypassMergeSortShuffleWriter extends ShuffleWriter { this.shuffleId = dep.shuffleId(); this.partitioner = dep.partitioner(); this.numPartitions = partitioner.numPartitions(); -this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics(); +this.writeMetrics = writeMetrics; this.serializer = dep.serializer(); this.shuffleBlockResolver = shuffleBlockResolver; } http://git-wip-us.apache.org/repos/asf/spark/blob/6a064
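The key change visible in this diff is that the writer now receives its metrics reporter through the constructor instead of pulling it out of `TaskContext` itself, which lets callers inject a custom implementation (such as a SQL-specific one). A rough sketch of that dependency-injection pattern, where the interface and class names are simplified stand-ins for Spark's `ShuffleWriteMetricsReporter` and writer classes:

```java
// Hypothetical reporter interface, mirroring the shape (but not the
// exact API) of Spark's ShuffleWriteMetricsReporter.
interface WriteMetricsReporter {
    void incBytesWritten(long v);
    void incRecordsWritten(long v);
}

// Default implementation a task would normally supply.
class TaskWriteMetrics implements WriteMetricsReporter {
    long bytes, records;
    public void incBytesWritten(long v) { bytes += v; }
    public void incRecordsWritten(long v) { records += v; }
}

// The writer no longer reaches into a task context for its metrics;
// the caller injects whichever reporter it wants.
class ShuffleWriter {
    private final WriteMetricsReporter writeMetrics;

    ShuffleWriter(WriteMetricsReporter writeMetrics) {
        this.writeMetrics = writeMetrics;
    }

    void write(byte[] record) {
        // ... the actual write would happen here, then report ...
        writeMetrics.incRecordsWritten(1);
        writeMetrics.incBytesWritten(record.length);
    }
}

public class Main {
    public static void main(String[] args) {
        TaskWriteMetrics metrics = new TaskWriteMetrics();
        ShuffleWriter writer = new ShuffleWriter(metrics);
        writer.write(new byte[]{1, 2, 3});
        System.out.println(metrics.records + " record(s), " + metrics.bytes + " byte(s)");
    }
}
```

Because the writer depends only on the interface, tests and SQL operators can pass in their own reporters without any behavior change in the writer itself, which is why the patch could claim "no behavior change expected."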
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236492408 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; --- End diff -- Everything in catalyst is considered private (although public visibility for debugging) and it's best to stay that way.
[GitHub] spark pull request #23106: [SPARK-26141] Enable custom metrics implementatio...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23106#discussion_r236432889 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -242,8 +243,13 @@ private void writeSortedFile(boolean isLastFile) { // Note that we intentionally ignore the value of `writeMetricsToUse.shuffleWriteTime()`. // Consistent with ExternalSorter, we do not count this IO towards shuffle write time. // This means that this IO time is not accounted for anywhere; SPARK-3577 will fix this. - writeMetrics.incRecordsWritten(writeMetricsToUse.recordsWritten()); - taskContext.taskMetrics().incDiskBytesSpilled(writeMetricsToUse.bytesWritten()); + + // This is guaranteed to be a ShuffleWriteMetrics based on the if check in the beginning + // of this file. --- End diff -- ah yes. nice catch
[GitHub] spark issue #23147: [SPARK-26140] followup: rename ShuffleMetricsReporter
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23147 cc @gatorsmile @xuanyuanking @cloud-fan I misunderstood your comment. Finally saw it today when I was looking at my other PR.
[GitHub] spark pull request #23147: [SPARK-26140] followup: rename ShuffleMetricsRepo...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23147 [SPARK-26140] followup: rename ShuffleMetricsReporter ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/23105, due to working on two parallel PRs at once, I made the mistake of committing the copy of the PR that used the name ShuffleMetricsReporter for the interface, rather than the appropriate one ShuffleReadMetricsReporter. This patch fixes that. ## How was this patch tested? This should be fine as long as compilation passes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark ShuffleReadMetricsReporter Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23147.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23147 commit 1d28d879572aa958b169acc5e1a48e52cced4c26 Author: Reynold Xin Date: 2018-11-26T18:56:18Z ShuffleReadMetricsReporter
[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23135#discussion_r236089467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -575,6 +575,19 @@ case class Range( } } +/** + * This is a Group by operator with the aggregate functions and projections. + * + * @param groupingExpressions expressions for grouping keys + * @param aggregateExpressions expressions for a project list, which could contain + * [[org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction]]s. + * + * Note: Currently, aggregateExpressions correspond to both [[AggregateExpression]] and the output --- End diff -- It is not clear what "resultExpressions" means.
[GitHub] spark pull request #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23131#discussion_r236052557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1852,6 +1852,19 @@ class Dataset[T] private[sql]( CombineUnions(Union(logicalPlan, other.logicalPlan)) } + /** + * Returns a new Dataset containing union of rows in this Dataset and another Dataset. --- End diff -- say that this is an alias of union.
[GitHub] spark issue #23129: [MINOR] Update all DOI links to preferred resolver
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23129 Jenkins, test this please.
[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r236025838 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.metric + +import org.apache.spark.executor.TempShuffleReadMetrics + +/** + * A shuffle metrics reporter for SQL exchange operators. + * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext. + * @param metrics All metrics in current SparkPlan. + */ +class SQLShuffleMetricsReporter( + tempMetrics: TempShuffleReadMetrics, + metrics: Map[String, SQLMetric]) extends TempShuffleReadMetrics { + + override def incRemoteBlocksFetched(v: Long): Unit = { +metrics(SQLMetrics.REMOTE_BLOCKS_FETCHED).add(v) --- End diff -- (I'm not referring to just this function, but in general, especially for per-row).
[GitHub] spark pull request #23128: [SPARK-26142][SQL] Support passing shuffle metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23128#discussion_r236025817 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.metric + +import org.apache.spark.executor.TempShuffleReadMetrics + +/** + * A shuffle metrics reporter for SQL exchange operators. + * @param tempMetrics [[TempShuffleReadMetrics]] created in TaskContext. + * @param metrics All metrics in current SparkPlan. + */ +class SQLShuffleMetricsReporter( + tempMetrics: TempShuffleReadMetrics, + metrics: Map[String, SQLMetric]) extends TempShuffleReadMetrics { + + override def incRemoteBlocksFetched(v: Long): Unit = { +metrics(SQLMetrics.REMOTE_BLOCKS_FETCHED).add(v) --- End diff -- Doing a hashmap lookup here could introduce serious performance regressions.
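The standard fix for this kind of hot-path overhead is to resolve each map entry once, at construction time, and update plain fields afterwards, which is the shape the final SQLShuffleMetricsReporter took. A minimal sketch of the pattern (the names below are illustrative, not Spark's actual classes):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Spark's SQLMetric: a named mutable counter.
class Metric {
    long value = 0;
    void add(long v) { value += v; }
}

// Instead of doing `metrics.get(KEY).add(v)` on every call, a hash
// lookup on a hot, potentially per-record path, the reporter resolves
// each metric once in its constructor and then updates plain fields.
class CachedShuffleReadReporter {
    static final String REMOTE_BLOCKS_FETCHED = "remoteBlocksFetched";

    private final Metric remoteBlocksFetched;  // resolved exactly once

    CachedShuffleReadReporter(Map<String, Metric> metrics) {
        this.remoteBlocksFetched = metrics.get(REMOTE_BLOCKS_FETCHED);
    }

    void incRemoteBlocksFetched(long v) {
        remoteBlocksFetched.add(v);  // no map lookup on the hot path
    }
}

public class Main {
    public static void main(String[] args) {
        Map<String, Metric> metrics = new HashMap<>();
        metrics.put(CachedShuffleReadReporter.REMOTE_BLOCKS_FETCHED, new Metric());
        CachedShuffleReadReporter reporter = new CachedShuffleReadReporter(metrics);
        for (int i = 0; i < 1000; i++) {
            reporter.incRemoteBlocksFetched(1);
        }
        System.out.println(metrics.get(CachedShuffleReadReporter.REMOTE_BLOCKS_FETCHED).value);
    }
}
```

The observable metric values are identical either way; only the per-call cost changes, which is exactly why the review comment flags it as a potential performance regression rather than a correctness bug.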
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r236020103 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle + +/** + * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes + * all the methods are called on a single-threaded, i.e. concrete implementations would not need + * to synchronize. + * + * All methods have additional Spark visibility modifier to allow public, concrete implementations + * that still have these methods marked as private[spark]. + */ +private[spark] trait ShuffleReadMetricsReporter { --- End diff -- @xuanyuanking just submitted a PR on how to use it :)
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235950427 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala --- @@ -48,7 +48,8 @@ private[spark] trait ShuffleManager { handle: ShuffleHandle, startPartition: Int, endPartition: Int, - context: TaskContext): ShuffleReader[K, C] + context: TaskContext, + metrics: ShuffleMetricsReporter): ShuffleReader[K, C] --- End diff -- It is a read metrics here actually. In the write PR this is renamed ShuffleReadMetricsReporter.
[GitHub] spark issue #23110: [SPARK-26129] Followup - edge behavior for QueryPlanning...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23110 cc @gatorsmile
[GitHub] spark pull request #23110: [SPARK-26129] Followup - edge behavior for QueryP...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23110 [SPARK-26129] Followup - edge behavior for QueryPlanningTracker.topRulesByTime ## What changes were proposed in this pull request? This is an addendum patch for SPARK-26129 that defines the edge case behavior for QueryPlanningTracker.topRulesByTime. ## How was this patch tested? Added unit tests for each behavior. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-26129-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23110 commit 683630ac3fbf054534e2589258793c9baaebfbf5 Author: Reynold Xin Date: 2018-11-21T22:25:09Z [SPARK-26129]
[GitHub] spark pull request #23106: [SPARK-26141] Enable custom shuffle metrics imple...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23106 [SPARK-26141] Enable custom shuffle metrics implementation in shuffle write ## What changes were proposed in this pull request? This is the write side counterpart to https://github.com/apache/spark/pull/23105 ## How was this patch tested? No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-26141 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23106.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23106 commit 115bd8bfa49674a2fcfa05517373146e90ec3bf7 Author: Reynold Xin Date: 2018-11-21T15:55:56Z [SPARK-26141] Enable custom shuffle metrics implementation in shuffle write
[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23105 cc @jiangxb1987 @squito
spark git commit: [SPARK-26129][SQL] Instrumentation for per-query planning time
Repository: spark Updated Branches: refs/heads/master 6bbdf34ba -> 07a700b37 [SPARK-26129][SQL] Instrumentation for per-query planning time ## What changes were proposed in this pull request? We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases. ## How was this patch tested? Added unit tests and end-to-end integration tests. Closes #23096 from rxin/SPARK-26129. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07a700b3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07a700b3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07a700b3 Branch: refs/heads/master Commit: 07a700b3711057553dfbb7b047216565726509c7 Parents: 6bbdf34 Author: Reynold Xin Authored: Wed Nov 21 16:41:12 2018 +0100 Committer: Reynold Xin Committed: Wed Nov 21 16:41:12 2018 +0100 -- .../sql/catalyst/QueryPlanningTracker.scala | 127 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 22 ++-- .../spark/sql/catalyst/rules/RuleExecutor.scala | 19 ++- .../catalyst/QueryPlanningTrackerSuite.scala| 78 .../sql/catalyst/analysis/AnalysisTest.scala| 3 +- .../ResolveGroupingAnalyticsSuite.scala | 3 +- .../analysis/ResolvedUuidExpressionsSuite.scala | 10 +- .../scala/org/apache/spark/sql/Dataset.scala| 9 ++ .../org/apache/spark/sql/SparkSession.scala | 6 +- .../spark/sql/execution/QueryExecution.scala| 21 ++- .../QueryPlanningTrackerEndToEndSuite.scala | 52 .../apache/spark/sql/hive/test/TestHive.scala | 16 ++- 12 files changed, 338 insertions(+), 28 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/07a700b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala new file mode 100644 index 000..420f2a1 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.BoundedPriorityQueue + + +/** + * A simple utility for tracking runtime and associated stats in query planning. + * + * There are two separate concepts we track: + * + * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis, + * optimization and physical planning (just planning). + * + * 2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also + * track the number of invocations and effective invocations. + */ +object QueryPlanningTracker { + + // Define a list of common phases here. 
+  val PARSING = "parsing"
+  val ANALYSIS = "analysis"
+  val OPTIMIZATION = "optimization"
+  val PLANNING = "planning"
+
+  class RuleSummary(
+    var totalTimeNs: Long, var numInvocations: Long, var numEffectiveInvocations: Long) {
+
+    def this() = this(totalTimeNs = 0, numInvocations = 0, numEffectiveInvocations = 0)
+
+    override def toString: String = {
+      s"RuleSummary($totalTimeNs, $numInvocations, $numEffectiveInvocations)"
+    }
+  }
+
+  /**
+   * A thread local variable to implicitly pass the tracker around. This assumes the query planner
+   * is single-threaded, and avoids passing the same tracker context in every function call.
+   */
+  private val localTracker = new ThreadLocal[QueryPlanningTracker]() {
+    override def initialValue: QueryPlanningTracker = null
+  }
+
+  /** Returns the current tra
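The listing above (truncated mid-method) defines the planning phases, a RuleSummary, and a thread-local used to pass the tracker around implicitly. The following is a minimal, self-contained sketch of that thread-local-plus-phase-timing pattern; names follow the quoted snippet, but this is a simplified illustration, not the full Spark implementation:

```scala
import scala.collection.mutable

class QueryPlanningTracker {
  // Accumulated wall-clock time per planning phase, in nanoseconds.
  private val phaseTimes = mutable.Map.empty[String, Long]

  /** Measures the wall-clock time of f and records it under the given phase. */
  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    val elapsed = System.nanoTime() - start
    phaseTimes(phase) = phaseTimes.getOrElse(phase, 0L) + elapsed
    result
  }

  def phases: Map[String, Long] = phaseTimes.toMap
}

object QueryPlanningTracker {
  val PARSING = "parsing"
  val ANALYSIS = "analysis"
  val OPTIMIZATION = "optimization"
  val PLANNING = "planning"

  // Thread-local so the (assumed single-threaded) planner need not pass the
  // tracker through every function call.
  private val localTracker = new ThreadLocal[QueryPlanningTracker]() {
    override def initialValue: QueryPlanningTracker = null
  }

  /** Returns the tracker bound to the current thread, if any. */
  def get: Option[QueryPlanningTracker] = Option(localTracker.get())

  /** Binds the tracker for the duration of f, then restores the previous one. */
  def withTracker[T](tracker: QueryPlanningTracker)(f: => T): T = {
    val original = localTracker.get()
    localTracker.set(tracker)
    try f finally localTracker.set(original)
  }
}
```

Under the single-threaded-planner assumption stated in the quoted comment, a caller can wrap each phase, e.g. `QueryPlanningTracker.withTracker(tracker) { tracker.measurePhase(QueryPlanningTracker.ANALYSIS) { analyze(plan) } }` (where `analyze` stands in for whatever work that phase does).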
[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23096

Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23105: [SPARK-26140] Enable passing in a custom shuffle ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235420647

--- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala ---
@@ -122,34 +123,3 @@ class ShuffleReadMetrics private[spark] () extends Serializable {
 }
 }
 }
-
-/**
- * A temporary shuffle read metrics holder that is used to collect shuffle read metrics for each
- * shuffle dependency, and all temporary metrics will be merged into the [[ShuffleReadMetrics]] at
- * last.
- */
-private[spark] class TempShuffleReadMetrics {
--- End diff --

this was moved to TempShuffleReadMetrics
[GitHub] spark pull request #23105: [SPARK-26140] Pull TempShuffleReadMetrics creatio...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23105

[SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader

## What changes were proposed in this pull request?

This patch defines an internal Spark interface for reporting shuffle metrics and uses that in the shuffle reader. Before this patch, shuffle metrics are tied to a specific implementation (using a thread-local temporary data structure and accumulators). After this patch, callers that define their own shuffle RDDs can create a custom metrics implementation. With this patch, we would be able to create better metrics for the SQL layer, e.g. reporting shuffle metrics in the SQL UI for each exchange operator.

## How was this patch tested?

No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-26140

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23105.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23105

commit da253b57c14bc0174f0330ae6fa5d3a61647269b
Author: Reynold Xin
Date: 2018-11-21T14:56:23Z

    [SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader
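The description above proposes an internal interface so that shuffle metrics reporting is no longer tied to one implementation. A hedged sketch of what such a seam can look like — the trait name, methods, and the in-memory implementation here are illustrative, not the actual interface introduced by the PR:

```scala
// Illustrative reporter interface: callers that define their own shuffle RDDs
// could plug in a custom implementation instead of the built-in
// accumulator-backed one.
trait ShuffleMetricsReporter {
  def incRemoteBlocksFetched(v: Long): Unit
  def incLocalBlocksFetched(v: Long): Unit
  def incRemoteBytesRead(v: Long): Unit
}

// A trivial in-memory implementation, e.g. for tests or a custom UI.
class InMemoryShuffleMetrics extends ShuffleMetricsReporter {
  var remoteBlocksFetched = 0L
  var localBlocksFetched = 0L
  var remoteBytesRead = 0L
  override def incRemoteBlocksFetched(v: Long): Unit = remoteBlocksFetched += v
  override def incLocalBlocksFetched(v: Long): Unit = localBlocksFetched += v
  override def incRemoteBytesRead(v: Long): Unit = remoteBytesRead += v
}
```

The design point is that the shuffle reader only sees the trait, so the SQL layer (for example) can substitute a reporter that feeds per-exchange metrics into the UI.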
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235309483

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -648,7 +648,11 @@ class SparkSession private(
 * @since 2.0.0
 */
 def sql(sqlText: String): DataFrame = {
-Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+val tracker = new QueryPlanningTracker
--- End diff --

I don't think it makes sense to add random flags for everything. If the argument is that this change has a decent chance of introducing regressions (e.g. due to higher memory usage, or CPU overhead), then it would make a lot of sense to put it behind a flag so it can be disabled in production if that happens. That said, the overhead on the hot code path here is substantially smaller than even transforming the simplest Catalyst plan (a hash map lookup is orders of magnitude cheaper than calling a partial function to transform a Scala collection for TreeNode), so I think the risk is so low that it does not warrant adding a config.
[GitHub] spark issue #23100: [WIP][SPARK-26133][ML] Remove deprecated OneHotEncoder a...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23100

Change of this type can really piss some people off. Was there consensus on this?
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235182105

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
@@ -88,15 +101,20 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
 val startTime = System.nanoTime()
 val result = rule(plan)
 val runTime = System.nanoTime() - startTime
+val effective = !result.fastEquals(plan)
-if (!result.fastEquals(plan)) {
+if (effective) {
 queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
 queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
 planChangeLogger.log(rule.ruleName, plan, result)
 }
 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
 queryExecutionMetrics.incNumExecution(rule.ruleName)
+if (tracker ne null) {
--- End diff --

If one calls execute directly, the tracker would be null.
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235162047

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
@@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
 val startTime = System.nanoTime()
 val result = rule(plan)
 val runTime = System.nanoTime() - startTime
+val effective = !result.fastEquals(plan)
-if (!result.fastEquals(plan)) {
+if (effective) {
 queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
 queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
 planChangeLogger.log(rule.ruleName, plan, result)
 }
 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
 queryExecutionMetrics.incNumExecution(rule.ruleName)
+tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))
--- End diff --

yes! (not great -- but I'd probably remove the global tracker at some point)
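The diff above records each rule invocation (its runtime plus whether it actually changed the plan) on the per-query tracker. A minimal sketch of what such a `recordRuleInvocation` can look like, reusing the `RuleSummary` shape quoted earlier in the thread; the `RuleTracker` name is simplified for illustration and this is not the exact Spark code:

```scala
import scala.collection.mutable

// Mutable per-rule counters, mirroring the RuleSummary shape quoted earlier.
class RuleSummary(
    var totalTimeNs: Long = 0,
    var numInvocations: Long = 0,
    var numEffectiveInvocations: Long = 0)

class RuleTracker {
  private val rulesMap = mutable.Map.empty[String, RuleSummary]

  /** Records one rule invocation; "effective" means the rule changed the plan. */
  def recordRuleInvocation(rule: String, timeNs: Long, effective: Boolean): Unit = {
    val s = rulesMap.getOrElseUpdate(rule, new RuleSummary())
    s.totalTimeNs += timeNs
    s.numInvocations += 1
    if (effective) s.numEffectiveInvocations += 1
  }

  def rules: Map[String, RuleSummary] = rulesMap.toMap
}
```

As in the later revision of the diff, the caller can guard with `if (tracker ne null)` since `execute` can be invoked without a tracker.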
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235161825

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -696,7 +701,7 @@ class Analyzer(
 s"avoid errors. Increase the value of ${SQLConf.MAX_NESTED_VIEW_DEPTH.key} to work " +
 "around this.")
 }
- executeSameContext(child)
+ executeSameContext(child, None)
--- End diff --

No great reason. I just used None for everything, except the top level, because it is very difficult to wire the tracker here without refactoring a lot of code.
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23096#discussion_r235161336

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.BoundedPriorityQueue
+
+
+/**
+ * A simple utility for tracking runtime and associated stats in query planning.
+ *
+ * There are two separate concepts we track:
+ *
+ * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis,
+ * optimization and physical planning (just planning).
+ *
+ * 2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also
+ * track the number of invocations and effective invocations.
+ */
+object QueryPlanningTracker {
+
+  // Define a list of common phases here.
+  val PARSING = "parsing"
--- End diff --

Mostly because Scala enum is not great, and I was thinking about making this a generic thing that's extensible.
[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23096

cc @hvanhovell @gatorsmile This is different from the existing metrics for rules as it is query specific. We might want to replace that one with this in the future.
[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for query plan...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/23096

[SPARK-26129][SQL] Instrumentation for query planning time

## What changes were proposed in this pull request?

We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.

## How was this patch tested?

Added unit tests and end-to-end integration tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-26129

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23096.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23096

commit b6a3d02f2c2b0eff71f92c3ede854edc3b5bf9f8
Author: Reynold Xin
Date: 2018-11-20T16:22:35Z

    [SPARK-26129][SQL] Instrumentation for query planning time
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of non-struct ty...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23054#discussion_r234569150

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1594,6 +1594,15 @@ object SQLConf {
 "WHERE, which does not follow SQL standard.")
 .booleanConf
 .createWithDefault(false)
+
+  val LEGACY_ALIAS_NON_STRUCT_GROUPING_KEY =
+buildConf("spark.sql.legacy.dataset.aliasNonStructGroupingKey")
--- End diff --

Maybe aliasNonStructGroupingKeyAsValue, and default to true. Then we can remove this in the future.
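The review comment suggests naming the flag `aliasNonStructGroupingKeyAsValue` and defaulting it to `true`, so the legacy behavior stays on by default and the flag can be deleted later. A toy sketch of that "legacy flag defaults to old behavior" pattern; `BooleanConf` and `resolve` are made up for illustration and are not Spark's actual `ConfigBuilder` API:

```scala
// Minimal stand-in for a typed config entry (illustrative only).
final case class BooleanConf(key: String, default: Boolean)

object LegacyConfs {
  // Hypothetical entry following the review suggestion: default = true keeps
  // the legacy behavior, so removing the flag later is a no-op for users
  // who never set it.
  val aliasNonStructGroupingKeyAsValue: BooleanConf = BooleanConf(
    "spark.sql.legacy.dataset.aliasNonStructGroupingKeyAsValue",
    default = true)

  // Resolve a flag against user-provided settings, falling back to the default.
  def resolve(conf: BooleanConf, settings: Map[String, String]): Boolean =
    settings.get(conf.key).map(_.toBoolean).getOrElse(conf.default)
}
```

The design choice being debated: defaulting the legacy flag to `true` means the new behavior is opt-in at first, and flipping or removing the flag becomes a deliberate, separately reviewable change.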