[spark] branch branch-3.3 updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case

2022-12-23 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new aa39b06462a [MINOR][TEST][SQL] Add a CTE subquery scope test case
aa39b06462a is described below

commit aa39b06462a98f37be59e239d12edd9f09a25b88
Author: Reynold Xin 
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

[MINOR][TEST][SQL] Add a CTE subquery scope test case

### What changes were proposed in this pull request?
I noticed we were missing a test case for this in SQL tests, so I added one.

### Why are the changes needed?
To ensure we scope CTEs properly in subqueries.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a test case change.
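
For context, a minimal sketch (not part of this commit) of the working counterpart to the new negative test: a CTE declared on the outer query, rather than inside a subquery, is visible to both branches of the UNION. The names (cte, id, range(10)) follow the test; the exact query below is illustrative only.

    -- Sketch only, not part of the commit: the CTE is declared at the outer level,
    -- so both UNION branches can resolve it.
    WITH cte AS (SELECT * FROM range(10))
    SELECT * FROM (SELECT * FROM cte WHERE id = 8) a
    UNION
    SELECT * FROM cte WHERE id = 9;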

Closes #39189 from rxin/cte_test.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 
(cherry picked from commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b)
Signed-off-by: Reynold Xin 
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10 
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 264b64ffe96..ebdd64c3ac8 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index 2c622de3f36..b6e1793f7d7 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 283f5a54a42..546ab7ecb95 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
org.apache.spark.sql.AnalysisException
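
All three golden files above (cte-legacy.sql.out, cte-nested.sql.out, cte-nonlegacy.sql.out) record the same TABLE_OR_VIEW_NOT_FOUND error for the new query. Assuming, from the file names, that they differ only in the spark.sql.legacy.ctePrecedencePolicy setting (an assumption, not stated in the commit), the scoping rule being exercised is independent of that policy. A hedged sketch for checking this in a SQL session:

    -- Assumed config name/values from Spark 3.x; LEGACY and EXCEPTION behave the same here.
    SET spark.sql.legacy.ctePrecedencePolicy = CORRECTED;
    SELECT * FROM
      (WITH cte AS (SELECT * FROM range(10))
       SELECT * FROM cte WHERE id = 8) a
    UNION
    SELECT * FROM cte;  -- still fails: cte is scoped to the subquery in the first branch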

[spark] branch master updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case

2022-12-23 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 24edf8ecb5e [MINOR][TEST][SQL] Add a CTE subquery scope test case
24edf8ecb5e is described below

commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b
Author: Reynold Xin 
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

[MINOR][TEST][SQL] Add a CTE subquery scope test case

### What changes were proposed in this pull request?
I noticed we were missing a test case for this in SQL tests, so I added one.

### Why are the changes needed?
To ensure we scope CTEs properly in subqueries.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a test case change.

Closes #39189 from rxin/cte_test.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10 
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 013c5f27b50..65000471c75 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index ed6d69b233e..2c67f2db56a 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+"relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+"objectType" : "",
+"objectName" : "",
+"startIndex" : 120,
+"stopIndex" : 122,
+"fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),
diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 6a48e1bec43..154ebd20223 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+  

svn commit: r46414 - /dev/spark/v3.1.1-rc3-bin/ /release/spark/spark-3.1.1/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 11:00:12 2021
New Revision: 46414

Log:
Moving Apache Spark 3.1.1 RC3 to Apache Spark 3.1.1

Added:
release/spark/spark-3.1.1/
  - copied from r46413, dev/spark/v3.1.1-rc3-bin/
Removed:
dev/spark/v3.1.1-rc3-bin/





svn commit: r46413 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:55:39 2021
New Revision: 46413

Log:
Recover 3.1.1 RC3

Added:
dev/spark/v3.1.1-rc3-bin/
  - copied from r46410, dev/spark/v3.1.1-rc3-bin/
dev/spark/v3.1.1-rc3-docs/
  - copied from r46410, dev/spark/v3.1.1-rc3-docs/





svn commit: r46411 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:38 2021
New Revision: 46411

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc3-bin/
dev/spark/v3.1.1-rc3-docs/





svn commit: r46412 - in /dev/spark: v3.1.0-rc1-bin/ v3.1.0-rc1-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:58 2021
New Revision: 46412

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.0-rc1-bin/
dev/spark/v3.1.0-rc1-docs/





svn commit: r46410 - in /dev/spark: v3.1.1-rc2-bin/ v3.1.1-rc2-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:32 2021
New Revision: 46410

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc2-bin/
dev/spark/v3.1.1-rc2-docs/





svn commit: r46409 - in /dev/spark: v3.1.1-rc1-bin/ v3.1.1-rc1-docs/

2021-03-02 Thread rxin
Author: rxin
Date: Tue Mar  2 10:39:25 2021
New Revision: 46409

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.1.1-rc1-bin/
dev/spark/v3.1.1-rc1-docs/





svn commit: r40088 - in /dev/spark: v3.0.0-rc1-bin/ v3.0.0-rc1-docs/ v3.0.0-rc2-bin/ v3.0.0-rc2-docs/ v3.0.0-rc3-docs/

2020-06-18 Thread rxin
Author: rxin
Date: Thu Jun 18 16:41:27 2020
New Revision: 40088

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-docs/
dev/spark/v3.0.0-rc2-bin/
dev/spark/v3.0.0-rc2-docs/
dev/spark/v3.0.0-rc3-docs/





svn commit: r40050 - /dev/spark/v3.0.0-rc3-bin/ /release/spark/spark-3.0.0/

2020-06-16 Thread rxin
Author: rxin
Date: Tue Jun 16 09:18:02 2020
New Revision: 40050

Log:
release 3.0.0

Added:
release/spark/spark-3.0.0/
  - copied from r40049, dev/spark/v3.0.0-rc3-bin/
Removed:
dev/spark/v3.0.0-rc3-bin/





[spark] tag v3.0.0 created (now 3fdfce3)

2020-06-14 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 3fdfce3  (commit)
No new revisions were added by this update.





svn commit: r39960 - in /dev/spark/v3.0.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 14:03:25 2020
New Revision: 39960

Log:
Apache Spark v3.0.0-rc3 docs


[This commit notification would consist of 1920 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r39959 - /dev/spark/v3.0.0-rc3-bin/

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 13:35:40 2020
New Revision: 39959

Log:
Apache Spark v3.0.0-rc3

Added:
dev/spark/v3.0.0-rc3-bin/
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Sat Jun  6 13:35:40 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZjGIEACG3gsdARN8puRHS2YL+brOmjbrS4wVY/Av
+l+ZR59moZ7QuwjYoixyqNnztIKgIyleYJq9DL5TqqMxFgGpuoDrnuWVqI+8MngVA
+gau/QDmYINabZsJxFfDn1IjxxSQBsgf6pwfqQbB+fGSjLSPnDq+u3DIWr3fRMh4X
+DrTuATNewKiiBIwQHUKAtPMAbsdDvXv0DRL7CGTiIJri43opAntQzHec3sP9hgRU
+J5J2HnjOlamgv58S7zrUw/Wo1xPLmz2PGIsP0aq9DRRw0bLnesrtEaWAKFp2HL5E
+QlbjfboaDQz/X+meruW57/sO/DDwA90/XvF44z4Gu6kbS8nRuTsU5wVfZ/1iyWZk
+PLP2nFoWl7O85k/DLB5ADYgce3e6k2qD2obKxzsEx0nr0Wu13cxCR2+IBQmv05jb
+4Kwi7iE0iKIxt3cESDH6j9GqZoTrcxt6Jb88KSQ+YM2TBNUr1ZZNmkjgYdmLvm7a
+wH6vLtdpZzUKIGd6bt1grEwoQJBMnQjkoDYxhx+ugjbs8CwwxcdUNd2Q5xz0WaSn
+p443ZlMR5lbGf6D6U4PUigaIrdD8d+ef/rRTDtXdoDqC+FdNuepyS9+2+dUZGErx
+N2IMNunKIdKw57GZGcILey1hY45SSuQFw5JAe+nWqCAzCmFX72ulkv9The7rLdlE
+YdLu6XQIBA==
+=HhHH
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Sat Jun  6 13:35:40 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 394DCFEB 4E202A8E 5C58BF94 A77548FD 79A00F92 34538535
+ B0242E1B 96068E3E 80F78188 D71831F8 4A350224 41AA14B1
+ D72AE704 F2390842 DBEAB41F 5AC9859A

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Sat Jun  6 13:35:40 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3oQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZvhPD/9Vyrywk4kYFaUo36Kg6nickTvWMm+yDfGZ
+tKrUo3dAoMta7OwUjT6W0Roo8a4BBgumaDv47Dm6mlquF2DuLuBrFsqFo8c5VNA/
+jT1tdSdHiTzjq7LfY9GQDn8Wkgp1gyIKON70XFdZifduW0gcFDkJ+FjhPYWcA6jy
+GGOGK5qboCdi9C+KowUVj4VB9bbxPbWvW7FVF3+VlcrKvkmNx+EmqmIrqsh72w8O
+EL70za2uBRUUiFcaOpY/wpmEN1raCAkMzQ+dPl7p1PFgmLFrMN9RaRXJ1stF+fXO
+rDLBLNPqb85TvvOOHpcr4PSP38GrdZvDAvljCOEbBzacF719bewu/IVRcNi9lPZE
+HDPUcZLgnocNIF6kafykrm3JhagzmPIhQ8d4DFTuH6ePxgWqdUa9lWKQL54z3mjU
+LT2CJ8gMDY0Wz5zSKc/sI/ZwL+Q6U8xiIGYSzQgT9yPztbhDd5AM2DgohJkZSD4b
+jOrEsSyNRJiwwRAHlbeOOVPb4UNYzsx1USPbPEBeXTt8X8VUb8jsU84o/RhXexk9
+EMJjxz/aChB+NefbmUjBZmXSaa/zYubprJrWnUgPw7hFxAnmtgIUdjSWSNIOJ6bp
+EV1M6xwuvrmGhOa3D0C+lYyAuYZca2FQrcAtzNiL6iOMQ6USFZvzjxGWQiV2CDGQ
+O8CNfkwOGA

svn commit: r39958 - /dev/spark/v3.0.0-rc3-bin/

2020-06-06 Thread rxin
Author: rxin
Date: Sat Jun  6 11:18:32 2020
New Revision: 39958

Log:
remove 3.0 rc3 binary

Removed:
dev/spark/v3.0.0-rc3-bin/





[spark] branch branch-3.0 updated (fa608b9 -> 3ea461d)

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
 add 3fdfce3  Preparing Spark release v3.0.0-rc3
 new 3ea461d  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3ea461d61e635835c07bacb5a0c403ae2a3099a0
Author: Reynold Xin 
AuthorDate: Sat Jun 6 02:57:41 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 3bad429..21f3eaa 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --g

[spark] 01/01: Preparing Spark release v3.0.0-rc3

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 3fdfce3120f307147244e5eaf46d61419a723d50
Author: Reynold Xin 
AuthorDate: Sat Jun 6 02:57:35 2020 +

Preparing Spark release v3.0.0-rc3
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 21f3eaa..3bad429 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.1
+Version: 3.0.0
 Title: R Front End for 'Apache Spark'
Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8bef9d8..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fc1441d..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index de2a6fb..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c0c016..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index b8df191..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8119709..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/ta

[spark] tag v3.0.0-rc3 created (now 3fdfce3)

2020-06-05 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 3fdfce3  (commit)
This tag includes the following new commits:

 new 3fdfce3  Preparing Spark release v3.0.0-rc3

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






svn commit: r39951 - /dev/spark/v3.0.0-rc3-bin/

2020-06-05 Thread rxin
Author: rxin
Date: Fri Jun  5 19:08:09 2020
New Revision: 39951

Log:
Apache Spark v3.0.0-rc3

Added:
dev/spark/v3.0.0-rc3-bin/
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Fri Jun  5 19:08:09 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4gQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZpBZD/9vSiD946kwdMWalYM01Zw2yjKK60eakhLY
+jxHRy1T6Yipspyh2idCrzd2MaGJFqUwRZjs1mpA/mKZUGRSzYFjlWWoaSc/T19MD
+3q/zg6glgoKquzxHcAqum/OCc1C1MJTcsMic2+LIelXRoJ2GPCeECq91JGX4xpD4
+09sDElvooqfMCLb05gaaF8Eyrpm+7WSyAEVpb1Fjpp/gtdG1YQyiW3o3WzNSJgeA
+dewZaSoI58lx3Rfs1jZN1M4Gyj1aKh4Yqw21+CDoHAhtkeOp5oGPgrWef4fZAE4D
+4xKoz1I/5C1s0wIZEhUI2IUJLeGyCR117QhIO/bQFR1XEOO22auQaPppGJKUa5bb
+bwpx6TARNP13fe2R48G+yZ9Em0uC3P1CucGYCRlY22umzkbalrVFeZ77n/FWRB7E
+nC29bso/R2VwmDRI6yWXiCPLMyQy/PukniWRJZiU7Ath1930cORAlqFC7EOBHgHu
+k3AVX/3h2qZBFuYu/wIsd89rgeiwrf4fksiuMhp8YXJh3xCLLSl4uT+q3flutJ3H
+nsOLYkuie/r4qx+M2J7rfezTzTeYr+SN8mn4CTsGRznHhb0amqlZE6yNFWVatr6D
+LEYWe9L3DK92Kj0Jtl5QyPXQlKSoBQriketgZXKxzeBScKeFd6acGxOhM5LpZRCo
+ngKbsgfcoQ==
+=bwFz
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Fri Jun  5 19:08:09 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: 37496F1A C5BD0DFF 0F6B08B9 05CB55B7 DAA6397A 8C377126
+ C6887AEB CB05F172 0E4A9754 9ED4B6B4 68E9266A 6459229F
+ 48D58F7C 9C0A58B1 183CC6D0 A18ACE18

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Fri Jun  5 19:08:09 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4kQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZlwHD/9tPwfyzwQkl6qkYp27AgZexy5k15gjJ/Bi
+MWWwv3bMhJiRlZN3hCyGC0QTTkRG+AJTd3SflbUhHzw9ttFAnt3VqZ7RZBB4UBDI
+5W85jUaF5bOMu7K4hW2iZdcLLLbq7/sXNNqRhomQStL4j6TerZjgP8IytCGEmLX4
+Qt894N7+MunZxbPXKkUqZfO0cWlxY53+zNGqXKJdwDhQUrrH0i+2fs3gd97OJs42
+83l+pE27C7+aTr6fSRWIS55nw9GzKrDOr0N47wtfCs0mqIW+dI+cVjZh8W/Gf9Dl
+EifAsLIpahNRpQLu0PqiWrsJ3meertha4DLWRPS0esYyZAGFK+DjD9Zm1cOovA9v
+ywjQVWCkmaqaozvm2RTKxwvS7kkBB2dJPUJJ8YeCBr0A7wHBAIeA0vvWe9q7u0KW
+O78uGswTF4EKz85ZMhuo8IjdjKjzTumzdFws4akeTzv60t+439zFdyhUghfQ71om
+biS1Fgopz1QLqCb3eaqhMBM0ZB4JVMTtMKb2/gqH/8qaQq91CEkLTpOOsRK+xdeg
+A8XoFCWEsBbHzLT3Y3FKsHC7ipo2FYXCcn/n/67bRuFFBwhLZzOyEISH72nKIk4k
+YOU5wZnsykG2oiV3ZysRlYewtU0mIIuUINrMVRZB69CUk9Q2fnDyuT02OEGIoNZC
+LohvgOFbqQ

svn commit: r39657 - in /dev/spark/v3.0.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-05-18 Thread rxin
Author: rxin
Date: Mon May 18 16:11:38 2020
New Revision: 39657

Log:
Apache Spark v3.0.0-rc2 docs


[This commit notification would consist of 1921 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r39656 - /dev/spark/v3.0.0-rc2-bin/

2020-05-18 Thread rxin
Author: rxin
Date: Mon May 18 15:42:56 2020
New Revision: 39656

Log:
Apache Spark v3.0.0-rc2

Added:
dev/spark/v3.0.0-rc2-bin/
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc Mon May 18 15:42:56 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHgQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZllrEACaCgpeO1qK4uJLQC00J1iU2970iVn9Aqh/
+gZnikK7mBClXekg2Q8+poAhueXS1XfGoJfOCwTeOp8iMvD0BcLhIxftKBg7CxmOa
+yKrtL/dehNyYMTWofxluZzolPR4O0DDNva2W6ExKPhrUAAOTPjPkMx9ty0C57IqO
+Pwblsr6iI3BWrmRdN2Dpfo+enxJ1rd6H/0kYCmXEFgyW8lBbGiN23KrjkriZOJxo
+6Ad8zFIEI+rSmmgvy6lkXdlJFduCmRFFZguRtWq48rYEY3pu6geIUetPMsosBnDW
+mb5ywNMuqZomeEes1JoWp96E65K3HUO8LxPrP3wJY9TfUGduAAwwBX8nGsa0r+mz
+JJq2f4zwvINM2eQGXIfcpg21K3ijqdkqylAKuBGiil5QcHABGQIQ6N1M+1ruKjKp
+zHeXh6tac2IM3dvpyh12mC7ZhKPBAC1sUZD8qzvB6sjaHgvv3uSUc2xTW7kzs8l2
+mwNT8SmCscR6+PAm29dY6CoRtVtDEygt+oOMhRkturaDQ9vtYgduKo+p6PiqffUE
+7SUKwk7a3Cqe46uxHabHdi+6NedFuX7/bPSAX51Q4MpeHC8l4HpgHDPodtfRcEQm
+VDSeLBfhs3WHi+OrqZ2et/EYaGFxiZTTi2PfpeMBPmC4d4k+yymZEenJcXVps7+G
+fFFeOvCfyQ==
+=2zdl
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 Mon May 18 15:42:56 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: B50B8062 8C2158C5 5931EB47 275FB32D 52EFF715 F3B39524
+ 29C03A21 583459D5 32EC2135 D27AB970 0F345B7A 620E4281
+ 950CC383 58231D1D BB08817C 4EDC6A05

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc Mon May 18 15:42:56 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHoQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9Zn7ED/9Ujdr6jmTAFbtJtJiaDCevVGDhoND+9wca
+4MEaUYecgrYWSx12YBZe+d4nIbTuVWK6X29C76E/wbwREWFqG1fA17P7ZpBh8x3W
+xHSfzyYAP6G63I6IC+7jiHkOIOYBScGKj9h6z5j39eqt05HGAv088YEeTMpAC32B
+GbACEglWGgrE3JsrKXf77hIU8AizcE6rhS5OapqWdxFoqTHbxgjg3uJjsxVKsMXG
+wchOtedVfcDZihoqrPoO+pwjP8LIt+iv53luaUJowosC8K62OcjL1ay9Gw4a8KMQ
+9pEr9HgjAj9abel0q+ic4reLcCh+bjFSBzXR8/uJHjmSsWHNlwyXJq5Ymff7T2xJ
+s75vYuHI9bcOqqb2X1r5TY6v34p13PzKuzL7Y5la1ZCPo0nXjCne5NcSTxu9sQY5
+jl9BsVwWONGSZHsNlW6dy3XeXRaAFAPDCHJvqEsP8cgxMd9ryLG2niITVBGrs3jV
+Q3ylNTsM5G7/As6PR5hYYmTqCBBXJWizJmENMJq0zXinNe83ycWmKikACUXtBDlO
+qfRr3op3DAxdcNWbfCG7l9Ifoyr6w7HYDHEA6mMSsZ0MSSaiWcnhBc4ul5P4JUN8
+1p9/4o2WV6lfT2c6VmCfx4W4d5w3pgEVRHakvGzXE59datTZs1AQREG9G87jEd7R
+wv/RT1q+dA

[spark] branch branch-3.0 updated (740da34 -> f6053b9)

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 740da34  [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
 add 29853ec  Preparing Spark release v3.0.0-rc2
 new f6053b9  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing Spark release v3.0.0-rc2

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 29853eca69bceefd227cbe8421a09c116b7b753a
Author: Reynold Xin 
AuthorDate: Mon May 18 13:21:37 2020 +

Preparing Spark release v3.0.0-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 21f3eaa..3bad429 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.1
+Version: 3.0.0
 Title: R Front End for 'Apache Spark'
Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8bef9d8..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fc1441d..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index de2a6fb..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c0c016..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index b8df191..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 8119709..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.1-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/ta

[spark] tag v3.0.0-rc2 created (now 29853ec)

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 29853ec  (commit)
This tag includes the following new commits:

 new 29853ec  Preparing Spark release v3.0.0-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-05-18 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit f6053b94f874c62856baa7bfa35df14c78bebc9f
Author: Reynold Xin 
AuthorDate: Mon May 18 13:21:43 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 3bad429..21f3eaa 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --g

svn commit: r38759 - in /dev/spark/v3.0.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 13:45:27 2020
New Revision: 38759

Log:
Apache Spark v3.0.0-rc1 docs


[This commit notification would consist of 1911 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r38754 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 09:57:10 2020
New Revision: 38754

Log:
Apache Spark v3.0.0-rc1

Added:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0sQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZtCiD/9GtNXfxGR9oh2B4k+fg38uCrloGUYo3Dx9
+eJU6G55fbKtXK24dKlxZQCVDpwLihycnLULcV+/D75vWa4tSoG6n/FTHimCnUJWQ
+UkEsxqhWuGi25rUx4VsOQeHPYIP9/2pVGVyanFzRp+yAyldATGG36u3Xv5lqox6b
+6pARVwC6FZWKuk1b47xbRfYKUoNTkObhGjcKKyigexqx/nZOp99NP+sVlEqRD/l/
+B7l3kgAVq3XlZKUCkMhWgAHT6rPNkvwBdYZFce9gJHuG75Zw5rQ2hHesEqDOVlC1
+kqJPtpmb2U93ItBF6ArlmXcm+60rLa++B8cyrEsKLIyYxRpHH1bQmLB9TTzDeFpz
+e+WWlUiDpC1Lorzvg+44MeOXSj9EhNgqsYypGKhlh6WTN8A+BRzvJRMpDMLElRz6
+lHaceqn9NC4eE5tzcyXAFL+8Y644nCTIZQuND72LvIv7rO0YXq/6yeudM+SDeANU
+vscR4LiQ7/a3oSpxoIuA0MjKz6gWUaYFgsb8OuUC4VQPJKQZG+57SOazq1VTlB6/
+Ur8pePIUxU52EmzmIp08ws8v+NOo9pMxw7lyBwpmGX0/ax6p9v1xVcCeXqH4HYvA
+9d7a7hZy9yoguAGsVkibSym8e6XITCDoXLb9/HPEhfdyxFgi87DVjKZ84HkyFw9/
+OzHhumSp/Q==
+=zl/N
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Tue Mar 31 09:57:10 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: C2D9C0A5 E71C5B56 48AC15AA 998ABD06 2FDB4D5C D2B7C344
+ B1949A7B 28508364 A9A45767 F2642F17 7EBFF4B0 55823EBD
+ BE76A2CE 5604660F 62D1654D 8271287B

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0wQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZkfTD/4zQ5FuCr+giluZHaBnaZy7PAtSkoTjAWKX
+8zObXESsoTlIIjHEpBUmUU6O0tZODFOF7Zau9HkftroGurYxpTWE5nX0e//71JuC
+smBWLCgAeOlNEdeZUd2zm7pPWJfwRpsOcEfexb+RvaFQriw559Erxb5NoWHFIkg/
+tsjtjitMqLxcMlzZW7A/89zqmrnzBu1vhh/q8STzA0Ub6Jq+JzD4e6yatYAzjRj3
++Um7+NL+g/2tmweH8f9TtYzQFcowm6DdXi53fWZX55oVc1xBRTNuSnAdCJlkgEPg
+nUxEcuXUvHn/NbNNHPBwP6xMKyKqJu8+4vNLzr2ZxaxArPYF2FqTl8sFNxwVBM1Y
+PnKun7iZiLq5JqC2OopiDa8FJP0JQkYVyBWAx3BOscsAELfdlZHlPdekcLE6YHHV
+pde79YJ0tzUFIdH/Ulw4Jag4Ixunrg+ajmLS8n9ncpX0I81Zv8IJDaBf0cBboFw8
+kTqAvNkcsoGdRn1OiQnlE2IUib/R0fk7MktOyoZpfKzbCzxBZgLTO4FKTbRCydQX
+I8UhuRhELHCI7YXJHwbk0Swp6+h36dUQtLxFfD/OZdDQABOK+nEVjNsBIHb7ULDB
+pCckj8HBHwaynvNLogS1KJHThW8LEXAmVQFCD39XTNMnhfCUePyzlAC4RPByIFR4
+yD6VQ7bJDA

svn commit: r38753 - /dev/spark/v3.0.0-rc1-bin/

2020-03-31 Thread rxin
Author: rxin
Date: Tue Mar 31 07:25:15 2020
New Revision: 38753

Log:
retry

Removed:
dev/spark/v3.0.0-rc1-bin/





svn commit: r38740 - /dev/spark/v3.0.0-rc1-bin/

2020-03-30 Thread rxin
Author: rxin
Date: Mon Mar 30 16:00:46 2020
New Revision: 38740

Log:
Apache Spark v3.0.0-rc1

Added:
dev/spark/v3.0.0-rc1-bin/
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz   (with props)
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz   (with props)
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc
dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPMQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9Zr8LD/9WOO4mDufkmhhXk78zWAyhRjJpG0Kjuvla
+KEnx8MK4MUtr77cQsmVLgj+FXFwmUvtZTZXHJX704Jk6xAAFXzii4EwIfk46wka0
+CY0arEleHJ6MBohLbOVW3sp86LduQBBd+dmBbIh7spJjd054RRqsAe8sVx0uqezD
+y4Fv+LM0B7kQhHdhsYymVClAwgwKOwecdks0l9PonE9YwyJixMEOZwxxk4aaRNwR
+VUH6X4mHlpWiQ+zHWTAmE7aOvjOwxQqciqtmgzLLRlDjuTtz160XLthUneoOVoDw
+spphs7pMpj8r4T9BZQCeIiuRvE5VeT6037Uz03X56xhzEvna9+0/frHR/Vb88gW8
+U5YJio4p8h286vLwb0X48K7lyfd60VM0kyfh31xl1ZppdAFXhV9qA7435wn6R4NU
+1zi/oXnHOgAWW037C+QFXpPnKzCY3BpmLw3uAGMgYRA+2NqrAT2HE8vmnlxJkrBS
+JT3OlJCCkIw2yitPN5zZaWZLpbvT07wFEH8KFoh7Wgs4FBl1mDeyGT53RhbSHjy1
++i85E6g9366CZNoD3bSUlPlY9iOtP4QK4Qp+VOn1j13Bu3BE9Fpuprani1ESsGME
+16qzwf5It3TVWK9czXqa8HBJvlrjaEInloWThmSysYFweKIRT+8CEu9+KyakTKVL
+fnGKXfbXzQ==
+=0ZBt
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512
==
--- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Mon Mar 30 16:00:46 2020
@@ -0,0 +1,3 @@
+SparkR_3.0.0.tar.gz: A4828C8D BA3BA1AA 116EEA62 D7028B85 85FF87AE 8AE9F0B5
+ 421F1A3E E5E04F19 F1D4F0A6 144CEF29 8D690FC8 D9836830
+ 4518FF9E 96004114 1083326B 84B5C0EC

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc
==
--- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPUQHHJ4aW5AYXBh
+Y2hlLm9yZwAKCRDeqWPi6TR9ZmRGD/9UkePDo4IawkYALJoaqpwnjp1Md3RP5dbK
+l/x1VLfHzAkbYQo+tKe692koHo45tE0izt+99humvZT7SjP4sVPHuR16Ik0gE6h0
+Yn8CG4Qsof30Se9feg6EllACBDEvueGlcchHN+aPyYJoLjajAzfH/5P6fC9rHe5Z
+d3aYd93cqYtIKbDtQ6fxnI387wTmWkVKAXWNB7K5iEB8KFjzCjGeyac5JbnYBC6G
+Y9uWcxqQ+3XV2SIfDQuxFuj421RBx2IIu56qJLgVEzcs8yLh4APM29DfYv7YcRGg
+ILex3j8SWjgqG1rdDhc2U/SeakR/rErJ+oebxD9dTC19wMTnp37cgS0HgtWLHaU2
+RvxaMdAvF3GjN2LFhSRht/uZV350O3EI+L6ye9WauXzaK4iD7Mi5x7BIBN1csNWn
+MW0B+goqTpzvC78h5R2ETCw1xmAarjKmdLKf3AUuqGeobv/7+4sLuwq+PSyrTgUi
+BHPIgkYYk+EhHryB6wLkKYRXWKKmMyGCl+5HLYPuY4GyZm4rwc2et8v1pX3RvcCF
+NoOcg/TZgn6+Tz0OjUm4TARs9RkbJEhKk1EWKCFvPalhenLbHHOvDJJPoqp3LNVT
+/HQ1f1JRWqXWfc/O1BR9CRFNbZTxKorPxMXIEYn583lufZyvWiyAnYKD6ev0UAdB
+/iwwQeeM/Q

[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fc5079841907443369af98b17c20f1ac24b3727d
Author: Reynold Xin 
AuthorDate: Mon Mar 30 08:42:27 2020 +

Preparing development version 3.0.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index c8cb1c3..3eff30b 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.0.0
+Version: 3.0.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 0a52a00..8bef9d8 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index fa4fcb1f..fc1441d 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 14a1b7d..de2a6fb 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index e75a843..6c0c016 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 004af0a..b8df191 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index a35156a..8119709 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0
+3.0.1-SNAPSHOT
 ../../pom.xml
   
 
diff --g

[spark] branch branch-3.0 updated (5687b31 -> fc50798)

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5687b31  [SPARK-30532] DataFrameStatFunctions to work with 
TABLE.COLUMN syntax
 add 6550d0d  Preparing Spark release v3.0.0-rc1
 new fc50798  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)





[spark] tag v3.0.0-rc1 created (now 6550d0d)

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 6550d0d  (commit)
This tag includes the following new commits:

 new 6550d0d  Preparing Spark release v3.0.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.0.0-rc1

2020-03-30 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to tag v3.0.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1
Author: Reynold Xin 
AuthorDate: Mon Mar 30 08:42:10 2020 +

Preparing Spark release v3.0.0-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 38 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 193ad3d..0a52a00 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index a1c8a8e..fa4fcb1f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 163c250..14a1b7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index a6d9981..e75a843 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 76a402b..004af0a 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 3c3c0d2..a35156a 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 883b73a..dedc7df 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 93a4f67..ebb0525 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.0.0-SNAPSHOT
+3.0.0
 ../../pom.xml
   
 
diff --git

svn commit: r38725 - /dev/spark/KEYS

2020-03-30 Thread rxin
Author: rxin
Date: Mon Mar 30 07:26:00 2020
New Revision: 38725

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Mar 30 07:26:00 2020
@@ -1167,3 +1167,61 @@ rMA+YcuC9o2K7dKjVv3KinQ2Tiv4TVxyTjcyZurg
 0TbepIdiQlc=
 =wdlY
 -----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2020-03-30 [SC]
+  4A8BDA48E6E212A734632502DEA963E2E9347D66
+uid   [ultimate] Reynold Xin (CODE SIGNING KEY) 
+sub   rsa4096 2020-03-30 [E]
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBF6BkJkBEACmRKcV6c575E6jOyZBwLteV7hJsETNYx9jMkENiyeyTFJ3A8Hg
++gPAmoU6jvzugR98qgVSH0uj/HZH1zEkJx049+OHwBcZ48mGJakIaKcg3k1CPRTL
+VDRWg7M4P7nQisMHsPHrdGPJFVBE7Mn6pafuRZ46gtnXf2Ec1EsvMBOYjRNt6nSg
+GvoQdiv5SjUuwxfrw7CICj1agxwLarBcWpIF6PMU7yG+XjTIrSM63KuuV+fOZvKM
+AdjwwUNNj2aOkprPHfmFIgSnEMsxvoJQNqYTaWzwT8WAyW1qTd0LhYYDTnb4J+j2
+BxgG5ASHYpsLQ1Moy+lYsTxWsoZMvqTqv/h+Mlb8fiUTiYppeMnLzxtI/t8Trvt8
+rXNGSkNd8dM5uqJ9Ba2MS6UB6EZUd5e7aPy8z5ThlhygRjLk0527O4BYAWlZw5F8
+egq/X0liCeRHoFUsyNnuQYSqo2spdTIV2ExKo/hEF1FgbXF6s1v/TcfzS0PkSYEH
+5yhKYoEkYOXIneIjUasy8xM9O2578NsVu1GH0n+E29KDA0w+QKwpbjgb9VWKCjk1
+CPvK7oi3DKA4A28w/h5jI9Xzb343L0gb+IhdgL5lNWp2HoSy+y7Smnbz6IchjAP7
+zCtQ9ZJCLdXgCtDlXUeF+TXzEfKUYwa0jnha/fArM3PVGvQlWdpVhe/oLQARAQAB
+tDBSZXlub2xkIFhpbiAoQ09ERSBTSUdOSU5HIEtFWSkgPHJ4aW5AYXBhY2hlLm9y
+Zz6JAk4EEwEIADgWIQRKi9pI5uISpzRjJQLeqWPi6TR9ZgUCXoGQmQIbAwULCQgH
+AgYVCgkICwIEFgIDAQIeAQIXgAAKCRDeqWPi6TR9ZrBJEACW92VdruNL+dYYH0Cu
+9oxZx0thCE1twc/6rvgvIj//0kZ4ZA6RoDId8vSmKSkB0GwMT7daIoeIvRTiEdMQ
+Wai7zqvNEdT1qdNn7MfN1rveN1tBNVndzbZ8S8Nz4sqZ/8R3wG90c2XLwno3joXA
+FhFRfVa+TWI1Ux84/ZXuzD14f54dorVo0CT51CnU67ERBAijl7UugPM3Fs7ApU/o
+SWCMq7ScPde81jmgMqBDLcj/hueCOTU5m8irOGGY439qEF+H41I+IB60yzAS4Gez
+xZl55Mv7ZKdwWtCcwtUYIm4R8NNu4alTxUpxw4ttRW3Kzue78TOIMTWTwRKrP5t2
+yq9bMT1fSO7h/Ntn8dXUL0EM/h+6k5py5Kr0+mrV/s0Z530Fit6AC/ReWV6hSGdk
+F1Z1ECa4AoUHqtoQKL+CNgO2qlJn/sKj3g10NiSwqUdUuxCSOpsY72udRLG9tfkB
+OwW3lTKLp66gYYE3nYaHzJKGdRs7aJ8RRALMQkadsyqpdVMp+Yvbj/3Hn3uB3jTt
+S+RolH545toeuhXaiIWlm2434oHW6QjzpPwaNp5AiWm+vMfPkhhCX6WT0jv9nEtM
+kJJVgwlWNKYEW9nLaIRMWWONSy9aJapZfLW0XDiKidibPHqNFih9z49eDVLobi5e
+mzmOFkKFxs9D4sg9oVmId6Y9SbkCDQRegZCZARAA5ZMv1ki5mKJVpASRGfTHVH5o
+9HixwJOinkHjSK3zFpuvh0bs+rKZL2+TUXci9Em64xXuYbiGH3YgH061H9tgAMaN
+iSIFGPlbBPbduJjdiUALqauOjjCIoWJLyuAC25zSGCeAwzQiRXN6VJUYwjQnDMDG
+8iUyL+IdXjq2T6vFVZGR/uVteRqqvEcg9km6IrFmXefqfry4hZ5a7SbmThCHqGxx
+5Oy+VkWw1IP7fHIUdC9ie45X6n08yC2BfWI4+RBny8906pSXEN/ag0Yw7vWkiyuK
+wZsoe0pRczV8mx6QF2+oJjRMtziKYW72jKE9a/DXXzQ3Luq5gyZeq0cluYNGHVdj
+ijA2ORNLloAfGjVGRKVznUFN8LMkcxm4jiiHKRkZEcjgm+1tRzGPufFidyhQIYO2
+YCOpnPQh5IXznb3RZ0JqJcXdne+7Nge85URTEMmMyx5kXvD03ZmUObshDL12YoM3
+bGzObo6jYg+h38Xlx9+9QAwGkf+gApIPI8KqPAVyP6s60AR4iR6iehEOciz7h6/b
+T9bKMw0w9cvyJzY1IJsy2sQYFwNyHYWQkyDciRAmIwriHhBDfXdBodF95V3uGbIp
+DZw3jVxcgJWKZ3y65N1aCguEI1fyy9JU12++GMBa+wuv9kdhSoj2qgInFB1VXGC7
+bBlRnHB44tsFTBEqqOcAEQEAAYkCNgQYAQgAIBYhBEqL2kjm4hKnNGMlAt6pY+Lp
+NH1mBQJegZCZAhsMAAoJEN6pY+LpNH1mwIYQAIRqbhEjL6uMxM19OMPDydbhiWoI
+8BmoqzsvRNF9VidjPRicYJ5JL5FFvvTyT6g87L8aRhiAdX/la92PdJ9DTS3sfIKF
+pIcUDFybKgk4pmGWl0fNIwEjHewf6HlndCFmVuPe32V/ZkCwb58dro15xzxblckB
+kgsqb0Xbfz/3Iwlqr5eTKH5iPrDFcYKy1ODcFmXS+udMm5uwn+d/RNmj8B3kgwrw
+brs53264qdWbfsxGPC1ZkDNNSRyIy6wGvc/diRm4TSV/Lmd5OoDX4UkPJ++JhGoO
+cYKxc2KzrEZxzMgJ3xFRs3zeymOwtgXUU1GBCuD7uxr1vacFwUV+9ymTeyUdTxB3
++/DzxYOJGQL/3IXlyQ2azoCWUpCjW0MFM1OolragOFJeQ+V0xrlOiXXAFfHo0KPG
+y0QdK810Ok+XYR6U9Y7yb6tYDgi+w9r46XjurdiZnUxxLUpFG++tSgBQ5X4y2UGw
+C4n0T8/jn6KIUZ0kx51ZZ6CEChjBt+AU+HCnw2sZfgq8Nlos95tw2MT6kn8BrY68
+n297ev/1T6B0OasQaw3Itw29+T+FdzdU4c6XW/rC6VAlBikWIS5zCT//vAeBacxL
+HYoqwKL52HzG121lfWXhx5vNF4bg/fKrFEOy2Wp1fMG6nRcuUUROvieD6ZU4ZrLA
+NjpTIP+lOkfxRwUi
+=rggH
+-----END PGP PUBLIC KEY BLOCK-----






[spark] branch test-branch deleted (was 0f8b07e)

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git.


 was 0f8b07e  test

This change permanently discards the following revisions:

 discard 0f8b07e  test





[spark] branch test-branch created (now 0f8b07e)

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 0f8b07e  test

This branch includes the following new commits:

 new 0f8b07e  test

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: test

2019-02-01 Thread rxin
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch test-branch
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 0f8b07e5034af2819b75b53aadffda82ae0c31b8
Author: Reynold Xin 
AuthorDate: Fri Feb 1 13:28:18 2019 -0800

test
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 271f2f5..2c1e02a 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ For general development tips, including info on developing 
Spark using an IDE, s
 
 The easiest way to start using Spark is through the Scala shell:
 
-./bin/spark-shell
+./bin/spark-shella
 
 Try the following command, which should return 1000:
 





spark git commit: [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter

2018-11-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 9fdc7a840 -> cb368f2c2


[SPARK-26142] followup: Move sql shuffle read metrics relatives to 
SQLShuffleMetricsReporter

## What changes were proposed in this pull request?

Follow-up for https://github.com/apache/spark/pull/23128: move the SQL shuffle
read metrics helpers into `SQLShuffleMetricsReporter`, so that the SQL shuffle
read metrics live closer together and there is no risk of forgetting to update
SQLShuffleMetricsReporter when new metrics are added by others.
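
Below is a minimal, self-contained Scala sketch of the consolidation pattern this
follow-up applies. The object, the `Metric` type and the metric names are
illustrative stand-ins, not Spark's internal `SQLMetric` machinery.

```scala
// Illustrative sketch only: one object owns the full set of shuffle-read
// metric definitions, so call sites never enumerate them independently.
object ShuffleReadMetricsSketch {
  // Hypothetical metric handle standing in for Spark's internal SQLMetric.
  final case class Metric(description: String, var value: Long = 0L)

  // The single factory every operator calls; adding a metric touches one file.
  def createShuffleReadMetrics(): Map[String, Metric] = Map(
    "recordsRead"     -> Metric("records read"),
    "remoteBytesRead" -> Metric("remote bytes read"),
    "localBytesRead"  -> Metric("local bytes read"))

  // A call site (e.g. an exchange operator) mixes the map into its own metrics.
  def operatorMetrics(): Map[String, Metric] =
    Map("dataSize" -> Metric("data size")) ++ createShuffleReadMetrics()
}
```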

## How was this patch tested?

Existing tests.

Closes #23175 from xuanyuanking/SPARK-26142-follow.

Authored-by: Yuanjian Li 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb368f2c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb368f2c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb368f2c

Branch: refs/heads/master
Commit: cb368f2c2964797d7313d3a4151e2352ff7847a9
Parents: 9fdc7a8
Author: Yuanjian Li 
Authored: Thu Nov 29 12:09:30 2018 -0800
Committer: Reynold Xin 
Committed: Thu Nov 29 12:09:30 2018 -0800

--
 .../exchange/ShuffleExchangeExec.scala  |  4 +-
 .../org/apache/spark/sql/execution/limit.scala  |  6 +--
 .../spark/sql/execution/metric/SQLMetrics.scala | 20 
 .../metric/SQLShuffleMetricsReporter.scala  | 50 
 .../execution/UnsafeRowSerializerSuite.scala|  4 +-
 5 files changed, 47 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
index 8938d93..c9ca395 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
@@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, 
BoundReference, Uns
 import 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.sql.execution.metric.{SQLMetrics, 
SQLShuffleMetricsReporter}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.util.MutablePair
@@ -49,7 +49,7 @@ case class ShuffleExchangeExec(
 
   override lazy val metrics = Map(
 "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size")
-  ) ++ SQLMetrics.getShuffleReadMetrics(sparkContext)
+  ) ++ SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
 
   override def nodeName: String = {
 val extraInfo = coordinator match {

http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
index ea845da..e9ab7cd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
@@ -25,7 +25,7 @@ import 
org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.util.truncatedString
 import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
-import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.sql.execution.metric.SQLShuffleMetricsReporter
 
 /**
  * Take the first `limit` elements and collect them to a single partition.
@@ -38,7 +38,7 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) 
extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
   private val serializer: Serializer = new 
UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = SQLMetrics.getShuffleReadMetrics(sparkContext)
+  override lazy val metrics = 
SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
   protected override def doExecute(): RDD[InternalRow] = {
 val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit))
 val shuffled = new ShuffledRowRDD(
@@ -154,7 +154,7 @@ case class TakeOrderedAndProjectExec(
 

spark git commit: [SPARK-26141] Enable custom metrics implementation in shuffle write

2018-11-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 85383d29e -> 6a064ba8f


[SPARK-26141] Enable custom metrics implementation in shuffle write

## What changes were proposed in this pull request?
This is the write side counterpart to https://github.com/apache/spark/pull/23105
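
As a rough illustration of the write-side refactoring: the writer is handed a
reporter interface instead of pulling a concrete metrics class out of
TaskContext. The names below are invented for this sketch; Spark's real
ShuffleWriteMetricsReporter interface and writers are more involved.

```scala
// Sketch of the dependency-injection pattern: the caller chooses the reporter,
// so custom metrics implementations can be plugged into the shuffle write path.
trait WriteMetricsReporterSketch {
  def incRecordsWritten(records: Long): Unit
  def incBytesWritten(bytes: Long): Unit
}

final class ShuffleWriterSketch(reporter: WriteMetricsReporterSketch) {
  def write(records: Iterator[Array[Byte]]): Unit =
    records.foreach { rec =>
      // ... append rec to the shuffle output file (omitted) ...
      reporter.incRecordsWritten(1L)
      reporter.incBytesWritten(rec.length.toLong)
    }
}
```

The diff below applies the same idea to `BypassMergeSortShuffleWriter`, which now
receives the reporter through its constructor.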

## How was this patch tested?
No behavior change expected, as it is a straightforward refactoring. Updated 
all existing test cases.

Closes #23106 from rxin/SPARK-26141.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a064ba8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a064ba8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a064ba8

Branch: refs/heads/master
Commit: 6a064ba8f271d5f9d04acd41d0eea50a5b0f5018
Parents: 85383d2
Author: Reynold Xin 
Authored: Mon Nov 26 22:35:52 2018 -0800
Committer: Reynold Xin 
Committed: Mon Nov 26 22:35:52 2018 -0800

--
 .../sort/BypassMergeSortShuffleWriter.java| 11 +--
 .../spark/shuffle/sort/ShuffleExternalSorter.java | 18 --
 .../spark/shuffle/sort/UnsafeShuffleWriter.java   |  9 +
 .../spark/storage/TimeTrackingOutputStream.java   |  7 ---
 .../spark/executor/ShuffleWriteMetrics.scala  | 13 +++--
 .../apache/spark/scheduler/ShuffleMapTask.scala   |  3 ++-
 .../org/apache/spark/shuffle/ShuffleManager.scala |  6 +-
 .../spark/shuffle/sort/SortShuffleManager.scala   | 10 ++
 .../org/apache/spark/storage/BlockManager.scala   |  7 +++
 .../spark/storage/DiskBlockObjectWriter.scala |  4 ++--
 .../spark/util/collection/ExternalSorter.scala|  4 ++--
 .../shuffle/sort/UnsafeShuffleWriterSuite.java|  6 --
 .../scala/org/apache/spark/ShuffleSuite.scala | 12 
 .../sort/BypassMergeSortShuffleWriterSuite.scala  | 16 
 project/MimaExcludes.scala|  7 ++-
 15 files changed, 79 insertions(+), 54 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6a064ba8/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
index b020a6d..fda33cd 100644
--- 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
+++ 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
@@ -37,12 +37,11 @@ import org.slf4j.LoggerFactory;
 import org.apache.spark.Partitioner;
 import org.apache.spark.ShuffleDependency;
 import org.apache.spark.SparkConf;
-import org.apache.spark.TaskContext;
-import org.apache.spark.executor.ShuffleWriteMetrics;
 import org.apache.spark.scheduler.MapStatus;
 import org.apache.spark.scheduler.MapStatus$;
 import org.apache.spark.serializer.Serializer;
 import org.apache.spark.serializer.SerializerInstance;
+import org.apache.spark.shuffle.ShuffleWriteMetricsReporter;
 import org.apache.spark.shuffle.IndexShuffleBlockResolver;
 import org.apache.spark.shuffle.ShuffleWriter;
 import org.apache.spark.storage.*;
@@ -79,7 +78,7 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
   private final int numPartitions;
   private final BlockManager blockManager;
   private final Partitioner partitioner;
-  private final ShuffleWriteMetrics writeMetrics;
+  private final ShuffleWriteMetricsReporter writeMetrics;
   private final int shuffleId;
   private final int mapId;
   private final Serializer serializer;
@@ -103,8 +102,8 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
   IndexShuffleBlockResolver shuffleBlockResolver,
   BypassMergeSortShuffleHandle handle,
   int mapId,
-  TaskContext taskContext,
-  SparkConf conf) {
+  SparkConf conf,
+  ShuffleWriteMetricsReporter writeMetrics) {
 // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no 
units are provided
 this.fileBufferSize = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", 
"32k") * 1024;
 this.transferToEnabled = conf.getBoolean("spark.file.transferTo", true);
@@ -114,7 +113,7 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
 this.shuffleId = dep.shuffleId();
 this.partitioner = dep.partitioner();
 this.numPartitions = partitioner.numPartitions();
-this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics();
+this.writeMetrics = writeMetrics;
 this.serializer = dep.serializer();
 this.shuffleBlockResolver = shuffleBlockResolver;
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/6a064

spark git commit: [SPARK-26129][SQL] Instrumentation for per-query planning time

2018-11-21 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 6bbdf34ba -> 07a700b37


[SPARK-26129][SQL] Instrumentation for per-query planning time

## What changes were proposed in this pull request?
We currently don't have good visibility into query planning time (analysis vs 
optimization vs physical planning). This patch adds a simple utility to track 
the runtime of various rules and various planning phases.
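
For intuition, here is a small self-contained sketch of the phase-timing idea.
The class and method names are illustrative; the actual `QueryPlanningTracker`
added in the diff below also tracks per-rule invocation counts.

```scala
// Illustrative sketch: accumulate wall-clock time per named planning phase.
class PlanningTimeTrackerSketch {
  private val phaseTimes = scala.collection.mutable.Map.empty[String, Long]

  // Measure one planning phase (e.g. "analysis", "optimization", "planning").
  def measurePhase[T](phase: String)(block: => T): T = {
    val start = System.nanoTime()
    val result = block
    phaseTimes(phase) = phaseTimes.getOrElse(phase, 0L) + (System.nanoTime() - start)
    result
  }

  def report(): Map[String, Long] = phaseTimes.toMap
}

// Hypothetical usage: wrap each planning phase so per-query timings can be read back.
// val tracker = new PlanningTimeTrackerSketch
// val analyzed = tracker.measurePhase("analysis") { analyze(parsedPlan) }
```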

## How was this patch tested?
Added unit tests and end-to-end integration tests.

Closes #23096 from rxin/SPARK-26129.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07a700b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07a700b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07a700b3

Branch: refs/heads/master
Commit: 07a700b3711057553dfbb7b047216565726509c7
Parents: 6bbdf34
Author: Reynold Xin 
Authored: Wed Nov 21 16:41:12 2018 +0100
Committer: Reynold Xin 
Committed: Wed Nov 21 16:41:12 2018 +0100

--
 .../sql/catalyst/QueryPlanningTracker.scala | 127 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  22 ++--
 .../spark/sql/catalyst/rules/RuleExecutor.scala |  19 ++-
 .../catalyst/QueryPlanningTrackerSuite.scala|  78 
 .../sql/catalyst/analysis/AnalysisTest.scala|   3 +-
 .../ResolveGroupingAnalyticsSuite.scala |   3 +-
 .../analysis/ResolvedUuidExpressionsSuite.scala |  10 +-
 .../scala/org/apache/spark/sql/Dataset.scala|   9 ++
 .../org/apache/spark/sql/SparkSession.scala |   6 +-
 .../spark/sql/execution/QueryExecution.scala|  21 ++-
 .../QueryPlanningTrackerEndToEndSuite.scala |  52 
 .../apache/spark/sql/hive/test/TestHive.scala   |  16 ++-
 12 files changed, 338 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/07a700b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
new file mode 100644
index 000..420f2a1
--- /dev/null
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.util.BoundedPriorityQueue
+
+
+/**
+ * A simple utility for tracking runtime and associated stats in query 
planning.
+ *
+ * There are two separate concepts we track:
+ *
+ * 1. Phases: These are broad scope phases in query planning, as listed below, 
i.e. analysis,
+ * optimization and physical planning (just planning).
+ *
+ * 2. Rules: These are the individual Catalyst rules that we track. In 
addition to time, we also
+ * track the number of invocations and effective invocations.
+ */
+object QueryPlanningTracker {
+
+  // Define a list of common phases here.
+  val PARSING = "parsing"
+  val ANALYSIS = "analysis"
+  val OPTIMIZATION = "optimization"
+  val PLANNING = "planning"
+
+  class RuleSummary(
+var totalTimeNs: Long, var numInvocations: Long, var 
numEffectiveInvocations: Long) {
+
+def this() = this(totalTimeNs = 0, numInvocations = 0, 
numEffectiveInvocations = 0)
+
+override def toString: String = {
+  s"RuleSummary($totalTimeNs, $numInvocations, $numEffectiveInvocations)"
+}
+  }
+
+  /**
+   * A thread local variable to implicitly pass the tracker around. This 
assumes the query planner
+   * is single-threaded, and avoids passing the same tracker context in every 
function call.
+   */
+  private val localTracker = new ThreadLocal[QueryPlanningTracker]() {
+override def initialValue: QueryPlanningTracker = null
+  }
+
+  /** Returns the current tra

spark-website git commit: Use Heilmeier Catechism for SPIP template.

2018-10-25 Thread rxin
Repository: spark-website
Updated Branches:
  refs/heads/asf-site e4b87718d -> 005a2a0d1


Use Heilmeier Catechism for SPIP template.


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/005a2a0d
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/005a2a0d
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/005a2a0d

Branch: refs/heads/asf-site
Commit: 005a2a0d1d88c893518d98cddcb7d373a562b339
Parents: e4b8771
Author: Reynold Xin 
Authored: Wed Oct 24 11:51:43 2018 -0700
Committer: Reynold Xin 
Committed: Thu Oct 25 11:25:30 2018 -0700

--
 improvement-proposals.md| 34 ++
 site/improvement-proposals.html | 32 
 2 files changed, 42 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark-website/blob/005a2a0d/improvement-proposals.md
--
diff --git a/improvement-proposals.md b/improvement-proposals.md
index 8fab696..55d57d9 100644
--- a/improvement-proposals.md
+++ b/improvement-proposals.md
@@ -11,7 +11,7 @@ navigation:
 
 The purpose of an SPIP is to inform and involve the user community in major 
improvements to the Spark codebase throughout the development process, to 
increase the likelihood that user needs are met.
 
-SPIPs should be used for significant user-facing or cross-cutting changes, not 
small incremental  improvements. When in doubt, if a committer thinks a change 
needs an SPIP, it does.
+SPIPs should be used for significant user-facing or cross-cutting changes, not 
small incremental improvements. When in doubt, if a committer thinks a change 
needs an SPIP, it does.
 
 What is a SPIP?
 
@@ -48,30 +48,40 @@ Any community member can help by 
discussing whether an SPIP is
 SPIP Process
 Proposing an SPIP
 
-Anyone may propose an SPIP, using the template below. Please only submit an 
SPIP if you are willing to help, at least with discussion.
+Anyone may propose an SPIP, using the document template below. Please only 
submit an SPIP if you are willing to help, at least with discussion.
 
 After a SPIP is created, the author should email d...@spark.apache.org to notify the 
community of the SPIP, and discussions should ensue on the JIRA ticket.
 
 If an SPIP is too small or incremental and should have been done through the 
normal JIRA process, a committer should remove the SPIP label.
 
 
-Template for an SPIP
+SPIP Document Template
 
-
-Background and Motivation: What problem is this solving?
+A SPIP document is a short document with a few questions, inspired by the 
Heilmeier Catechism:
 
-Target Personas: Examples include data scientists, data 
engineers, library developers, devops. A single SPIP can have multiple target 
personas.
+Q1. What are you trying to do? Articulate your objectives using 
absolutely no jargon.
 
-Goals: What must this allow users to do, that they can't 
currently?
+Q2. What problem is this proposal NOT designed to solve?
 
-Non-Goals: What problem is this proposal not designed to 
solve?
+Q3. How is it done today, and what are the limits of current practice?
 
-Proposed API Changes: Optional section defining APIs changes, if 
any. Backward and forward compatibility must be taken into account.
+Q4. What is new in your approach and why do you think it will be 
successful?
+
+Q5. Who cares? If you are successful, what difference will it make?
+
+Q6. What are the risks?
+
+Q7. How long will it take?
+
+Q8. What are the mid-term and final “exams” to check for success?
+
+Appendix A. Proposed API Changes. Optional section defining APIs 
changes, if any. Backward and forward compatibility must be taken into account.
+
+Appendix B. Optional Design Sketch: How are the goals going to be 
accomplished? Give sufficient technical detail to allow a contributor to judge 
whether it's likely to be feasible. Note that this is not a full design 
document.
+
+Appendix C. Optional Rejected Designs: What alternatives were 
considered? Why were they rejected? If no alternatives have been considered, 
the problem needs more thought.
 
-Optional Design Sketch: How are the goals going to be 
accomplished? Give sufficient technical detail to allow a contributor to judge 
whether it's likely to be feasible. This is not a full design document.
 
-Optional Rejected Designs: What alternatives were considered? Why 
were they rejected? If no alternatives have been considered, the problem needs 
more thought.
-
 
 Discussing an SPIP
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/005a2a0d/site/improvement-proposals.html
--
diff --git a/site/improvement-proposals.html 

spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled

2018-09-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 99ae693b3 -> 535bf1cc9


[SPARK-24157][SS][FOLLOWUP] Rename to 
spark.sql.streaming.noDataMicroBatches.enabled

## What changes were proposed in this pull request?
This patch changes the config option 
`spark.sql.streaming.noDataMicroBatchesEnabled` to 
`spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with 
the rest of the configs. Unfortunately there is one streaming config called 
`spark.sql.streaming.metricsEnabled`. For that one we should just use a 
fallback config and change it in a separate patch.

## How was this patch tested?
Made sure no other references to this config are in the code base:
```
> git grep "noDataMicro"
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
buildConf("spark.sql.streaming.noDataMicroBatches.enabled")
```
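
For reference, a hedged usage sketch of the renamed option, using only the
standard public SparkSession API; the local master and app name are just for
the example.

```scala
import org.apache.spark.sql.SparkSession

object NoDataMicroBatchesExample {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("no-data-micro-batches-example")
      .getOrCreate()

    // The option is now set under the new, dot-separated name.
    spark.conf.set("spark.sql.streaming.noDataMicroBatches.enabled", "true")

    spark.stop()
  }
}
```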

Closes #22476 from rxin/SPARK-24157.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 
(cherry picked from commit 936c920347e196381b48bc3656ca81a06f2ff46d)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/535bf1cc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/535bf1cc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/535bf1cc

Branch: refs/heads/branch-2.4
Commit: 535bf1cc9e6b54df7059ac3109b8cba30057d040
Parents: 99ae693
Author: Reynold Xin 
Authored: Wed Sep 19 18:51:20 2018 -0700
Committer: Reynold Xin 
Committed: Wed Sep 19 18:51:31 2018 -0700

--
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/535bf1cc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 3e9cde4..8b82fe1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1056,7 +1056,7 @@ object SQLConf {
   .createWithDefault(1L)
 
   val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED =
-buildConf("spark.sql.streaming.noDataMicroBatchesEnabled")
+buildConf("spark.sql.streaming.noDataMicroBatches.enabled")
   .doc(
 "Whether streaming micro-batch engine will execute batches without 
data " +
   "for eager state management for stateful streaming queries.")





spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled

2018-09-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 90e3955f3 -> 936c92034


[SPARK-24157][SS][FOLLOWUP] Rename to 
spark.sql.streaming.noDataMicroBatches.enabled

## What changes were proposed in this pull request?
This patch changes the config option 
`spark.sql.streaming.noDataMicroBatchesEnabled` to 
`spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with 
the rest of the configs. Unfortunately there is one streaming config called 
`spark.sql.streaming.metricsEnabled`. For that one we should just use a 
fallback config and change it in a separate patch.

## How was this patch tested?
Made sure no other references to this config are in the code base:
```
> git grep "noDataMicro"
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
buildConf("spark.sql.streaming.noDataMicroBatches.enabled")
```

Closes #22476 from rxin/SPARK-24157.

Authored-by: Reynold Xin 
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/936c9203
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/936c9203
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/936c9203

Branch: refs/heads/master
Commit: 936c920347e196381b48bc3656ca81a06f2ff46d
Parents: 90e3955
Author: Reynold Xin 
Authored: Wed Sep 19 18:51:20 2018 -0700
Committer: Reynold Xin 
Committed: Wed Sep 19 18:51:20 2018 -0700

--
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/936c9203/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index b1e9b17..c3328a6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1076,7 +1076,7 @@ object SQLConf {
   .createWithDefault(1L)
 
   val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED =
-buildConf("spark.sql.streaming.noDataMicroBatchesEnabled")
+buildConf("spark.sql.streaming.noDataMicroBatches.enabled")
   .doc(
 "Whether streaming micro-batch engine will execute batches without 
data " +
   "for eager state management for stateful streaming queries.")





spark git commit: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e4fee395e -> c7c0b086a


add one supported type missing from the javadoc

## What changes were proposed in this pull request?

The supported java.math.BigInteger type is not mentioned in the javadoc of 
Encoders.bean()
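
For context, a minimal sketch of what the documented support covers: a
bean-style class with a `java.math.BigInteger` field encoded via
`Encoders.bean`. The `LedgerEntry` class and its fields are invented for this
example.

```scala
import java.math.BigInteger

import scala.beans.BeanProperty

import org.apache.spark.sql.Encoders

// A hypothetical Java-bean-style class; @BeanProperty generates the
// getters/setters that bean encoding relies on.
class LedgerEntry {
  @BeanProperty var id: BigInteger = _
  @BeanProperty var memo: String = _
}

object BigIntegerBeanExample {
  // The encoder derives a schema from the bean's properties, including the
  // BigInteger field mentioned in the updated javadoc.
  val encoder = Encoders.bean(classOf[LedgerEntry])
}
```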

## How was this patch tested?

only Javadoc fix

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: James Yu 

Closes #21544 from yuj/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7c0b086
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7c0b086
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7c0b086

Branch: refs/heads/master
Commit: c7c0b086a0b18424725433ade840d5121ac2b86e
Parents: e4fee39
Author: James Yu 
Authored: Fri Jun 15 21:04:04 2018 -0700
Committer: Reynold Xin 
Committed: Fri Jun 15 21:04:04 2018 -0700

--
 sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c7c0b086/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
index 0b95a88..b47ec0b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -132,7 +132,7 @@ object Encoders {
*  - primitive types: boolean, int, double, etc.
*  - boxed types: Boolean, Integer, Double, etc.
*  - String
-   *  - java.math.BigDecimal
+   *  - java.math.BigDecimal, java.math.BigInteger
*  - time related: java.sql.Date, java.sql.Timestamp
*  - collection types: only array and java.util.List currently, map support 
is in progress
*  - nested java bean.





[1/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.

2018-04-12 Thread rxin
Repository: spark-website
Updated Branches:
  refs/heads/asf-site 91b561749 -> 658467248


http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/strata-exercises-now-available-online.html
--
diff --git a/site/news/strata-exercises-now-available-online.html 
b/site/news/strata-exercises-now-available-online.html
index 916f242..4f250a3 100644
--- a/site/news/strata-exercises-now-available-online.html
+++ b/site/news/strata-exercises-now-available-online.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2014.html
--
diff --git a/site/news/submit-talks-to-spark-summit-2014.html 
b/site/news/submit-talks-to-spark-summit-2014.html
index 4f43c23..18f2642 100644
--- a/site/news/submit-talks-to-spark-summit-2014.html
+++ b/site/news/submit-talks-to-spark-summit-2014.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2016.html
--
diff --git a/site/news/submit-talks-to-spark-summit-2016.html 
b/site/news/submit-talks-to-spark-summit-2016.html
index 3163bab..3766932 100644
--- a/site/news/submit-talks-to-spark-summit-2016.html
+++ b/site/news/submit-talks-to-spark-summit-2016.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-east-2016.html
--
diff --git a/site/news/submit-talks-to-spark-summit-east-2016.html 
b/site/news/submit-talks-to-spark-summit-east-2016.html
index 1984db7..b4a51a7 100644
--- a/site/news/submit-talks-to-spark-summit-east-2016.html
+++ b/site/news/submit-talks-to-spark-summit-east-2016.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-eu-2016.html
--
diff --git a/site/news/submit-talks-to-spark-summit-eu-2016.html 
b/site/news/submit-talks-to-spark-summit-eu-2016.html
index 8e33a17..940bc6f 100644
--- a/site/news/submit-talks-to-spark-summit-eu-2016.html
+++ b/site/news/submit-talks-to-spark-summit-eu-2016.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/two-weeks-to-spark-summit-2014.html
--
diff --git a/site/news/two-weeks-to-spark-summit-2014.html 
b/site/news/two-weeks-to-spark-summit-2014.html
index 3863298..d4e993a 100644
--- a/site/news/two-weeks-to-spark-summit-2014.html
+++ b/site/news/two-weeks-to-spark-summit-2014.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/video-from-first-spark-development-meetup.html
--
diff --git a/site/news/video-from-first-spark-development-meetup.html 
b/site/news/video-from-first-spark-development-meetup.html
index 2be7f50..04151a8 100644
--- a/site/news/video-from-first-spark-development-meetup.html
+++ b/site/news/video-from-first-spark-development-meetup.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/powered-by.html
--
diff --git a/site/powered-by.html b/site/powered-by.html
index 3449782..b303df0 100644
--- a/site/powered-by.html
+++ b/site/powered-by.html
@@ -66,7 +66,7 @@
   
   
-  Lightning-fast cluster computing
+  Lightning-fast unified analytics engine
   
 
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/release-process.html
--
diff --git a/site/release-process.html b/site/release-process.html
index 

[2/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.

2018-04-12 Thread rxin
Update text/wording to be more consistent and more in line with "modern" Spark.

1. Use DataFrame examples.

2. Reduce explicit comparison with MapReduce, since the topic does not really 
come up.

3. More focus on analytics rather than "cluster compute".

4. Update committer affiliation.

5. Make it more clear Spark runs in diverse environments (especially on MLlib 
page).

There is a lot more that needs to be done that I don't have time for today, e.g. 
referring to Structured Streaming.


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/65846724
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/65846724
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/65846724

Branch: refs/heads/asf-site
Commit: 658467248b278b109bc3d2594b0ef08ff0c727cb
Parents: 91b5617
Author: Reynold Xin 
Authored: Thu Apr 12 12:56:05 2018 -0700
Committer: Reynold Xin 
Committed: Thu Apr 12 12:56:05 2018 -0700

--
 _layouts/global.html|   2 +-
 committers.md   |  22 +-
 index.md|  34 +--
 mllib/index.md  |  18 +-
 site/committers.html|  24 +-
 site/community.html |   2 +-
 site/contributing.html  |   2 +-
 site/developer-tools.html   |   2 +-
 site/documentation.html |   2 +-
 site/downloads.html |   2 +-
 site/examples.html  |   2 +-
 site/faq.html   |   2 +-
 site/history.html   |   2 +-
 site/improvement-proposals.html |   2 +-
 site/index.html |  36 +--
 site/mailing-lists.html |   4 +-
 site/mllib/index.html   |  18 +-
 site/news/amp-camp-2013-registration-ope.html   |   2 +-
 .../news/announcing-the-first-spark-summit.html |   2 +-
 .../news/fourth-spark-screencast-published.html |   2 +-
 site/news/index.html|   2 +-
 site/news/nsdi-paper.html   |   2 +-
 site/news/one-month-to-spark-summit-2015.html   |   2 +-
 .../proposals-open-for-spark-summit-east.html   |   2 +-
 ...registration-open-for-spark-summit-east.html |   2 +-
 .../news/run-spark-and-shark-on-amazon-emr.html |   2 +-
 site/news/spark-0-6-1-and-0-5-2-released.html   |   2 +-
 site/news/spark-0-6-2-released.html |   2 +-
 site/news/spark-0-7-0-released.html |   2 +-
 site/news/spark-0-7-2-released.html |   2 +-
 site/news/spark-0-7-3-released.html |   2 +-
 site/news/spark-0-8-0-released.html |   2 +-
 site/news/spark-0-8-1-released.html |   2 +-
 site/news/spark-0-9-0-released.html |   2 +-
 site/news/spark-0-9-1-released.html |   2 +-
 site/news/spark-0-9-2-released.html |   2 +-
 site/news/spark-1-0-0-released.html |   2 +-
 site/news/spark-1-0-1-released.html |   2 +-
 site/news/spark-1-0-2-released.html |   2 +-
 site/news/spark-1-1-0-released.html |   2 +-
 site/news/spark-1-1-1-released.html |   2 +-
 site/news/spark-1-2-0-released.html |   2 +-
 site/news/spark-1-2-1-released.html |   2 +-
 site/news/spark-1-2-2-released.html |   2 +-
 site/news/spark-1-3-0-released.html |   2 +-
 site/news/spark-1-4-0-released.html |   2 +-
 site/news/spark-1-4-1-released.html |   2 +-
 site/news/spark-1-5-0-released.html |   2 +-
 site/news/spark-1-5-1-released.html |   2 +-
 site/news/spark-1-5-2-released.html |   2 +-
 site/news/spark-1-6-0-released.html |   2 +-
 site/news/spark-1-6-1-released.html |   2 +-
 site/news/spark-1-6-2-released.html |   2 +-
 site/news/spark-1-6-3-released.html |   2 +-
 site/news/spark-2-0-0-released.html |   2 +-
 site/news/spark-2-0-1-released.html |   2 +-
 site/news/spark-2-0-2-released.html |   2 +-
 site/news/spark-2-1-0-released.html |   2 +-
 site/news/spark-2-1-1-released.html |   2 +-
 site/news/spark-2-1-2-released.html |   2 +-
 site/news/spark-2-2-0-released.html |   2 +-
 site/news/spark-2-2-1-released.html |   2 +-
 site/news/spark-2-3-0-released.html |   2 +-
 site/news/spark-2.0.0-preview.html  |   2 +-
 .../spark-accepted-into-apache-incubator.html   |   2 +-
 site/news/spark-and-shark-in-the-news.html  |   2 +-
 site/news/spark-becomes-tlp.html|   2 +-
 

[2/2] spark-website git commit: Squashed commit of the following:

2018-03-16 Thread rxin
Squashed commit of the following:

commit 8e2dd71cf5613be6f019bb76b46226771422a40e
Merge: 8bd24fb6d 01f0b4e0c
Author: Reynold Xin 
Date:   Fri Mar 16 10:24:54 2018 -0700

Merge pull request #104 from mateiz/history

Add a project history page

commit 01f0b4e0c1fe77781850cf994058980664201bce
Author: Matei Zaharia 
Date:   Wed Mar 14 23:29:01 2018 -0700

Add a project history page


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/a1d84bcb
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/a1d84bcb
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/a1d84bcb

Branch: refs/heads/asf-site
Commit: a1d84bcbf53099be51c39914528bea3f4e2735a0
Parents: 8bd24fb
Author: Reynold Xin 
Authored: Fri Mar 16 10:26:14 2018 -0700
Committer: Reynold Xin 
Committed: Fri Mar 16 10:26:14 2018 -0700

--
 _layouts/global.html|   1 +
 community.md|  24 +-
 history.md  |  29 +++
 index.md|  16 +-
 site/committers.html|   1 +
 site/community.html |  24 +-
 site/contributing.html  |   1 +
 site/developer-tools.html   |   1 +
 site/documentation.html |   1 +
 site/downloads.html |   1 +
 site/examples.html  |   1 +
 site/faq.html   |   1 +
 site/graphx/index.html  |   1 +
 site/history.html   | 235 +++
 site/improvement-proposals.html |   1 +
 site/index.html |  17 +-
 site/mailing-lists.html |   1 +
 site/mllib/index.html   |   1 +
 site/news/amp-camp-2013-registration-ope.html   |   1 +
 .../news/announcing-the-first-spark-summit.html |   1 +
 .../news/fourth-spark-screencast-published.html |   1 +
 site/news/index.html|   1 +
 site/news/nsdi-paper.html   |   1 +
 site/news/one-month-to-spark-summit-2015.html   |   1 +
 .../proposals-open-for-spark-summit-east.html   |   1 +
 ...registration-open-for-spark-summit-east.html |   1 +
 .../news/run-spark-and-shark-on-amazon-emr.html |   1 +
 site/news/spark-0-6-1-and-0-5-2-released.html   |   1 +
 site/news/spark-0-6-2-released.html |   1 +
 site/news/spark-0-7-0-released.html |   1 +
 site/news/spark-0-7-2-released.html |   1 +
 site/news/spark-0-7-3-released.html |   1 +
 site/news/spark-0-8-0-released.html |   1 +
 site/news/spark-0-8-1-released.html |   1 +
 site/news/spark-0-9-0-released.html |   1 +
 site/news/spark-0-9-1-released.html |   1 +
 site/news/spark-0-9-2-released.html |   1 +
 site/news/spark-1-0-0-released.html |   1 +
 site/news/spark-1-0-1-released.html |   1 +
 site/news/spark-1-0-2-released.html |   1 +
 site/news/spark-1-1-0-released.html |   1 +
 site/news/spark-1-1-1-released.html |   1 +
 site/news/spark-1-2-0-released.html |   1 +
 site/news/spark-1-2-1-released.html |   1 +
 site/news/spark-1-2-2-released.html |   1 +
 site/news/spark-1-3-0-released.html |   1 +
 site/news/spark-1-4-0-released.html |   1 +
 site/news/spark-1-4-1-released.html |   1 +
 site/news/spark-1-5-0-released.html |   1 +
 site/news/spark-1-5-1-released.html |   1 +
 site/news/spark-1-5-2-released.html |   1 +
 site/news/spark-1-6-0-released.html |   1 +
 site/news/spark-1-6-1-released.html |   1 +
 site/news/spark-1-6-2-released.html |   1 +
 site/news/spark-1-6-3-released.html |   1 +
 site/news/spark-2-0-0-released.html |   1 +
 site/news/spark-2-0-1-released.html |   1 +
 site/news/spark-2-0-2-released.html |   1 +
 site/news/spark-2-1-0-released.html |   1 +
 site/news/spark-2-1-1-released.html |   1 +
 site/news/spark-2-1-2-released.html |   1 +
 site/news/spark-2-2-0-released.html |   1 +
 site/news/spark-2-2-1-released.html |   1 +
 site/news/spark-2-3-0-released.html |   1 +
 site/news/spark-2.0.0-preview.html  |   1 +
 .../spark-accepted-into-apache-incubator.html   |   1 +
 site/news/spark-and-shark-in-the-news.html  |   1 +
 site/news/spark-becomes-tlp.html|   1 +
 

[1/2] spark-website git commit: Squashed commit of the following:

2018-03-16 Thread rxin
Repository: spark-website
Updated Branches:
  refs/heads/asf-site 8bd24fb6d -> a1d84bcbf


http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2016-agenda-posted.html
--
diff --git a/site/news/spark-summit-june-2016-agenda-posted.html 
b/site/news/spark-summit-june-2016-agenda-posted.html
index ce68829..7947354 100644
--- a/site/news/spark-summit-june-2016-agenda-posted.html
+++ b/site/news/spark-summit-june-2016-agenda-posted.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2017-agenda-posted.html
--
diff --git a/site/news/spark-summit-june-2017-agenda-posted.html 
b/site/news/spark-summit-june-2017-agenda-posted.html
index 5d2df4b..e4055c3 100644
--- a/site/news/spark-summit-june-2017-agenda-posted.html
+++ b/site/news/spark-summit-june-2017-agenda-posted.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2018-agenda-posted.html
--
diff --git a/site/news/spark-summit-june-2018-agenda-posted.html 
b/site/news/spark-summit-june-2018-agenda-posted.html
index 17c284f..9b2f739 100644
--- a/site/news/spark-summit-june-2018-agenda-posted.html
+++ b/site/news/spark-summit-june-2018-agenda-posted.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-tips-from-quantifind.html
--
diff --git a/site/news/spark-tips-from-quantifind.html 
b/site/news/spark-tips-from-quantifind.html
index bfbac1d..00c71c2 100644
--- a/site/news/spark-tips-from-quantifind.html
+++ b/site/news/spark-tips-from-quantifind.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-user-survey-and-powered-by-page.html
--
diff --git a/site/news/spark-user-survey-and-powered-by-page.html 
b/site/news/spark-user-survey-and-powered-by-page.html
index 67935a9..c015e5c 100644
--- a/site/news/spark-user-survey-and-powered-by-page.html
+++ b/site/news/spark-user-survey-and-powered-by-page.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-version-0-6-0-released.html
--
diff --git a/site/news/spark-version-0-6-0-released.html 
b/site/news/spark-version-0-6-0-released.html
index 3f670d7..d9120b0 100644
--- a/site/news/spark-version-0-6-0-released.html
+++ b/site/news/spark-version-0-6-0-released.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-cloudsort-100tb-benchmark.html
--
diff --git a/site/news/spark-wins-cloudsort-100tb-benchmark.html 
b/site/news/spark-wins-cloudsort-100tb-benchmark.html
index b498034..8bef605 100644
--- a/site/news/spark-wins-cloudsort-100tb-benchmark.html
+++ b/site/news/spark-wins-cloudsort-100tb-benchmark.html
@@ -123,6 +123,7 @@
   https://issues.apache.org/jira/browse/SPARK;>Issue 
Tracker
   Powered By
   Project Committers
+  Project History
 
   
   

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html
--
diff --git a/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html 
b/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html
index 18646f4..32f53e9 100644
--- 

spark git commit: [SPARK-22648][K8S] Spark on Kubernetes - Documentation

2017-12-21 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 7beb375bf -> 7ab165b70


[SPARK-22648][K8S] Spark on Kubernetes - Documentation

What changes were proposed in this pull request?

This PR contains documentation on the usage of the Kubernetes scheduler in Spark 
2.3, and a shell script to make it easier to build the docker images required to 
use the integration. The changes detailed here are covered by 
https://github.com/apache/spark/pull/19717 and 
https://github.com/apache/spark/pull/19468, which have already been merged.

How was this patch tested?
The script has been in use for releases on our fork. The rest is documentation.

cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 
erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. 
(https://github.com/apache/spark/pull/20007)
- [x] Change references to docker to instead say "container" 
(https://github.com/apache/spark/pull/19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of 
int (#20032)

Author: foxish <ramanath...@google.com>

Closes #19946 from foxish/update-k8s-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7ab165b7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7ab165b7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7ab165b7

Branch: refs/heads/master
Commit: 7ab165b7061d9acc26523227076056e94354d204
Parents: 7beb375
Author: foxish <ramanath...@google.com>
Authored: Thu Dec 21 17:21:11 2017 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Thu Dec 21 17:21:11 2017 -0800

--
 docs/_layouts/global.html|   1 +
 docs/building-spark.md   |   6 +-
 docs/cluster-overview.md |   7 +-
 docs/configuration.md|   2 +
 docs/img/k8s-cluster-mode.png| Bin 0 -> 55538 bytes
 docs/index.md|   3 +-
 docs/running-on-kubernetes.md| 578 ++
 docs/running-on-yarn.md  |   4 +-
 docs/submitting-applications.md  |  16 +
 sbin/build-push-docker-images.sh |  68 
 10 files changed, 677 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/_layouts/global.html
--
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 67b05ec..e5af5ae 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -99,6 +99,7 @@
 Spark 
Standalone
 Mesos
 YARN
+Kubernetes
 
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/building-spark.md
--
diff --git a/docs/building-spark.md b/docs/building-spark.md
index 98f7df1..c391255 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
-Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
-Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
 
 This will build Spark distribution along with Python pip and R packages. For 
more information on usage, run `./dev/make-distribution.sh --help`
 
@@ -90,6 +90,10 @@ like ZooKeeper and Hadoop itself.
 ## Building with Mesos support
 
 ./build/mvn -Pmesos -DskipTests clean package
+
+## Building with Kubernetes support
+
+./build/mvn -Pkubernetes -DskipTests clean package
 
 ## Building with Kafka 0.8 support
 

http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/cluster-overview.md
--
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index c42bb4b..658e67f 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -52,11 +52,8 @@ The system currently supports three cluster managers:
 * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can 
also run Hadoop MapReduce
   and service applications.
 * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2.
-* [Kubernetes (experimental)](https://github.com/apac

[1/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend

2017-11-28 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 475a29f11 -> e9b2070ab


http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
new file mode 100644
index 000..3febb2f
--- /dev/null
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala
@@ -0,0 +1,440 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{ExecutorService, ScheduledExecutorService, 
TimeUnit}
+
+import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList}
+import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, 
MixedOperation, NonNamespaceOperation, PodResource}
+import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, 
MockitoAnnotations}
+import org.mockito.Matchers.{any, eq => mockitoEq}
+import org.mockito.Mockito.{doNothing, never, times, verify, when}
+import org.scalatest.BeforeAndAfter
+import org.scalatest.mockito.MockitoSugar._
+import scala.collection.JavaConverters._
+import scala.concurrent.Future
+
+import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.rpc._
+import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, 
TaskSchedulerImpl}
+import 
org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor,
 RemoveExecutor}
+import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
+import org.apache.spark.util.ThreadUtils
+
+class KubernetesClusterSchedulerBackendSuite extends SparkFunSuite with 
BeforeAndAfter {
+
+  private val APP_ID = "test-spark-app"
+  private val DRIVER_POD_NAME = "spark-driver-pod"
+  private val NAMESPACE = "test-namespace"
+  private val SPARK_DRIVER_HOST = "localhost"
+  private val SPARK_DRIVER_PORT = 7077
+  private val POD_ALLOCATION_INTERVAL = 60L
+  private val DRIVER_URL = RpcEndpointAddress(
+SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, 
CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
+  private val FIRST_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod1")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node1")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.100")
+  .endStatus()
+.build()
+  private val SECOND_EXECUTOR_POD = new PodBuilder()
+.withNewMetadata()
+  .withName("pod2")
+  .endMetadata()
+.withNewSpec()
+  .withNodeName("node2")
+  .endSpec()
+.withNewStatus()
+  .withHostIP("192.168.99.101")
+  .endStatus()
+.build()
+
+  private type PODS = MixedOperation[Pod, PodList, DoneablePod, 
PodResource[Pod, DoneablePod]]
+  private type LABELED_PODS = FilterWatchListDeletable[
+Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]]
+  private type IN_NAMESPACE_PODS = NonNamespaceOperation[
+Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]]
+
+  @Mock
+  private var sparkContext: SparkContext = _
+
+  @Mock
+  private var listenerBus: LiveListenerBus = _
+
+  @Mock
+  private var taskSchedulerImpl: TaskSchedulerImpl = _
+
+  @Mock
+  private var allocatorExecutor: ScheduledExecutorService = _
+
+  @Mock
+  private var requestExecutorsService: ExecutorService = _
+
+  @Mock
+  private var executorPodFactory: ExecutorPodFactory = _
+
+  @Mock
+  private var kubernetesClient: KubernetesClient = _
+
+  @Mock
+  private var podOperations: PODS = _
+
+  @Mock
+  private var podsWithLabelOperations: LABELED_PODS = _
+
+ 

[2/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend

2017-11-28 Thread rxin
[SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend

## What changes were proposed in this pull request?

This is a stripped down version of the `KubernetesClusterSchedulerBackend` for 
Spark with the following components:
- Static Allocation of Executors
- Executor Pod Factory
- Executor Recovery Semantics

It's step 1 from the step-wise plan documented 
[here](https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935).
This addition is covered by the [SPIP 
vote](http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-Spark-on-Kubernetes-td22147.html)
 which passed on Aug 31.
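
A deliberately simplified, hypothetical sketch of how the three components fit
together (all names below are invented for illustration; the real backend in
`KubernetesClusterSchedulerBackend` is far more involved and is built on the
fabric8 Kubernetes client):

```
// Hypothetical stand-ins for the components listed above; illustration only.
case class PodSpec(executorId: String)

trait ExecutorPodFactoryLike {
  // Builds the pod specification for a single executor.
  def createExecutorPod(executorId: String): PodSpec
}

trait KubernetesApi {
  // Minimal stand-in for the Kubernetes client calls the real backend issues.
  def createPod(spec: PodSpec): Unit
}

// Static allocation: keep exactly `target` executor pods alive, and replace any
// pod that a watch reports as lost (the executor recovery part).
class SimpleStaticAllocator(
    target: Int,
    factory: ExecutorPodFactoryLike,
    k8s: KubernetesApi) {

  private var running = Set.empty[String]
  private var nextId = 0

  def reconcile(): Unit = {
    while (running.size < target) {
      nextId += 1
      val id = s"exec-$nextId"
      k8s.createPod(factory.createExecutorPod(id))
      running += id
    }
  }

  // Called from a pod watcher when an executor pod disappears or fails.
  def onExecutorLost(executorId: String): Unit = {
    running -= executorId
    reconcile() // recovery: request a replacement pod up to the static target
  }
}
```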

## How was this patch tested?

- The patch contains unit tests which are passing.
- Manual testing: `./build/mvn -Pkubernetes clean package` succeeded.
- It is a **subset** of the entire changelist hosted in 
http://github.com/apache-spark-on-k8s/spark which is in active use in several 
organizations.
- There is integration testing enabled in the fork currently [hosted by 
PepperData](spark-k8s-jenkins.pepperdata.org:8080) which is being moved over to 
RiseLAB CI.
- Detailed documentation on trying out the patch in its entirety is in: 
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html

cc rxin felixcheung mateiz (shepherd)
k8s-big-data SIG members & contributors: mccheah ash211 ssuchter varunkatta 
kimoonkim erikerlandson liyinan926 tnachen ifilonenko

Author: Yinan Li <liyinan...@gmail.com>
Author: foxish <ramanath...@google.com>
Author: mcheah <mch...@palantir.com>

Closes #19468 from foxish/spark-kubernetes-3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9b2070a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9b2070a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9b2070a

Branch: refs/heads/master
Commit: e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d
Parents: 475a29f
Author: Yinan Li <liyinan...@gmail.com>
Authored: Tue Nov 28 23:02:09 2017 -0800
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue Nov 28 23:02:09 2017 -0800

--
 .travis.yml |   2 +-
 NOTICE  |   6 +
 .../cluster/SchedulerBackendUtils.scala |  47 ++
 dev/sparktestsupport/modules.py |   8 +
 docs/configuration.md   |   4 +-
 pom.xml |   7 +
 project/SparkBuild.scala|   8 +-
 resource-managers/kubernetes/core/pom.xml   | 100 +
 .../org/apache/spark/deploy/k8s/Config.scala| 123 ++
 .../spark/deploy/k8s/ConfigurationUtils.scala   |  41 ++
 .../org/apache/spark/deploy/k8s/Constants.scala |  50 +++
 .../k8s/SparkKubernetesClientFactory.scala  | 102 +
 .../cluster/k8s/ExecutorPodFactory.scala| 219 +
 .../cluster/k8s/KubernetesClusterManager.scala  |  70 +++
 .../k8s/KubernetesClusterSchedulerBackend.scala | 442 +++
 .../core/src/test/resources/log4j.properties|  31 ++
 .../cluster/k8s/ExecutorPodFactorySuite.scala   | 135 ++
 ...KubernetesClusterSchedulerBackendSuite.scala | 440 ++
 .../spark/deploy/yarn/YarnAllocator.scala   |   3 +-
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala |  24 -
 .../cluster/YarnClientSchedulerBackend.scala|   2 +-
 .../cluster/YarnClusterSchedulerBackend.scala   |   2 +-
 22 files changed, 1832 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/.travis.yml
--
diff --git a/.travis.yml b/.travis.yml
index d7e9f8c..05b94ade 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -43,7 +43,7 @@ notifications:
 # 5. Run maven install before running lint-java.
 install:
   - export MAVEN_SKIP_RC=1
-  - build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Pkinesis-asl -Phive 
-Phive-thriftserver install
+  - build/mvn -T 4 -q -DskipTests -Pkubernetes -Pmesos -Pyarn -Pkinesis-asl 
-Phive -Phive-thriftserver install
 
 # 6. Run lint-java.
 script:

http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/NOTICE
--
diff --git a/NOTICE b/NOTICE
index f4b64b5..6ec240e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -448,6 +448,12 @@ Copyright (C) 2011 Google Inc.
 Apache Commons Pool
 Copyright 1999-2009 The Apache Software Foundation
 
+This product includes/uses Kubernetes & OpenShift 3 Java Client 
(https://github.com/fabric8io/kubernetes-client)
+Copyright (C) 2015 Red Hat, Inc.
+
+This product includes/uses OkHttp (https://github.com/square/okhttp)
+Copyright (C) 2012 The Android Open Source Project
+
 =
 ==  NO

spark git commit: [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark

2017-11-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master b2463fad7 -> 41b60125b


[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark

## What changes were proposed in this pull request?

This PR proposes to add a link from `spark.catalog(..)` to `Catalog` and expose 
Catalog APIs in PySpark as below:

https://user-images.githubusercontent.com/6477701/32135863-f8e9b040-bc40-11e7-92ad-09c8043a1295.png

https://user-images.githubusercontent.com/6477701/32135849-bb257b86-bc40-11e7-9eda-4d58fc1301c2.png

Note that this is not shown in the list on the top - 
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql

https://user-images.githubusercontent.com/6477701/32135854-d50fab16-bc40-11e7-9181-812c56fd22f5.png

This is basically similar to `DataFrameReader` and `DataFrameWriter`.
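
(Not part of the patch.) For reference, the Catalog calls whose PySpark
documentation is being exposed look like this in Scala; the Python API mirrors
these method names:

```
import org.apache.spark.sql.SparkSession

object CatalogApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalog-sketch").master("local[*]").getOrCreate()

    // Give the catalog something to report on.
    spark.range(10).createOrReplaceTempView("numbers")

    // The Catalog: create, drop, alter or query underlying databases,
    // tables and functions.
    spark.catalog.listDatabases().show()
    spark.catalog.listTables().show()
    println(spark.catalog.currentDatabase)

    spark.stop()
  }
}
```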

## How was this patch tested?

Manually built the doc.

Author: hyukjinkwon 

Closes #19596 from HyukjinKwon/SPARK-22369.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41b60125
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41b60125
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41b60125

Branch: refs/heads/master
Commit: 41b60125b673bad0c133cd5c825d353ac2e6dfd6
Parents: b2463fa
Author: hyukjinkwon 
Authored: Thu Nov 2 15:22:52 2017 +0100
Committer: Reynold Xin 
Committed: Thu Nov 2 15:22:52 2017 +0100

--
 python/pyspark/sql/__init__.py | 3 ++-
 python/pyspark/sql/session.py  | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/__init__.py
--
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py
index 22ec416..c3c06c8 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -46,6 +46,7 @@ from pyspark.sql.types import Row
 from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
 from pyspark.sql.session import SparkSession
 from pyspark.sql.column import Column
+from pyspark.sql.catalog import Catalog
 from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, 
DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
 from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
@@ -54,7 +55,7 @@ from pyspark.sql.window import Window, WindowSpec
 
 __all__ = [
 'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
-'DataFrame', 'GroupedData', 'Column', 'Row',
+'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
 'DataFrameNaFunctions', 'DataFrameStatFunctions', 'Window', 'WindowSpec',
 'DataFrameReader', 'DataFrameWriter'
 ]

http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/session.py
--
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 2cc0e2d..c3dc1a46 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -271,6 +271,8 @@ class SparkSession(object):
 def catalog(self):
 """Interface through which the user may create, drop, alter or query 
underlying
 databases, tables, functions etc.
+
+:return: :class:`Catalog`
 """
 if not hasattr(self, "_catalog"):
 self._catalog = Catalog(self)





spark git commit: [SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages

2017-11-02 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 849b465bb -> 277b1924b


[SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation 
launches unnecessary stages

## What changes were proposed in this pull request?

Adding a global limit on top of the distinct values before sorting and 
collecting reduces the overall work when there are many distinct values. We also 
eagerly perform a collect rather than a take, because we know there are at most 
(maxValues + 1) rows.
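
(Illustrative only, not part of the patch; table and column names are made up.)
The code path being optimized is a pivot without explicitly supplied values,
which is what forces the distinct pivot values to be computed first:

```
import org.apache.spark.sql.SparkSession

object PivotSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pivot-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val sales = Seq(
      ("2016", "a", 10), ("2016", "b", 20),
      ("2017", "a", 30), ("2017", "c", 40)
    ).toDF("year", "category", "amount")

    // No pivot values supplied: Spark first computes the distinct values of
    // "category" (the collection this patch caps with a limit) and then pivots.
    sales.groupBy("year").pivot("category").sum("amount").show()

    // Supplying the values up front skips that extra job entirely.
    sales.groupBy("year").pivot("category", Seq("a", "b", "c")).sum("amount").show()

    spark.stop()
  }
}
```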

## How was this patch tested?

Existing tests cover sorted order

Author: Patrick Woody 

Closes #19629 from pwoody/SPARK-22408.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/277b1924
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/277b1924
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/277b1924

Branch: refs/heads/master
Commit: 277b1924b46a70ab25414f5670eb784906dbbfdf
Parents: 849b465
Author: Patrick Woody 
Authored: Thu Nov 2 14:19:21 2017 +0100
Committer: Reynold Xin 
Committed: Thu Nov 2 14:19:21 2017 +0100

--
 .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/277b1924/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
index 21e94fa..3e4edd4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
@@ -321,10 +321,10 @@ class RelationalGroupedDataset protected[sql](
 // Get the distinct values of the column and sort them so its consistent
 val values = df.select(pivotColumn)
   .distinct()
+  .limit(maxValues + 1)
   .sort(pivotColumn)  // ensure that the output columns are in a 
consistent logical order
-  .rdd
+  .collect()
   .map(_.get(0))
-  .take(maxValues + 1)
   .toSeq
 
 if (values.length > maxValues) {





spark git commit: [MINOR] Data source v2 docs update.

2017-11-01 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 1ffe03d9e -> d43e1f06b


[MINOR] Data source v2 docs update.

## What changes were proposed in this pull request?
This patch includes some doc updates for data source API v2. I was reading the 
code and noticed some minor issues.

## How was this patch tested?
This is a doc only change.

Author: Reynold Xin <r...@databricks.com>

Closes #19626 from rxin/dsv2-update.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d43e1f06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d43e1f06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d43e1f06

Branch: refs/heads/master
Commit: d43e1f06bd545d00bfcaf1efb388b469effd5d64
Parents: 1ffe03d
Author: Reynold Xin <r...@databricks.com>
Authored: Wed Nov 1 18:39:15 2017 +0100
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed Nov 1 18:39:15 2017 +0100

--
 .../org/apache/spark/sql/sources/v2/DataSourceV2.java|  9 -
 .../org/apache/spark/sql/sources/v2/WriteSupport.java|  4 ++--
 .../spark/sql/sources/v2/reader/DataSourceV2Reader.java  | 10 +-
 .../v2/reader/SupportsPushDownCatalystFilters.java   |  2 --
 .../sql/sources/v2/reader/SupportsScanUnsafeRow.java |  2 --
 .../spark/sql/sources/v2/writer/DataSourceV2Writer.java  | 11 +++
 .../apache/spark/sql/sources/v2/writer/DataWriter.java   | 10 +-
 7 files changed, 19 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java 
b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java
index dbcbe32..6234071 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java
@@ -20,12 +20,11 @@ package org.apache.spark.sql.sources.v2;
 import org.apache.spark.annotation.InterfaceStability;
 
 /**
- * The base interface for data source v2. Implementations must have a public, 
no arguments
- * constructor.
+ * The base interface for data source v2. Implementations must have a public, 
0-arg constructor.
  *
- * Note that this is an empty interface, data source implementations should 
mix-in at least one of
- * the plug-in interfaces like {@link ReadSupport}. Otherwise it's just a 
dummy data source which is
- * un-readable/writable.
+ * Note that this is an empty interface. Data source implementations should 
mix-in at least one of
+ * the plug-in interfaces like {@link ReadSupport} and {@link WriteSupport}. 
Otherwise it's just
+ * a dummy data source which is un-readable/writable.
  */
 @InterfaceStability.Evolving
 public interface DataSourceV2 {}

http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java 
b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java
index a8a9615..8fdfdfd 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java
@@ -36,8 +36,8 @@ public interface WriteSupport {
* sources can return None if there is no writing needed to be done 
according to the save mode.
*
* @param jobId A unique string for the writing job. It's possible that 
there are many writing
-   *  jobs running at the same time, and the returned {@link 
DataSourceV2Writer} should
-   *  use this job id to distinguish itself with writers of other 
jobs.
+   *  jobs running at the same time, and the returned {@link 
DataSourceV2Writer} can
+   *  use this job id to distinguish itself from other jobs.
* @param schema the schema of the data to be written.
* @param mode the save mode which determines what to do when the data are 
already in this data
* source, please refer to {@link SaveMode} for more details.

http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java
 
b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java
index 5989a4a..88c3219 100644
--- 
a/sql/core/src/main/java/org/apache/spark/

spark git commit: [SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100

2017-09-28 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master d29d1e879 -> 323806e68


[SPARK-22160][SQL] Make sample points per partition (in range partitioner) 
configurable and bump the default value up to 100

## What changes were proposed in this pull request?
Spark's RangePartitioner hard-codes the number of sampling points per partition 
to 20. This is sometimes too low. This ticket makes it configurable, via 
spark.sql.execution.rangeExchange.sampleSizePerPartition, and raises the 
default in Spark SQL to 100.
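
As a usage sketch (illustrative only; the conf is marked internal, so this is
mainly relevant for tuning experiments), the new knob can be set like any other
SQL conf before an operation that triggers a range exchange, such as a global
sort:

```
import org.apache.spark.sql.SparkSession

object RangeSampleSizeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("range-sample-sketch").master("local[*]").getOrCreate()

    // Number of points sampled from each input partition when the range
    // partitioner estimates balanced boundaries (default 100 after this patch).
    spark.conf.set("spark.sql.execution.rangeExchange.sampleSizePerPartition", "200")

    // A global sort introduces a range exchange, which performs the sampling.
    val sorted = spark.range(0, 1000000, 1, 8).orderBy("id")
    println(sorted.count())

    spark.stop()
  }
}
```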

## How was this patch tested?
Added a pretty sophisticated test based on a chi-square test ...

Author: Reynold Xin <r...@databricks.com>

Closes #19387 from rxin/SPARK-22160.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/323806e6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/323806e6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/323806e6

Branch: refs/heads/master
Commit: 323806e68f91f3c7521327186a37ddd1436267d0
Parents: d29d1e8
Author: Reynold Xin <r...@databricks.com>
Authored: Thu Sep 28 21:07:12 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Thu Sep 28 21:07:12 2017 -0700

--
 .../scala/org/apache/spark/Partitioner.scala| 15 -
 .../org/apache/spark/sql/internal/SQLConf.scala | 10 +++
 .../exchange/ShuffleExchangeExec.scala  |  7 ++-
 .../apache/spark/sql/ConfigBehaviorSuite.scala  | 66 
 4 files changed, 95 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/323806e6/core/src/main/scala/org/apache/spark/Partitioner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Partitioner.scala 
b/core/src/main/scala/org/apache/spark/Partitioner.scala
index 1484f29..debbd8d 100644
--- a/core/src/main/scala/org/apache/spark/Partitioner.scala
+++ b/core/src/main/scala/org/apache/spark/Partitioner.scala
@@ -108,11 +108,21 @@ class HashPartitioner(partitions: Int) extends 
Partitioner {
 class RangePartitioner[K : Ordering : ClassTag, V](
 partitions: Int,
 rdd: RDD[_ <: Product2[K, V]],
-private var ascending: Boolean = true)
+private var ascending: Boolean = true,
+val samplePointsPerPartitionHint: Int = 20)
   extends Partitioner {
 
+  // A constructor declared in order to maintain backward compatibility for 
Java, when we add the
+  // 4th constructor parameter samplePointsPerPartitionHint. See SPARK-22160.
+  // This is added to make sure from a bytecode point of view, there is still 
a 3-arg ctor.
+  def this(partitions: Int, rdd: RDD[_ <: Product2[K, V]], ascending: Boolean) 
= {
+this(partitions, rdd, ascending, samplePointsPerPartitionHint = 20)
+  }
+
   // We allow partitions = 0, which happens when sorting an empty RDD under 
the default settings.
   require(partitions >= 0, s"Number of partitions cannot be negative but found 
$partitions.")
+  require(samplePointsPerPartitionHint > 0,
+s"Sample points per partition must be greater than 0 but found 
$samplePointsPerPartitionHint")
 
   private var ordering = implicitly[Ordering[K]]
 
@@ -122,7 +132,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   Array.empty
 } else {
   // This is the sample size we need to have roughly balanced output 
partitions, capped at 1M.
-  val sampleSize = math.min(20.0 * partitions, 1e6)
+  // Cast to double to avoid overflowing ints or longs
+  val sampleSize = math.min(samplePointsPerPartitionHint.toDouble * 
partitions, 1e6)
   // Assume the input partitions are roughly balanced and over-sample a 
little bit.
   val sampleSizePerPartition = math.ceil(3.0 * sampleSize / 
rdd.partitions.length).toInt
   val (numItems, sketched) = RangePartitioner.sketch(rdd.map(_._1), 
sampleSizePerPartition)

http://git-wip-us.apache.org/repos/asf/spark/blob/323806e6/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 358cf62..1a73d16 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -907,6 +907,14 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION =
+buildConf("spark.sql.execution.rangeExchange.sampleSizePerPartition")
+  .internal()
+  .doc("Number of points to sample per partition in order to determine the 
range boundaries" +
+  &

spark git commit: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 7880909c4 -> a2db5c576


[MINOR][TYPO] Fix typos: runnning and Excecutors

## What changes were proposed in this pull request?

Fix typos

## How was this patch tested?

Existing tests

Author: Andrew Ash 

Closes #18996 from ash211/patch-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2db5c57
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2db5c57
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2db5c57

Branch: refs/heads/master
Commit: a2db5c5761b0c72babe48b79859d3b208ee8e9f6
Parents: 7880909
Author: Andrew Ash 
Authored: Fri Aug 18 13:43:42 2017 -0700
Committer: Reynold Xin 
Committed: Fri Aug 18 13:43:42 2017 -0700

--
 .../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a2db5c57/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
--
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index f73e7dc..7052fb3 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -551,8 +551,8 @@ private[yarn] class YarnAllocator(
   updateInternalState()
 }
   } else {
-logInfo(("Skip launching executorRunnable as runnning Excecutors 
count: %d " +
-  "reached target Executors count: %d.").format(
+logInfo(("Skip launching executorRunnable as running executors count: 
%d " +
+  "reached target executors count: %d.").format(
   numExecutorsRunning.get, targetNumExecutors))
   }
 }





spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

2017-08-10 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 3ca55eaaf -> c90949698


[SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

## What changes were proposed in this pull request?
This patch removes the unused SessionCatalog.getTableMetadataOption and 
ExternalCatalog.getTableOption.

## How was this patch tested?
Removed the test case.

Author: Reynold Xin <r...@databricks.com>

Closes #18912 from rxin/remove-getTableOption.

(cherry picked from commit 584c7f14370cdfafdc6cd554b2760b7ce7709368)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9094969
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9094969
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9094969

Branch: refs/heads/branch-2.2
Commit: c909496983314b48dd4d8587e586b553b04ff0ce
Parents: 3ca55ea
Author: Reynold Xin <r...@databricks.com>
Authored: Thu Aug 10 18:56:25 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Thu Aug 10 18:56:43 2017 -0700

--
 .../sql/catalyst/catalog/ExternalCatalog.scala |  2 --
 .../sql/catalyst/catalog/InMemoryCatalog.scala |  4 
 .../sql/catalyst/catalog/SessionCatalog.scala  | 17 +++--
 .../sql/catalyst/catalog/SessionCatalogSuite.scala | 11 ---
 .../spark/sql/hive/HiveExternalCatalog.scala   |  4 
 5 files changed, 3 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index 974ef90..18644b0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -162,8 +162,6 @@ abstract class ExternalCatalog
 
   def getTable(db: String, table: String): CatalogTable
 
-  def getTableOption(db: String, table: String): Option[CatalogTable]
-
   def tableExists(db: String, table: String): Boolean
 
   def listTables(db: String): Seq[String]

http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
index 864ee48..bf8542c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
@@ -315,10 +315,6 @@ class InMemoryCatalog(
 catalog(db).tables(table).table
   }
 
-  override def getTableOption(db: String, table: String): Option[CatalogTable] 
= synchronized {
-if (!tableExists(db, table)) None else 
Option(catalog(db).tables(table).table)
-  }
-
   override def tableExists(db: String, table: String): Boolean = synchronized {
 requireDbExists(db)
 catalog(db).tables.contains(table)

http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 57006bf..8d9fb4c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -388,9 +388,10 @@ class SessionCatalog(
 
   /**
* Retrieve the metadata of an existing permanent table/view. If no database 
is specified,
-   * assume the table/view is in the current database. If the specified 
table/view is not found
-   * in the database then a [[NoSuchTableException]] is thrown.
+   * assume the table/view is in the current database.
*/
+  @throws[NoSuchDatabaseException]
+  @throws[NoSuchTableException]
   def getTableMetadata(name: TableIdentifier): CatalogTable = {
 val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(name.table)
@@ -400,18 +401,6 @@ class SessionCatalog(
   }
 
   /**
-   * Retrieve the metadata of an existing metastore 

spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

2017-08-10 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master ca6955858 -> 584c7f143


[SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

## What changes were proposed in this pull request?
This patch removes the unused SessionCatalog.getTableMetadataOption and 
ExternalCatalog.getTableOption.
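
For callers that relied on the Option-returning variant, the methods that remain
(visible in the diff below) support an equivalent pattern. A sketch against the
internal catalog API, not part of this patch:

```
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, SessionCatalog}

object GetTableOptionSketch {
  // Probe with tableExists first; note the two calls are not atomic, so the
  // table can still disappear in between.
  def getTableMetadataOption(
      catalog: SessionCatalog, name: TableIdentifier): Option[CatalogTable] = {
    if (catalog.tableExists(name)) Some(catalog.getTableMetadata(name)) else None
  }

  // Alternative that avoids the extra existence check, at the cost of relying
  // on the NoSuchTableException that getTableMetadata now declares.
  def getTableMetadataOptionViaCatch(
      catalog: SessionCatalog, name: TableIdentifier): Option[CatalogTable] = {
    try Some(catalog.getTableMetadata(name)) catch {
      case _: NoSuchTableException => None
    }
  }
}
```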

## How was this patch tested?
Removed the test case.

Author: Reynold Xin <r...@databricks.com>

Closes #18912 from rxin/remove-getTableOption.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/584c7f14
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/584c7f14
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/584c7f14

Branch: refs/heads/master
Commit: 584c7f14370cdfafdc6cd554b2760b7ce7709368
Parents: ca69558
Author: Reynold Xin <r...@databricks.com>
Authored: Thu Aug 10 18:56:25 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Thu Aug 10 18:56:25 2017 -0700

--
 .../sql/catalyst/catalog/ExternalCatalog.scala |  2 --
 .../sql/catalyst/catalog/InMemoryCatalog.scala |  4 
 .../sql/catalyst/catalog/SessionCatalog.scala  | 17 +++--
 .../sql/catalyst/catalog/SessionCatalogSuite.scala | 11 ---
 .../spark/sql/hive/HiveExternalCatalog.scala   |  4 
 5 files changed, 3 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index 68644f4..d4c58db 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -167,8 +167,6 @@ abstract class ExternalCatalog
 
   def getTable(db: String, table: String): CatalogTable
 
-  def getTableOption(db: String, table: String): Option[CatalogTable]
-
   def tableExists(db: String, table: String): Boolean
 
   def listTables(db: String): Seq[String]

http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
index 37e9eea..98370c1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
@@ -326,10 +326,6 @@ class InMemoryCatalog(
 catalog(db).tables(table).table
   }
 
-  override def getTableOption(db: String, table: String): Option[CatalogTable] 
= synchronized {
-if (!tableExists(db, table)) None else 
Option(catalog(db).tables(table).table)
-  }
-
   override def tableExists(db: String, table: String): Boolean = synchronized {
 requireDbExists(db)
 catalog(db).tables.contains(table)

http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index b44d2ee..e3237a8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -387,9 +387,10 @@ class SessionCatalog(
 
   /**
* Retrieve the metadata of an existing permanent table/view. If no database 
is specified,
-   * assume the table/view is in the current database. If the specified 
table/view is not found
-   * in the database then a [[NoSuchTableException]] is thrown.
+   * assume the table/view is in the current database.
*/
+  @throws[NoSuchDatabaseException]
+  @throws[NoSuchTableException]
   def getTableMetadata(name: TableIdentifier): CatalogTable = {
 val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(name.table)
@@ -399,18 +400,6 @@ class SessionCatalog(
   }
 
   /**
-   * Retrieve the metadata of an existing metastore table.
-   * If no database is specified, assume the table is in the current database.
-   * If the specified table is not found in the

spark git commit: [SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs

2017-08-10 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 84454d7d3 -> 95ad960ca


[SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter 
jobs

## What changes were proposed in this pull request?

This patch introduces an internal interface for tracking metrics and/or 
statistics on data on the fly, as it is being written to disk during a 
`FileFormatWriter` job and partially reimplements SPARK-20703 in terms of it.

The interface basically consists of 3 traits:
- `WriteTaskStats`: just a tag for classes that represent statistics collected 
during a `WriteTask`
  The only constraint it adds is that the class should be `Serializable`, as 
instances of it will be collected on the driver from all executors at the end 
of the `WriteJob`.
- `WriteTaskStatsTracker`: a trait for classes that can actually compute 
statistics based on tuples that are processed by a given `WriteTask` and 
eventually produce a `WriteTaskStats` instance.
- `WriteJobStatsTracker`: a trait for classes that act as containers of 
`Serializable` state that's necessary for instantiating `WriteTaskStatsTracker` 
on executors and finally process the resulting collection of `WriteTaskStats`, 
once they're gathered back on the driver.

Potential future use of this interface is e.g. CBO stats maintenance during 
`INSERT INTO table ... ` operations.
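
To make the division of responsibilities concrete, a deliberately simplified
sketch of the three roles is shown below. The trait and method names are
invented for illustration and do not match the actual signatures in
`WriteStatsTracker.scala` (the real trackers observe `InternalRow`s and file
paths supplied by `FileFormatWriter`):

```
// Hypothetical, simplified mirror of the three roles; not the real Spark traits.
trait SimpleWriteTaskStats extends Serializable // tag: results shipped to the driver

trait SimpleWriteTaskStatsTracker { // lives on an executor, one instance per write task
  def newRow(): Unit
  def getFinalStats(): SimpleWriteTaskStats
}

trait SimpleWriteJobStatsTracker extends Serializable { // created on the driver
  def newTaskInstance(): SimpleWriteTaskStatsTracker
  def processStats(stats: Seq[SimpleWriteTaskStats]): Unit
}

// A trivial row-counting implementation.
case class RowCountStats(rows: Long) extends SimpleWriteTaskStats

class RowCountTaskTracker extends SimpleWriteTaskStatsTracker {
  private var rows = 0L
  override def newRow(): Unit = rows += 1
  override def getFinalStats(): SimpleWriteTaskStats = RowCountStats(rows)
}

class RowCountJobTracker extends SimpleWriteJobStatsTracker {
  override def newTaskInstance(): SimpleWriteTaskStatsTracker = new RowCountTaskTracker

  // Runs on the driver once all task-level stats have been collected.
  override def processStats(stats: Seq[SimpleWriteTaskStats]): Unit = {
    val total = stats.collect { case RowCountStats(n) => n }.sum
    println(s"total rows written: $total")
  }
}
```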

## How was this patch tested?
Existing tests for SPARK-20703 exercise the new code: `hive/SQLMetricsSuite`, 
`sql/JavaDataFrameReaderWriterSuite`, etc.

Author: Adrian Ionescu 

Closes #18884 from adrian-ionescu/write-stats-tracker-api.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95ad960c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/95ad960c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/95ad960c

Branch: refs/heads/master
Commit: 95ad960caf009d843ec700ee41cbccc2fa3a68a5
Parents: 84454d7
Author: Adrian Ionescu 
Authored: Thu Aug 10 12:37:10 2017 -0700
Committer: Reynold Xin 
Committed: Thu Aug 10 12:37:10 2017 -0700

--
 .../execution/command/DataWritingCommand.scala  |  34 +--
 .../datasources/BasicWriteStatsTracker.scala| 133 ++
 .../datasources/FileFormatWriter.scala  | 245 ++-
 .../InsertIntoHadoopFsRelationCommand.scala |  43 ++--
 .../datasources/WriteStatsTracker.scala | 121 +
 .../execution/streaming/FileStreamSink.scala|   2 +-
 .../hive/execution/InsertIntoHiveTable.scala|   4 +-
 7 files changed, 420 insertions(+), 162 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/95ad960c/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
index 700f7f8..4e1c5e4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
@@ -17,10 +17,13 @@
 
 package org.apache.spark.sql.execution.command
 
+import org.apache.hadoop.conf.Configuration
+
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.execution.SQLExecution
-import org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
+import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.util.SerializableConfiguration
+
 
 /**
  * A special `RunnableCommand` which writes data out and updates metrics.
@@ -37,29 +40,8 @@ trait DataWritingCommand extends RunnableCommand {
 )
   }
 
-  /**
-   * Callback function that update metrics collected from the writing 
operation.
-   */
-  protected def updateWritingMetrics(writeSummaries: 
Seq[ExecutedWriteSummary]): Unit = {
-val sparkContext = SparkContext.getActive.get
-var numPartitions = 0
-var numFiles = 0
-var totalNumBytes: Long = 0L
-var totalNumOutput: Long = 0L
-
-writeSummaries.foreach { summary =>
-  numPartitions += summary.updatedPartitions.size
-  numFiles += summary.numOutputFile
-  totalNumBytes += summary.numOutputBytes
-  totalNumOutput += summary.numOutputRows
-}
-
-metrics("numFiles").add(numFiles)
-metrics("numOutputBytes").add(totalNumBytes)
-metrics("numOutputRows").add(totalNumOutput)
-metrics("numParts").add(numPartitions)
-
-val executionId = 
sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
-SQLMetrics.postDriverMetricUpdates(sparkContext, 

spark git commit: [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator

2017-08-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 0fb73253f -> c06f3f5ac


[SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator

## What changes were proposed in this pull request?

This modification increases the timeout for `serveIterator` (which is not 
dynamically configurable). This fixes timeout issues in pyspark when using 
`collect` and similar functions, in cases where Python may take more than a 
couple of seconds to connect.

See https://issues.apache.org/jira/browse/SPARK-21551

## How was this patch tested?

Ran the tests.

cc rxin

Author: peay <p...@protonmail.com>

Closes #18752 from peay/spark-21551.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c06f3f5a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c06f3f5a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c06f3f5a

Branch: refs/heads/master
Commit: c06f3f5ac500b02d38ca7ec5fcb33085e07f2f75
Parents: 0fb7325
Author: peay <p...@protonmail.com>
Authored: Wed Aug 9 14:03:18 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed Aug 9 14:03:18 2017 -0700

--
 .../src/main/scala/org/apache/spark/api/python/PythonRDD.scala | 6 +++---
 python/pyspark/rdd.py  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c06f3f5a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
index 6a81752..3377101 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
@@ -683,7 +683,7 @@ private[spark] object PythonRDD extends Logging {
* Create a socket server and a background thread to serve the data in 
`items`,
*
* The socket server can only accept one connection, or close if no 
connection
-   * in 3 seconds.
+   * in 15 seconds.
*
* Once a connection comes in, it tries to serialize all the data in `items`
* and send them into this connection.
@@ -692,8 +692,8 @@ private[spark] object PythonRDD extends Logging {
*/
   def serveIterator[T](items: Iterator[T], threadName: String): Int = {
 val serverSocket = new ServerSocket(0, 1, 
InetAddress.getByName("localhost"))
-// Close the socket if no connection in 3 seconds
-serverSocket.setSoTimeout(3000)
+// Close the socket if no connection in 15 seconds
+serverSocket.setSoTimeout(15000)
 
 new Thread(threadName) {
   setDaemon(true)

http://git-wip-us.apache.org/repos/asf/spark/blob/c06f3f5a/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 3325b65..ea993c5 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -127,7 +127,7 @@ def _load_from_socket(port, serializer):
 af, socktype, proto, canonname, sa = res
 sock = socket.socket(af, socktype, proto)
 try:
-sock.settimeout(3)
+sock.settimeout(15)
 sock.connect(sa)
 except socket.error:
 sock.close()





spark git commit: [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions

2017-07-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master cf29828d7 -> 60472dbfd


[SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in 
functions

## What changes were proposed in this pull request?

This generates a documentation for Spark SQL built-in functions.

One drawback is that this requires a proper build to generate the built-in 
function list.
Once it is built, it only takes a few seconds via `sql/create-docs.sh`.

Please see https://spark-test.github.io/sparksqldoc/ that I hosted to show the 
output documentation.

There is a bit more work to be done to make the documentation pretty, for 
example separating `Arguments:` and `Examples:`, but I guess this should be done 
within `ExpressionDescription` and `ExpressionInfo` rather than by manually 
parsing it. I will fix these in a follow-up.

This requires `pip install mkdocs` to generate HTML pages from the markdown files.

## How was this patch tested?

Manually tested:

```
cd docs
jekyll build
```
,

```
cd docs
jekyll serve
```

and

```
cd sql
create-docs.sh
```

Author: hyukjinkwon 

Closes #18702 from HyukjinKwon/SPARK-21485.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60472dbf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60472dbf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60472dbf

Branch: refs/heads/master
Commit: 60472dbfd97acfd6c4420a13f9b32bc9d84219f3
Parents: cf29828
Author: hyukjinkwon 
Authored: Wed Jul 26 09:38:51 2017 -0700
Committer: Reynold Xin 
Committed: Wed Jul 26 09:38:51 2017 -0700

--
 .gitignore  |  2 +
 docs/README.md  |  6 +-
 docs/_layouts/global.html   |  1 +
 docs/_plugins/copy_api_dirs.rb  | 27 ++
 docs/api.md |  1 +
 docs/index.md   |  1 +
 sql/README.md   |  2 +
 .../spark/sql/api/python/PythonSQLUtils.scala   |  7 ++
 sql/create-docs.sh  | 49 +++
 sql/gen-sql-markdown.py | 91 
 sql/mkdocs.yml  | 19 
 11 files changed, 203 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/.gitignore
--
diff --git a/.gitignore b/.gitignore
index cf9780d..903297d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -47,6 +47,8 @@ dev/pr-deps/
 dist/
 docs/_site
 docs/api
+sql/docs
+sql/site
 lib_managed/
 lint-r-report.log
 log/

http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/README.md
--
diff --git a/docs/README.md b/docs/README.md
index 90e10a1..0090dd0 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -68,6 +68,6 @@ jekyll plugin to run `build/sbt unidoc` before building the 
site so if you haven
 may take some time as it generates all of the scaladoc.  The jekyll plugin 
also generates the
 PySpark docs using [Sphinx](http://sphinx-doc.org/).
 
-NOTE: To skip the step of building and copying over the Scala, Python, R API 
docs, run `SKIP_API=1
-jekyll`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, and `SKIP_RDOC=1` 
can be used to skip a single
-step of the corresponding language.
+NOTE: To skip the step of building and copying over the Scala, Python, R and 
SQL API docs, run `SKIP_API=1
+jekyll`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and 
`SKIP_SQLDOC=1` can be used
+to skip a single step of the corresponding language.

http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/_layouts/global.html
--
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 570483c..67b05ec 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -86,6 +86,7 @@
 Java
 Python
 R
+SQL, Built-in 
Functions
 
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/_plugins/copy_api_dirs.rb
--
diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb
index 95e3ba3..00366f8 100644
--- a/docs/_plugins/copy_api_dirs.rb
+++ b/docs/_plugins/copy_api_dirs.rb
@@ -150,4 +150,31 @@ if not (ENV['SKIP_API'] == '1')
 cp("../R/pkg/DESCRIPTION", "api")
   end
 
+  if not (ENV['SKIP_SQLDOC'] == '1')
+# Build SQL API docs
+
+puts 

spark git commit: [SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.

2017-07-12 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 2cbfc975b -> 24367f23f


[SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.

[https://issues.apache.org/jira/browse/SPARK-21382](https://issues.apache.org/jira/browse/SPARK-21382)
There should be "Note that support for Scala 2.10 is deprecated as of Spark 
2.1.0 and may be removed in Spark 2.3.0", right?

Author: liuzhaokun 

Closes #18606 from liu-zhaokun/new07120923.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24367f23
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24367f23
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24367f23

Branch: refs/heads/master
Commit: 24367f23f77349a864da340573e39ab2168c5403
Parents: 2cbfc97
Author: liuzhaokun 
Authored: Tue Jul 11 23:02:20 2017 -0700
Committer: Reynold Xin 
Committed: Tue Jul 11 23:02:20 2017 -0700

--
 docs/building-spark.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/24367f23/docs/building-spark.md
--
diff --git a/docs/building-spark.md b/docs/building-spark.md
index 777635a..815843c 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -97,7 +97,7 @@ To produce a Spark package compiled with Scala 2.10, use the 
`-Dscala-2.10` prop
 ./dev/change-scala-version.sh 2.10
 ./build/mvn -Pyarn -Dscala-2.10 -DskipTests clean package
 
-Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be 
removed in Spark 2.2.0.
+Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be 
removed in Spark 2.3.0.
 
 ## Building submodules individually
 





spark git commit: [SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark

2017-07-10 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master d03aebbe6 -> c3713fde8


[SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at 
pyspark

## What changes were proposed in this pull request?
In the example for repartitionAndSortWithinPartitions in rdd.py, the third 
argument should be True or False.
This patch fixes the example code.

## How was this patch tested?
* Renamed test_repartitionAndSortWithinPartitions to 
test_repartitionAndSortWithinPartitions_asc to make the boolean argument explicit.
* Added test_repartitionAndSortWithinPartitions_desc to test the False value of 
the third argument.


Author: chie8842 

Closes #18586 from chie8842/SPARK-21358.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3713fde
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3713fde
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3713fde

Branch: refs/heads/master
Commit: c3713fde86204bf3f027483914ff9e60e7aad261
Parents: d03aebb
Author: chie8842 
Authored: Mon Jul 10 18:56:54 2017 -0700
Committer: Reynold Xin 
Committed: Mon Jul 10 18:56:54 2017 -0700

--
 python/pyspark/rdd.py   |  2 +-
 python/pyspark/tests.py | 12 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 7dfa17f..3325b65 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -608,7 +608,7 @@ class RDD(object):
 sort records by their keys.
 
 >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 
3)])
->>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 
2)
+>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 
True)
 >>> rdd2.glom().collect()
 [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
 """

http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/tests.py
--
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index bb13de5..73ab442 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1019,14 +1019,22 @@ class RDDTests(ReusedPySparkTestCase):
 self.assertEqual((["ab", "ef"], [5]), rdd.histogram(1))
 self.assertRaises(TypeError, lambda: rdd.histogram(2))
 
-def test_repartitionAndSortWithinPartitions(self):
+def test_repartitionAndSortWithinPartitions_asc(self):
 rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 
3)], 2)
 
-repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: 
key % 2)
+repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: 
key % 2, True)
 partitions = repartitioned.glom().collect()
 self.assertEqual(partitions[0], [(0, 5), (0, 8), (2, 6)])
 self.assertEqual(partitions[1], [(1, 3), (3, 8), (3, 8)])
 
+def test_repartitionAndSortWithinPartitions_desc(self):
+rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 
3)], 2)
+
+repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: 
key % 2, False)
+partitions = repartitioned.glom().collect()
+self.assertEqual(partitions[0], [(2, 6), (0, 5), (0, 8)])
+self.assertEqual(partitions[1], [(3, 8), (3, 8), (1, 3)])
+
 def test_repartition_no_skewed(self):
 num_partitions = 20
 a = self.sc.parallelize(range(int(1000)), 2)





spark git commit: [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

2017-07-06 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 48e44b24a -> bf66335ac


[SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

## What changes were proposed in this pull request?

Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to 
ValueInterval.
The current name collides with the logical operator "Range"; renaming it to 
ValueInterval is more accurate.
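
For intuition only (a simplified Python sketch with hypothetical names, not 
Spark's Scala code), the renamed class represents the [min, max] interval of a 
column's statistics and answers containment questions during filter estimation:

```python
# Hypothetical, simplified illustration of the ValueInterval idea.
class SimpleValueInterval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def contains(self, value):
        # A literal outside the column's [min, max] cannot match any row.
        return self.lo <= value <= self.hi

col_stats = SimpleValueInterval(lo=10, hi=100)  # column min/max from statistics
print(col_stats.contains(42))    # True  -> the predicate may select rows
print(col_stats.contains(500))   # False -> selectivity can be estimated as zero
```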

## How was this patch tested?

Unit tests.


Author: Wang Gengliang 

Closes #18549 from gengliangwang/ValueInterval.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf66335a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf66335a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf66335a

Branch: refs/heads/master
Commit: bf66335acab3c0c188f6c378eb8aa6948a259cb2
Parents: 48e44b2
Author: Wang Gengliang 
Authored: Thu Jul 6 13:58:27 2017 -0700
Committer: Reynold Xin 
Committed: Thu Jul 6 13:58:27 2017 -0700

--
 .../statsEstimation/FilterEstimation.scala  | 36 
 .../statsEstimation/JoinEstimation.scala| 14 +--
 .../plans/logical/statsEstimation/Range.scala   | 88 ---
 .../logical/statsEstimation/ValueInterval.scala | 91 
 4 files changed, 117 insertions(+), 112 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bf66335a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
index 5a3bee7..e13db85 100755
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
@@ -316,8 +316,8 @@ case class FilterEstimation(plan: Filter) extends Logging {
 // decide if the value is in [min, max] of the column.
 // We currently don't store min/max for binary/string type.
 // Hence, we assume it is in boundary for binary/string type.
-val statsRange = Range(colStat.min, colStat.max, attr.dataType)
-if (statsRange.contains(literal)) {
+val statsInterval = ValueInterval(colStat.min, colStat.max, attr.dataType)
+if (statsInterval.contains(literal)) {
   if (update) {
 // We update ColumnStat structure after apply this equality predicate:
 // Set distinctCount to 1, nullCount to 0, and min/max values (if 
exist) to the literal
@@ -388,9 +388,10 @@ case class FilterEstimation(plan: Filter) extends Logging {
 // use [min, max] to filter the original hSet
 dataType match {
   case _: NumericType | BooleanType | DateType | TimestampType =>
-val statsRange = Range(colStat.min, colStat.max, 
dataType).asInstanceOf[NumericRange]
+val statsInterval =
+  ValueInterval(colStat.min, colStat.max, 
dataType).asInstanceOf[NumericValueInterval]
 val validQuerySet = hSet.filter { v =>
-  v != null && statsRange.contains(Literal(v, dataType))
+  v != null && statsInterval.contains(Literal(v, dataType))
 }
 
 if (validQuerySet.isEmpty) {
@@ -440,12 +441,13 @@ case class FilterEstimation(plan: Filter) extends Logging 
{
   update: Boolean): Option[BigDecimal] = {
 
 val colStat = colStatsMap(attr)
-val statsRange = Range(colStat.min, colStat.max, 
attr.dataType).asInstanceOf[NumericRange]
-val max = statsRange.max.toBigDecimal
-val min = statsRange.min.toBigDecimal
+val statsInterval =
+  ValueInterval(colStat.min, colStat.max, 
attr.dataType).asInstanceOf[NumericValueInterval]
+val max = statsInterval.max.toBigDecimal
+val min = statsInterval.min.toBigDecimal
 val ndv = BigDecimal(colStat.distinctCount)
 
-// determine the overlapping degree between predicate range and column's 
range
+// determine the overlapping degree between predicate interval and 
column's interval
 val numericLiteral = if (literal.dataType == BooleanType) {
   if (literal.value.asInstanceOf[Boolean]) BigDecimal(1) else BigDecimal(0)
 } else {
@@ -566,18 +568,18 @@ case class FilterEstimation(plan: Filter) extends Logging 
{
 }
 
 val colStatLeft = colStatsMap(attrLeft)
-val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, 
attrLeft.dataType)
-  .asInstanceOf[NumericRange]
-val maxLeft = 

spark git commit: [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan

2017-06-20 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e862dc904 -> b6b108826


[SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan

## What changes were proposed in this pull request?
QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan, 
since the constraint framework is only used for query plan rewriting and not 
for physical planning.

## How was this patch tested?
Should be covered by existing tests, since it is a simple refactoring.

Author: Reynold Xin <r...@databricks.com>

Closes #18310 from rxin/SPARK-21103.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6b10882
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6b10882
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6b10882

Branch: refs/heads/master
Commit: b6b108826a5dd5c889a70180365f9320452557fc
Parents: e862dc9
Author: Reynold Xin <r...@databricks.com>
Authored: Tue Jun 20 11:34:22 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue Jun 20 11:34:22 2017 -0700

--
 .../spark/sql/catalyst/plans/QueryPlan.scala|   5 +-
 .../catalyst/plans/QueryPlanConstraints.scala   | 195 --
 .../catalyst/plans/logical/LogicalPlan.scala|   2 +-
 .../plans/logical/QueryPlanConstraints.scala| 196 +++
 4 files changed, 198 insertions(+), 200 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b6b10882/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index 9130b14..1f6d05b 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -22,10 +22,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.{DataType, StructType}
 
-abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
-  extends TreeNode[PlanType]
-  with QueryPlanConstraints[PlanType] {
-
+abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends 
TreeNode[PlanType] {
   self: PlanType =>
 
   def conf: SQLConf = SQLConf.get

http://git-wip-us.apache.org/repos/asf/spark/blob/b6b10882/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala
deleted file mode 100644
index b08a009..000
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala
+++ /dev/null
@@ -1,195 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.plans
-
-import org.apache.spark.sql.catalyst.expressions._
-
-
-trait QueryPlanConstraints[PlanType <: QueryPlan[PlanType]] { self: 
QueryPlan[PlanType] =>
-
-  /**
-   * An [[ExpressionSet]] that contains invariants about the rows output by 
this operator. For
-   * example, if this set contains the expression `a = 2` then that expression 
is guaranteed to
-   * evaluate to `true` for all rows produced.
-   */
-  lazy val constraints: ExpressionSet = {
-if (conf.constraintPropagationEnabled) {
-  ExpressionSet(
-validConstraints
-  .union(inferAdditionalConstraints(validConstraints))
-  .union(constructIsNotNullConstraints(validConstraints))
-  .filter { c =>
-c.references.nonEmpty && c.references.subsetOf(outputSet) && 
c.deterministic
-  }
-  )
-} else {
-  ExpressionSet(Set.e

spark git commit: [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions

2017-06-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 292467440 -> fffeb6d7c


[SPARK-21092][SQL] Wire SQLConf in logical plan and expressions

## What changes were proposed in this pull request?
It is really painful not to have configs available in logical plans and 
expressions. We had to add all sorts of hacks (e.g. passing SQLConf explicitly 
into functions). This patch exposes SQLConf in the logical plan, using a 
thread-local variable and a getter closure that is set once there is an active 
SparkSession.

The implementation is a bit of a hack, since we didn't anticipate this need in 
the beginning (config was only exposed in physical plan). The implementation is 
described in `SQLConf.get`.
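
As a rough sketch of the pattern described above (a thread-local value plus a 
fallback getter closure) — illustration only, with hypothetical names, not 
Spark's actual implementation:

```python
import threading

# Hypothetical sketch of the "thread-local with fallback getter" pattern.
_local = threading.local()
_fallback_getter = lambda: {}  # replaced once an active session exists

def set_fallback_getter(getter):
    # Installed once there is an active session, so plan/expression code can
    # reach the session's conf without it being passed around explicitly.
    global _fallback_getter
    _fallback_getter = getter

def get_conf():
    # Prefer a conf pinned to the current thread; otherwise use the fallback.
    return getattr(_local, "conf", None) or _fallback_getter()

set_fallback_getter(lambda: {"spark.sql.cbo.enabled": "false"})
print(get_conf()["spark.sql.cbo.enabled"])  # false
```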

In terms of future work, we should follow up to clean up CBO (remove the need 
for passing in config).

## How was this patch tested?
Updated relevant tests for constraint propagation.

Author: Reynold Xin <r...@databricks.com>

Closes #18299 from rxin/SPARK-21092.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fffeb6d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fffeb6d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fffeb6d7

Branch: refs/heads/master
Commit: fffeb6d7c37ee673a32584f3b2fd3afe86af793a
Parents: 2924674
Author: Reynold Xin <r...@databricks.com>
Authored: Wed Jun 14 22:11:41 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed Jun 14 22:11:41 2017 -0700

--
 .../sql/catalyst/optimizer/Optimizer.scala  | 25 ++--
 .../spark/sql/catalyst/optimizer/joins.scala|  5 +--
 .../spark/sql/catalyst/plans/QueryPlan.scala|  3 ++
 .../catalyst/plans/QueryPlanConstraints.scala   | 33 +--
 .../org/apache/spark/sql/internal/SQLConf.scala | 42 
 .../BinaryComparisonSimplificationSuite.scala   |  2 +-
 .../optimizer/BooleanSimplificationSuite.scala  |  2 +-
 .../InferFiltersFromConstraintsSuite.scala  | 24 +--
 .../optimizer/OuterJoinEliminationSuite.scala   | 37 -
 .../optimizer/PropagateEmptyRelationSuite.scala |  4 +-
 .../catalyst/optimizer/PruneFiltersSuite.scala  | 36 +++--
 .../catalyst/optimizer/SetOperationSuite.scala  |  2 +-
 .../plans/ConstraintPropagationSuite.scala  | 29 +-
 .../org/apache/spark/sql/SparkSession.scala |  5 +++
 14 files changed, 141 insertions(+), 108 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fffeb6d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index d16689a..3ab70fb 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -77,12 +77,12 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, 
conf: SQLConf)
   // Operator push down
   PushProjectionThroughUnion,
   ReorderJoin(conf),
-  EliminateOuterJoin(conf),
+  EliminateOuterJoin,
   PushPredicateThroughJoin,
   PushDownPredicate,
   LimitPushDown(conf),
   ColumnPruning,
-  InferFiltersFromConstraints(conf),
+  InferFiltersFromConstraints,
   // Operator combine
   CollapseRepartition,
   CollapseProject,
@@ -102,7 +102,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, 
conf: SQLConf)
   SimplifyConditionals,
   RemoveDispensableExpressions,
   SimplifyBinaryComparison,
-  PruneFilters(conf),
+  PruneFilters,
   EliminateSorts,
   SimplifyCasts,
   SimplifyCaseConversionExpressions,
@@ -619,14 +619,15 @@ object CollapseWindow extends Rule[LogicalPlan] {
  * Note: While this optimization is applicable to all types of join, it 
primarily benefits Inner and
  * LeftSemi joins.
  */
-case class InferFiltersFromConstraints(conf: SQLConf)
-extends Rule[LogicalPlan] with PredicateHelper {
-  def apply(plan: LogicalPlan): LogicalPlan = if 
(conf.constraintPropagationEnabled) {
-inferFilters(plan)
-  } else {
-plan
-  }
+object InferFiltersFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
 
+  def apply(plan: LogicalPlan): LogicalPlan = {
+if (SQLConf.get.constraintPropagationEnabled) {
+  inferFilters(plan)
+} else {
+  plan
+}
+  }
 
   private def inferFilters(plan: LogicalPlan): LogicalPlan = plan transform {
 case filter @ Filter(condition, child) =>
@@ -717,7 +718,7 @@ object EliminateSorts extends Rule[LogicalPlan] {
  * 2) by substituting a dummy empty relation whe

spark git commit: [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints

2017-06-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 77a2fc5b5 -> e254e868f


[SPARK-21091][SQL] Move constraint code into QueryPlanConstraints

## What changes were proposed in this pull request?
This patch moves constraint-related code into a separate trait, 
QueryPlanConstraints, so we don't litter QueryPlan with a lot of 
constraint-specific private functions.
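
To make the constraint idea concrete: from a null-intolerant predicate such as 
`a > 5`, the framework can also derive `isNotNull(a)`, since the comparison can 
never be true for a null input. A tiny Python illustration of that inference 
(hypothetical names, not the moved Scala code):

```python
# Hypothetical sketch: derive IsNotNull constraints from null-intolerant predicates.
NULL_INTOLERANT_OPS = {">", ">=", "<", "<=", "="}

def infer_is_not_null(constraints):
    inferred = set()
    for op, column, _literal in constraints:   # e.g. (">", "a", 5)
        if op in NULL_INTOLERANT_OPS:
            inferred.add(("isNotNull", column))
    return inferred

print(infer_is_not_null({(">", "a", 5)}))  # {('isNotNull', 'a')}
```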

## How was this patch tested?
This is a simple move refactoring and should be covered by existing tests.

Author: Reynold Xin <r...@databricks.com>

Closes #18298 from rxin/SPARK-21091.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e254e868
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e254e868
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e254e868

Branch: refs/heads/master
Commit: e254e868f1e640b59d8d3e0e01a5e0c488dd6e70
Parents: 77a2fc5
Author: Reynold Xin <r...@databricks.com>
Authored: Wed Jun 14 14:28:21 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed Jun 14 14:28:21 2017 -0700

--
 .../spark/sql/catalyst/plans/QueryPlan.scala| 187 +
 .../catalyst/plans/QueryPlanConstraints.scala   | 206 +++
 2 files changed, 210 insertions(+), 183 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e254e868/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index 5ba043e..8bc462e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -21,194 +21,15 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.types.{DataType, StructType}
 
-abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends 
TreeNode[PlanType] {
+abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
+  extends TreeNode[PlanType]
+  with QueryPlanConstraints[PlanType] {
+
   self: PlanType =>
 
   def output: Seq[Attribute]
 
   /**
-   * Extracts the relevant constraints from a given set of constraints based 
on the attributes that
-   * appear in the [[outputSet]].
-   */
-  protected def getRelevantConstraints(constraints: Set[Expression]): 
Set[Expression] = {
-constraints
-  .union(inferAdditionalConstraints(constraints))
-  .union(constructIsNotNullConstraints(constraints))
-  .filter(constraint =>
-constraint.references.nonEmpty && 
constraint.references.subsetOf(outputSet) &&
-  constraint.deterministic)
-  }
-
-  /**
-   * Infers a set of `isNotNull` constraints from null intolerant expressions 
as well as
-   * non-nullable attributes. For e.g., if an expression is of the form (`a > 
5`), this
-   * returns a constraint of the form `isNotNull(a)`
-   */
-  private def constructIsNotNullConstraints(constraints: Set[Expression]): 
Set[Expression] = {
-// First, we propagate constraints from the null intolerant expressions.
-var isNotNullConstraints: Set[Expression] = 
constraints.flatMap(inferIsNotNullConstraints)
-
-// Second, we infer additional constraints from non-nullable attributes 
that are part of the
-// operator's output
-val nonNullableAttributes = output.filterNot(_.nullable)
-isNotNullConstraints ++= nonNullableAttributes.map(IsNotNull).toSet
-
-isNotNullConstraints -- constraints
-  }
-
-  /**
-   * Infer the Attribute-specific IsNotNull constraints from the null 
intolerant child expressions
-   * of constraints.
-   */
-  private def inferIsNotNullConstraints(constraint: Expression): 
Seq[Expression] =
-constraint match {
-  // When the root is IsNotNull, we can push IsNotNull through the child 
null intolerant
-  // expressions
-  case IsNotNull(expr) => 
scanNullIntolerantAttribute(expr).map(IsNotNull(_))
-  // Constraints always return true for all the inputs. That means, null 
will never be returned.
-  // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull 
through the child
-  // null intolerant expressions.
-  case _ => scanNullIntolerantAttribute(constraint).map(IsNotNull(_))
-}
-
-  /**
-   * Recursively explores the expressions which are null intolerant and 
returns all attributes
-   * in these expressions.
-   */
-  private def scanNullIntolerantAttribute(expr: Expression): Seq[Attribute] = 
expr match {
-case a: Attribute => Seq(a)
-case _: NullIntolerant => 
expr.children.flatMap

spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 869af5bcb -> 815a0820b


[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since 
this has been a confusing point for many users.
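
A short PySpark sketch of the pitfall (illustration only, assuming an active 
SparkSession `spark`):

```python
# Minimal sketch, assuming an existing SparkSession `spark`.
df1 = spark.createDataFrame([(1, 30)], ["id", "age"])
df2 = spark.createDataFrame([(45, 2)], ["age", "id"])

# union resolves columns by POSITION, not by name: the second row is appended
# as id=45, age=2 even though df2's column names are swapped.
df1.union(df2).show()
```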

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin <r...@databricks.com>

Closes #18256 from rxin/SPARK-21042.

(cherry picked from commit b78e3849b20d0d09b7146efd7ce8f203ef67b890)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815a0820
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815a0820
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815a0820

Branch: refs/heads/branch-2.2
Commit: 815a0820b1808118ae198a44f4aa0f0f2b6511e6
Parents: 869af5b
Author: Reynold Xin <r...@databricks.com>
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Jun 9 18:29:39 2017 -0700

--
 R/pkg/R/DataFrame.R   |  1 +
 python/pyspark/sql/dataframe.py   | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index a7b1e3b..b606f1f 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2642,6 +2642,7 @@ generateAliasesForIntersectedCols <- function (x, 
intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by 
name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b1eb80e..d1b336d 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1166,18 +1166,23 @@ class DataFrame(object):
 
 @since(2.0)
 def union(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this 
and another frame.
 
 This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
 (that does deduplication of elements), use this function followed by a 
distinct.
+
+Also as standard in SQL, this function resolves columns by position 
(not by name).
 """
 return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
 @since(1.3)
 def unionAll(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this 
and another frame.
+
+This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+(that does deduplication of elements), use this function followed by a 
distinct.
+
+Also as standard in SQL, this function resolves columns by position 
(not by name).
 
 .. note:: Deprecated in 2.0, use union instead.
 """

http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f37d433..3658890 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1630,10 +1630,11 @@ class Dataset[T] private[sql](
 
   /**
* Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
-   * This is equivalent to `UNION ALL` in SQL.
*
-   * To do a SQL-style set union (that does deduplication of elements), use 
this function followed
-   * by a [[distinct]].
+   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union 
(that does
+   * deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * Also as standard in SQL, this function resolves co

spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position

2017-06-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 571635488 -> b78e3849b


[SPARK-21042][SQL] Document Dataset.union is resolution by position

## What changes were proposed in this pull request?
Document that Dataset.union resolves columns by position, not by name, since 
this has been a confusing point for many users.

## How was this patch tested?
N/A - doc only change.

Author: Reynold Xin <r...@databricks.com>

Closes #18256 from rxin/SPARK-21042.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e3849
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b78e3849
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b78e3849

Branch: refs/heads/master
Commit: b78e3849b20d0d09b7146efd7ce8f203ef67b890
Parents: 5716354
Author: Reynold Xin <r...@databricks.com>
Authored: Fri Jun 9 18:29:33 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Jun 9 18:29:33 2017 -0700

--
 R/pkg/R/DataFrame.R   |  1 +
 python/pyspark/sql/dataframe.py   | 13 +
 .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 --
 3 files changed, 18 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 166b398..3b9d42d 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, 
intersectedColNames, suffix) {
 #' Input SparkDataFrames can have different schemas (names and data types).
 #'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' Also as standard in SQL, this function resolves columns by position (not by 
name).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 99abfcc..8541403 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1175,18 +1175,23 @@ class DataFrame(object):
 
 @since(2.0)
 def union(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this 
and another frame.
 
 This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
 (that does deduplication of elements), use this function followed by a 
distinct.
+
+Also as standard in SQL, this function resolves columns by position 
(not by name).
 """
 return DataFrame(self._jdf.union(other._jdf), self.sql_ctx)
 
 @since(1.3)
 def unionAll(self, other):
-""" Return a new :class:`DataFrame` containing union of rows in this
-frame and another frame.
+""" Return a new :class:`DataFrame` containing union of rows in this 
and another frame.
+
+This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union
+(that does deduplication of elements), use this function followed by a 
distinct.
+
+Also as standard in SQL, this function resolves columns by position 
(not by name).
 
 .. note:: Deprecated in 2.0, use union instead.
 """

http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index f7637e0..d28ff78 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1734,10 +1734,11 @@ class Dataset[T] private[sql](
 
   /**
* Returns a new Dataset containing union of rows in this Dataset and 
another Dataset.
-   * This is equivalent to `UNION ALL` in SQL.
*
-   * To do a SQL-style set union (that does deduplication of elements), use 
this function followed
-   * by a [[distinct]].
+   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union 
(that does
+   * deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * Also as standard in SQL, this function resolves columns by position (not 
by name).
*
* @group typedrel
* @since 2.0.0
@@ -1747,10 +1748,11 @@ class Dataset[T

spark git commit: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 421d8ecb8 -> 3f93d076b


[SPARK-20854][TESTS] Removing duplicate test case

## What changes were proposed in this pull request?

Removed a duplicate case in "SPARK-20854: select hint syntax with expressions"

## How was this patch tested?
Existing tests.

Author: Bogdan Raducanu 

Closes #18217 from bogdanrdc/SPARK-20854-2.

(cherry picked from commit cb83ca1433c865cb0aef973df2b872a83671acfd)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3f93d076
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3f93d076
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3f93d076

Branch: refs/heads/branch-2.2
Commit: 3f93d076b8c4a932bace2ebef400abe60ad5927c
Parents: 421d8ec
Author: Bogdan Raducanu 
Authored: Tue Jun 6 22:51:10 2017 -0700
Committer: Reynold Xin 
Committed: Tue Jun 6 22:51:18 2017 -0700

--
 .../apache/spark/sql/catalyst/parser/PlanParserSuite.scala   | 8 
 1 file changed, 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3f93d076/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
index 954f6da..77ae843 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
@@ -545,14 +545,6 @@ class PlanParserSuite extends PlanTest {
 )
 
 comparePlans(
-  parsePlan("SELECT /*+ HINT1(a, array(1, 2, 3)) */ * from t"),
-  UnresolvedHint("HINT1", Seq($"a",
-UnresolvedFunction("array", Literal(1) :: Literal(2) :: Literal(3) :: 
Nil, false)),
-table("t").select(star())
-  )
-)
-
-comparePlans(
   parsePlan("SELECT /*+ HINT1(a, 5, 'a', b) */ * from t"),
   UnresolvedHint("HINT1", Seq($"a", Literal(5), Literal("a"), $"b"),
 table("t").select(star())





spark git commit: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master c92949ac2 -> cb83ca143


[SPARK-20854][TESTS] Removing duplicate test case

## What changes were proposed in this pull request?

Removed a duplicate case in "SPARK-20854: select hint syntax with expressions"

## How was this patch tested?
Existing tests.

Author: Bogdan Raducanu 

Closes #18217 from bogdanrdc/SPARK-20854-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb83ca14
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb83ca14
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb83ca14

Branch: refs/heads/master
Commit: cb83ca1433c865cb0aef973df2b872a83671acfd
Parents: c92949a
Author: Bogdan Raducanu 
Authored: Tue Jun 6 22:51:10 2017 -0700
Committer: Reynold Xin 
Committed: Tue Jun 6 22:51:10 2017 -0700

--
 .../apache/spark/sql/catalyst/parser/PlanParserSuite.scala   | 8 
 1 file changed, 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cb83ca14/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
index d004d04..fef39a5 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
@@ -576,14 +576,6 @@ class PlanParserSuite extends PlanTest {
 )
 
 comparePlans(
-  parsePlan("SELECT /*+ HINT1(a, array(1, 2, 3)) */ * from t"),
-  UnresolvedHint("HINT1", Seq($"a",
-UnresolvedFunction("array", Literal(1) :: Literal(2) :: Literal(3) :: 
Nil, false)),
-table("t").select(star())
-  )
-)
-
-comparePlans(
   parsePlan("SELECT /*+ HINT1(a, 5, 'a', b) */ * from t"),
   UnresolvedHint("HINT1", Seq($"a", Literal(5), Literal("a"), $"b"),
 table("t").select(star())





spark git commit: [SPARK-8184][SQL] Add additional function description for weekofyear

2017-05-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 26640a269 -> 3b79e4cda


[SPARK-8184][SQL] Add additional function description for weekofyear

## What changes were proposed in this pull request?

Add additional function description for weekofyear.
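
For reference, the documented behavior in a minimal PySpark form (illustration 
only, assuming an active SparkSession `spark`):

```python
# weekofyear follows ISO semantics: weeks start on Monday and week 1 is the
# first week of the year with more than 3 days; 2008-02-20 falls in week 8.
spark.sql("SELECT weekofyear('2008-02-20') AS week").show()
# +----+
# |week|
# +----+
# |   8|
# +----+
```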

## How was this patch tested?

Manual tests.

![weekofyear](https://cloud.githubusercontent.com/assets/5399861/26525752/08a1c278-4394-11e7-8988-7cbf82c3a999.gif)

Author: Yuming Wang 

Closes #18132 from wangyum/SPARK-8184.

(cherry picked from commit 1c7db00c74ec6a91c7eefbdba85cbf41fbe8634a)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b79e4cd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b79e4cd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b79e4cd

Branch: refs/heads/branch-2.2
Commit: 3b79e4cda74e0bf82ec55e673beb8f84e7cfaca4
Parents: 26640a2
Author: Yuming Wang 
Authored: Mon May 29 16:10:22 2017 -0700
Committer: Reynold Xin 
Committed: Mon May 29 16:10:29 2017 -0700

--
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3b79e4cd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 6a76058..0ab7207 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -402,13 +402,15 @@ case class DayOfMonth(child: Expression) extends 
UnaryExpression with ImplicitCa
   }
 }
 
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(date) - Returns the week of the year of the given date.",
+  usage = "_FUNC_(date) - Returns the week of the year of the given date. A 
week is considered to start on a Monday and week 1 is the first week with >3 
days.",
   extended = """
 Examples:
   > SELECT _FUNC_('2008-02-20');
8
   """)
+// scalastyle:on line.size.limit
 case class WeekOfYear(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
 
   override def inputTypes: Seq[AbstractDataType] = Seq(DateType)





spark git commit: [SPARK-8184][SQL] Add additional function description for weekofyear

2017-05-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master c9749068e -> 1c7db00c7


[SPARK-8184][SQL] Add additional function description for weekofyear

## What changes were proposed in this pull request?

Add additional function description for weekofyear.

## How was this patch tested?

Manual tests.

![weekofyear](https://cloud.githubusercontent.com/assets/5399861/26525752/08a1c278-4394-11e7-8988-7cbf82c3a999.gif)

Author: Yuming Wang 

Closes #18132 from wangyum/SPARK-8184.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c7db00c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c7db00c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c7db00c

Branch: refs/heads/master
Commit: 1c7db00c74ec6a91c7eefbdba85cbf41fbe8634a
Parents: c974906
Author: Yuming Wang 
Authored: Mon May 29 16:10:22 2017 -0700
Committer: Reynold Xin 
Committed: Mon May 29 16:10:22 2017 -0700

--
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1c7db00c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 43ca2cf..4098300 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -402,13 +402,15 @@ case class DayOfMonth(child: Expression) extends 
UnaryExpression with ImplicitCa
   }
 }
 
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(date) - Returns the week of the year of the given date.",
+  usage = "_FUNC_(date) - Returns the week of the year of the given date. A 
week is considered to start on a Monday and week 1 is the first week with >3 
days.",
   extended = """
 Examples:
   > SELECT _FUNC_('2008-02-20');
8
   """)
+// scalastyle:on line.size.limit
 case class WeekOfYear(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
 
   override def inputTypes: Seq[AbstractDataType] = Seq(DateType)





spark git commit: [SPARK-20857][SQL] Generic resolved hint node

2017-05-23 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 dbb068f4f -> d20c64695


[SPARK-20857][SQL] Generic resolved hint node

## What changes were proposed in this pull request?
This patch renames BroadcastHint to ResolvedHint (and Hint to UnresolvedHint) 
so the hint framework is more generic and would allow us to introduce other 
hint types in the future without introducing new hint nodes.
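
User-facing hints are unaffected by the rename; a minimal PySpark sketch of the 
broadcast hint that the framework resolves (illustration only, assuming an 
active SparkSession `spark`):

```python
from pyspark.sql.functions import broadcast

# Minimal sketch, assuming an existing SparkSession `spark`.
large = spark.range(1000000)
small = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])

# Ask the planner to broadcast the smaller side; the analyzer turns the hint
# into the (now generic) resolved-hint plan node.
large.join(broadcast(small), "id").explain()
```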

## How was this patch tested?
Updated test cases.

Author: Reynold Xin <r...@databricks.com>

Closes #18072 from rxin/SPARK-20857.

(cherry picked from commit 0d589ba00b5d539fbfef5174221de046a70548cd)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d20c6469
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d20c6469
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d20c6469

Branch: refs/heads/branch-2.2
Commit: d20c6469565c4f7687f9af14a6f12a775b0c6e62
Parents: dbb068f
Author: Reynold Xin <r...@databricks.com>
Authored: Tue May 23 18:44:49 2017 +0200
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue May 23 18:45:08 2017 +0200

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  2 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala   |  2 +-
 .../sql/catalyst/analysis/ResolveHints.scala| 12 ++---
 .../sql/catalyst/optimizer/Optimizer.scala  |  2 +-
 .../sql/catalyst/optimizer/expressions.scala|  2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  4 +-
 .../spark/sql/catalyst/planning/patterns.scala  |  4 +-
 .../sql/catalyst/plans/logical/Statistics.scala |  5 ++
 .../plans/logical/basicLogicalOperators.scala   | 22 +
 .../sql/catalyst/plans/logical/hints.scala  | 49 
 .../catalyst/analysis/ResolveHintsSuite.scala   | 41 
 .../catalyst/optimizer/ColumnPruningSuite.scala |  5 +-
 .../optimizer/FilterPushdownSuite.scala |  4 +-
 .../optimizer/JoinOptimizationSuite.scala   |  4 +-
 .../sql/catalyst/parser/PlanParserSuite.scala   | 15 +++---
 .../BasicStatsEstimationSuite.scala |  2 +-
 .../scala/org/apache/spark/sql/Dataset.scala|  2 +-
 .../spark/sql/execution/SparkStrategies.scala   |  2 +-
 .../scala/org/apache/spark/sql/functions.scala  |  5 +-
 .../execution/joins/BroadcastJoinSuite.scala| 14 +++---
 20 files changed, 118 insertions(+), 80 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 5be67ac..9979642 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1311,7 +1311,7 @@ class Analyzer(
 
 // Category 1:
 // BroadcastHint, Distinct, LeafNode, Repartition, and SubqueryAlias
-case _: BroadcastHint | _: Distinct | _: LeafNode | _: Repartition | 
_: SubqueryAlias =>
+case _: ResolvedHint | _: Distinct | _: LeafNode | _: Repartition | _: 
SubqueryAlias =>
 
 // Category 2:
 // These operators can be anywhere in a correlated subquery.

http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index ea4560a..2e3ac3e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -399,7 +399,7 @@ trait CheckAnalysis extends PredicateHelper {
  |in operator ${operator.simpleString}
""".stripMargin)
 
-  case _: Hint =>
+  case _: UnresolvedHint =>
 throw new IllegalStateException(
   "Internal error: logical hint operator should have been removed 
during analysis")
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 
b/sq

spark git commit: [SPARK-20857][SQL] Generic resolved hint node

2017-05-23 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master ad09e4ca0 -> 0d589ba00


[SPARK-20857][SQL] Generic resolved hint node

## What changes were proposed in this pull request?
This patch renames BroadcastHint to ResolvedHint (and Hint to UnresolvedHint) 
so the hint framework is more generic and would allow us to introduce other 
hint types in the future without introducing new hint nodes.

## How was this patch tested?
Updated test cases.

Author: Reynold Xin <r...@databricks.com>

Closes #18072 from rxin/SPARK-20857.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d589ba0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d589ba0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d589ba0

Branch: refs/heads/master
Commit: 0d589ba00b5d539fbfef5174221de046a70548cd
Parents: ad09e4c
Author: Reynold Xin <r...@databricks.com>
Authored: Tue May 23 18:44:49 2017 +0200
Committer: Reynold Xin <r...@databricks.com>
Committed: Tue May 23 18:44:49 2017 +0200

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  2 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala   |  2 +-
 .../sql/catalyst/analysis/ResolveHints.scala| 12 ++---
 .../sql/catalyst/optimizer/Optimizer.scala  |  2 +-
 .../sql/catalyst/optimizer/expressions.scala|  2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  4 +-
 .../spark/sql/catalyst/planning/patterns.scala  |  4 +-
 .../sql/catalyst/plans/logical/Statistics.scala |  5 ++
 .../plans/logical/basicLogicalOperators.scala   | 22 +
 .../sql/catalyst/plans/logical/hints.scala  | 49 
 .../catalyst/analysis/ResolveHintsSuite.scala   | 41 
 .../catalyst/optimizer/ColumnPruningSuite.scala |  5 +-
 .../optimizer/FilterPushdownSuite.scala |  4 +-
 .../optimizer/JoinOptimizationSuite.scala   |  4 +-
 .../sql/catalyst/parser/PlanParserSuite.scala   | 15 +++---
 .../BasicStatsEstimationSuite.scala |  2 +-
 .../scala/org/apache/spark/sql/Dataset.scala|  2 +-
 .../spark/sql/execution/SparkStrategies.scala   |  2 +-
 .../scala/org/apache/spark/sql/functions.scala  |  5 +-
 .../execution/joins/BroadcastJoinSuite.scala| 14 +++---
 20 files changed, 118 insertions(+), 80 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index d58b8ac..d130962 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1336,7 +1336,7 @@ class Analyzer(
 
 // Category 1:
 // BroadcastHint, Distinct, LeafNode, Repartition, and SubqueryAlias
-case _: BroadcastHint | _: Distinct | _: LeafNode | _: Repartition | 
_: SubqueryAlias =>
+case _: ResolvedHint | _: Distinct | _: LeafNode | _: Repartition | _: 
SubqueryAlias =>
 
 // Category 2:
 // These operators can be anywhere in a correlated subquery.

http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index ea4560a..2e3ac3e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -399,7 +399,7 @@ trait CheckAnalysis extends PredicateHelper {
  |in operator ${operator.simpleString}
""".stripMargin)
 
-  case _: Hint =>
+  case _: UnresolvedHint =>
 throw new IllegalStateException(
   "Internal error: logical hint operator should have been removed 
during analysis")
 

http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
index df688fa..9dfd84c 100644
--- 
a/sql/cata

spark git commit: Revert "[SPARK-12297][SQL] Hive compatibility for Parquet Timestamps"

2017-05-09 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 1b85bcd92 -> ac1ab6b9d


Revert "[SPARK-12297][SQL] Hive compatibility for Parquet Timestamps"

This reverts commit 22691556e5f0dfbac81b8cc9ca0a67c70c1711ca.

See JIRA ticket for more information.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac1ab6b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac1ab6b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac1ab6b9

Branch: refs/heads/master
Commit: ac1ab6b9db188ac54c745558d57dd0a031d0b162
Parents: 1b85bcd
Author: Reynold Xin 
Authored: Tue May 9 11:35:59 2017 -0700
Committer: Reynold Xin 
Committed: Tue May 9 11:35:59 2017 -0700

--
 .../spark/sql/catalyst/catalog/interface.scala  |   4 +-
 .../spark/sql/catalyst/util/DateTimeUtils.scala |   5 -
 .../parquet/VectorizedColumnReader.java |  28 +-
 .../parquet/VectorizedParquetRecordReader.java  |   6 +-
 .../spark/sql/execution/command/tables.scala|   8 +-
 .../datasources/parquet/ParquetFileFormat.scala |   2 -
 .../parquet/ParquetReadSupport.scala|   3 +-
 .../parquet/ParquetRecordMaterializer.scala |   9 +-
 .../parquet/ParquetRowConverter.scala   |  53 +--
 .../parquet/ParquetWriteSupport.scala   |  25 +-
 .../spark/sql/hive/HiveExternalCatalog.scala|  11 +-
 .../spark/sql/hive/HiveMetastoreCatalog.scala   |  12 +-
 .../hive/ParquetHiveCompatibilitySuite.scala| 379 +--
 13 files changed, 29 insertions(+), 516 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
index c39017e..cc0cbba 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
@@ -132,10 +132,10 @@ case class CatalogTablePartition(
   /**
* Given the partition schema, returns a row with that schema holding the 
partition values.
*/
-  def toRow(partitionSchema: StructType, defaultTimeZoneId: String): 
InternalRow = {
+  def toRow(partitionSchema: StructType, defaultTimeZondId: String): 
InternalRow = {
 val caseInsensitiveProperties = CaseInsensitiveMap(storage.properties)
 val timeZoneId = caseInsensitiveProperties.getOrElse(
-  DateTimeUtils.TIMEZONE_OPTION, defaultTimeZoneId)
+  DateTimeUtils.TIMEZONE_OPTION, defaultTimeZondId)
 InternalRow.fromSeq(partitionSchema.map { field =>
   val partValue = if (spec(field.name) == 
ExternalCatalogUtils.DEFAULT_PARTITION_NAME) {
 null

http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index bf596fa..6c1592f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -498,11 +498,6 @@ object DateTimeUtils {
 false
   }
 
-  lazy val validTimezones = TimeZone.getAvailableIDs().toSet
-  def isValidTimezone(timezoneId: String): Boolean = {
-validTimezones.contains(timezoneId)
-  }
-
   /**
* Returns the microseconds since year zero (-17999) from microseconds since 
epoch.
*/

http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index dabbc2b..9d641b5 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -18,9 +18,7 @@
 package org.apache.spark.sql.execution.datasources.parquet;
 
 import java.io.IOException;
-import java.util.TimeZone;
 
-import org.apache.hadoop.conf.Configuration;
 import org.apache.parquet.bytes.BytesUtils;
 import 

spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch

2017-05-05 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master b31648c08 -> 5d75b14bf


[SPARK-20616] RuleExecutor logDebug of batch results should show diff to start 
of batch

## What changes were proposed in this pull request?

Due to a likely typo, the logDebug message that prints the diff of query plans 
shows a diff against the initial plan, not against the plan at the start of the 
batch.

## How was this patch tested?

Now the debug message prints the diff between start and end of batch.

Author: Juliusz Sompolski 

Closes #17875 from juliuszsompolski/SPARK-20616.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d75b14b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d75b14b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d75b14b

Branch: refs/heads/master
Commit: 5d75b14bf0f4c1f0813287efaabf49797908ed55
Parents: b31648c
Author: Juliusz Sompolski 
Authored: Fri May 5 15:31:06 2017 -0700
Committer: Reynold Xin 
Committed: Fri May 5 15:31:06 2017 -0700

--
 .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5d75b14b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
index 6fc828f..85b368c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
@@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 logDebug(
   s"""
   |=== Result of Batch ${batch.name} ===
-  |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")}
+  |${sideBySide(batchStartPlan.treeString, 
curPlan.treeString).mkString("\n")}
 """.stripMargin)
   } else {
 logTrace(s"Batch ${batch.name} has no effect.")





spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch

2017-05-05 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 f59c74a94 -> 1d9b7a74a


[SPARK-20616] RuleExecutor logDebug of batch results should show diff to start 
of batch

## What changes were proposed in this pull request?

Due to a likely typo, the logDebug message that prints the diff of query plans 
shows a diff against the initial plan, not against the plan at the start of the 
batch.

## How was this patch tested?

Now the debug message prints the diff between start and end of batch.

Author: Juliusz Sompolski 

Closes #17875 from juliuszsompolski/SPARK-20616.

(cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1d9b7a74
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1d9b7a74
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1d9b7a74

Branch: refs/heads/branch-2.2
Commit: 1d9b7a74a839021814ab28d3eba3636c64483130
Parents: f59c74a
Author: Juliusz Sompolski 
Authored: Fri May 5 15:31:06 2017 -0700
Committer: Reynold Xin 
Committed: Fri May 5 15:31:13 2017 -0700

--
 .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1d9b7a74/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
index 6fc828f..85b368c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
@@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 logDebug(
   s"""
   |=== Result of Batch ${batch.name} ===
-  |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")}
+  |${sideBySide(batchStartPlan.treeString, 
curPlan.treeString).mkString("\n")}
 """.stripMargin)
   } else {
 logTrace(s"Batch ${batch.name} has no effect.")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch

2017-05-05 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 704b249b6 -> a1112c615


[SPARK-20616] RuleExecutor logDebug of batch results should show diff to start 
of batch

## What changes were proposed in this pull request?

Due to a likely typo, the logDebug message that prints the diff of query plans
shows a diff against the initial plan rather than against the plan at the start
of the batch.

## How was this patch tested?

Now the debug message prints the diff between the start and the end of the batch.

Author: Juliusz Sompolski 

Closes #17875 from juliuszsompolski/SPARK-20616.

(cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1112c61
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1112c61
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1112c61

Branch: refs/heads/branch-2.1
Commit: a1112c615b05d615048159c9d324aa10a4391d4e
Parents: 704b249
Author: Juliusz Sompolski 
Authored: Fri May 5 15:31:06 2017 -0700
Committer: Reynold Xin 
Committed: Fri May 5 15:31:23 2017 -0700

--
 .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a1112c61/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
index 6fc828f..85b368c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
@@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 logDebug(
   s"""
   |=== Result of Batch ${batch.name} ===
-  |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")}
+  |${sideBySide(batchStartPlan.treeString, 
curPlan.treeString).mkString("\n")}
 """.stripMargin)
   } else {
 logTrace(s"Batch ${batch.name} has no effect.")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20584][PYSPARK][SQL] Python generic hint support

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 13eb37c86 -> 02bbe7311


[SPARK-20584][PYSPARK][SQL] Python generic hint support

## What changes were proposed in this pull request?

Adds `hint` method to PySpark `DataFrame`.

## How was this patch tested?

Unit tests, doctests.

Author: zero323 

Closes #17850 from zero323/SPARK-20584.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02bbe731
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02bbe731
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02bbe731

Branch: refs/heads/master
Commit: 02bbe73118a39e2fb378aa2002449367a92f6d67
Parents: 13eb37c
Author: zero323 
Authored: Wed May 3 19:15:28 2017 -0700
Committer: Reynold Xin 
Committed: Wed May 3 19:15:28 2017 -0700

--
 python/pyspark/sql/dataframe.py | 29 +
 python/pyspark/sql/tests.py | 16 
 2 files changed, 45 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/02bbe731/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index ab6d35b..7b67985 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -380,6 +380,35 @@ class DataFrame(object):
 jdf = self._jdf.withWatermark(eventTime, delayThreshold)
 return DataFrame(jdf, self.sql_ctx)
 
+@since(2.2)
+def hint(self, name, *parameters):
+"""Specifies some hint on the current DataFrame.
+
+:param name: A name of the hint.
+:param parameters: Optional parameters.
+:return: :class:`DataFrame`
+
+>>> df.join(df2.hint("broadcast"), "name").show()
+++---+--+
+|name|age|height|
+++---+--+
+| Bob|  5|85|
+++---+--+
+"""
+if len(parameters) == 1 and isinstance(parameters[0], list):
+parameters = parameters[0]
+
+if not isinstance(name, str):
+raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
+
+for p in parameters:
+if not isinstance(p, str):
+raise TypeError(
+"all parameters should be str, got {0} of type 
{1}".format(p, type(p)))
+
+jdf = self._jdf.hint(name, self._jseq(parameters))
+return DataFrame(jdf, self.sql_ctx)
+
 @since(1.3)
 def count(self):
 """Returns the number of rows in this :class:`DataFrame`.

http://git-wip-us.apache.org/repos/asf/spark/blob/02bbe731/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index ce4abf8..f644624 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1906,6 +1906,22 @@ class SQLTests(ReusedPySparkTestCase):
 # planner should not crash without a join
 broadcast(df1)._jdf.queryExecution().executedPlan()
 
+def test_generic_hints(self):
+from pyspark.sql import DataFrame
+
+df1 = self.spark.range(10e10).toDF("id")
+df2 = self.spark.range(10e10).toDF("id")
+
+self.assertIsInstance(df1.hint("broadcast"), DataFrame)
+self.assertIsInstance(df1.hint("broadcast", []), DataFrame)
+
+# Dummy rules
+self.assertIsInstance(df1.hint("broadcast", "foo", "bar"), DataFrame)
+self.assertIsInstance(df1.hint("broadcast", ["foo", "bar"]), DataFrame)
+
+plan = df1.join(df2.hint("broadcast"), 
"id")._jdf.queryExecution().executedPlan()
+self.assertEqual(1, plan.toString().count("BroadcastHashJoin"))
+
 def test_toDF_with_schema_string(self):
 data = [Row(key=i, value=str(i)) for i in range(100)]
 rdd = self.sc.parallelize(data, 5)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20584][PYSPARK][SQL] Python generic hint support

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 a3a5fcfef -> d8bd213f1


[SPARK-20584][PYSPARK][SQL] Python generic hint support

## What changes were proposed in this pull request?

Adds `hint` method to PySpark `DataFrame`.

## How was this patch tested?

Unit tests, doctests.

Author: zero323 

Closes #17850 from zero323/SPARK-20584.

(cherry picked from commit 02bbe73118a39e2fb378aa2002449367a92f6d67)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8bd213f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8bd213f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8bd213f

Branch: refs/heads/branch-2.2
Commit: d8bd213f13279664d50ffa57c1814d0b16fc5d23
Parents: a3a5fcf
Author: zero323 
Authored: Wed May 3 19:15:28 2017 -0700
Committer: Reynold Xin 
Committed: Wed May 3 19:15:42 2017 -0700

--
 python/pyspark/sql/dataframe.py | 29 +
 python/pyspark/sql/tests.py | 16 
 2 files changed, 45 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d8bd213f/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index f567cc4..d62ba96 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -371,6 +371,35 @@ class DataFrame(object):
 jdf = self._jdf.withWatermark(eventTime, delayThreshold)
 return DataFrame(jdf, self.sql_ctx)
 
+@since(2.2)
+def hint(self, name, *parameters):
+"""Specifies some hint on the current DataFrame.
+
+:param name: A name of the hint.
+:param parameters: Optional parameters.
+:return: :class:`DataFrame`
+
+>>> df.join(df2.hint("broadcast"), "name").show()
+++---+--+
+|name|age|height|
+++---+--+
+| Bob|  5|85|
+++---+--+
+"""
+if len(parameters) == 1 and isinstance(parameters[0], list):
+parameters = parameters[0]
+
+if not isinstance(name, str):
+raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
+
+for p in parameters:
+if not isinstance(p, str):
+raise TypeError(
+"all parameters should be str, got {0} of type 
{1}".format(p, type(p)))
+
+jdf = self._jdf.hint(name, self._jseq(parameters))
+return DataFrame(jdf, self.sql_ctx)
+
 @since(1.3)
 def count(self):
 """Returns the number of rows in this :class:`DataFrame`.

http://git-wip-us.apache.org/repos/asf/spark/blob/d8bd213f/python/pyspark/sql/tests.py
--
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index cd92148..2aa2d23 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1906,6 +1906,22 @@ class SQLTests(ReusedPySparkTestCase):
 # planner should not crash without a join
 broadcast(df1)._jdf.queryExecution().executedPlan()
 
+def test_generic_hints(self):
+from pyspark.sql import DataFrame
+
+df1 = self.spark.range(10e10).toDF("id")
+df2 = self.spark.range(10e10).toDF("id")
+
+self.assertIsInstance(df1.hint("broadcast"), DataFrame)
+self.assertIsInstance(df1.hint("broadcast", []), DataFrame)
+
+# Dummy rules
+self.assertIsInstance(df1.hint("broadcast", "foo", "bar"), DataFrame)
+self.assertIsInstance(df1.hint("broadcast", ["foo", "bar"]), DataFrame)
+
+plan = df1.join(df2.hint("broadcast"), 
"id")._jdf.queryExecution().executedPlan()
+self.assertEqual(1, plan.toString().count("BroadcastHashJoin"))
+
 def test_toDF_with_schema_string(self):
 data = [Row(key=i, value=str(i)) for i in range(100)]
 rdd = self.sc.parallelize(data, 5)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!=

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 6b9e49d12 -> 13eb37c86


[MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and 
add a test for =!=

## What changes were proposed in this pull request?

This PR proposes the following three changes:

- This test does not actually test `<=>` and is identical to the `===` test
above, so this PR removes it.

  ```diff
  -   test("<=>") {
  - checkAnswer(
  -  testData2.filter($"a" === 1),
  -  testData2.collect().toSeq.filter(r => r.getInt(0) == 1))
  -
  -checkAnswer(
  -  testData2.filter($"a" === $"b"),
  -  testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
  -   }
  ```

- Rename the test from `=!=` to `<=>`, since it actually tests `<=>`.

  ```diff
  +  private lazy val nullData = Seq(
  +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, 
None)).toDF("a", "b")
  +
...
  -  test("=!=") {
  +  test("<=>") {
  -val nullData = spark.createDataFrame(sparkContext.parallelize(
  -  Row(1, 1) ::
  -  Row(1, 2) ::
  -  Row(1, null) ::
  -  Row(null, null) :: Nil),
  -  StructType(Seq(StructField("a", IntegerType), StructField("b", 
IntegerType
  -
 checkAnswer(
   nullData.filter($"b" <=> 1),
...
  ```

- Add tests for `=!=`, which did not exist before (the sketch after this list
summarizes the expected null semantics).

  ```diff
  +  test("=!=") {
  +checkAnswer(
  +  nullData.filter($"b" =!= 1),
  +  Row(1, 2) :: Nil)
  +
  +checkAnswer(nullData.filter($"b" =!= null), Nil)
  +
  +checkAnswer(
  +  nullData.filter($"a" =!= $"b"),
  +  Row(1, 2) :: Nil)
  +  }
  ```
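
For readers less familiar with the two operators, the sketch below summarizes the
null semantics that the renamed `<=>` test and the new `=!=` test cover. It
assumes a live `SparkSession` named `spark`; the expected rows match the
`checkAnswer` calls above.

```scala
import spark.implicits._

val nullData = Seq(
  (Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b")

// `<=>` is null-safe equality: it never yields null, and null <=> null is true.
nullData.filter($"b" <=> 1).show()       // keeps only (1, 1)
nullData.filter($"a" <=> $"b").show()    // keeps (1, 1) and (null, null)

// `=!=` is plain inequality: any comparison involving null yields null, and a
// filter treats null as false.
nullData.filter($"b" =!= 1).show()       // keeps only (1, 2)
nullData.filter($"b" =!= null).show()    // keeps nothing
```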

## How was this patch tested?

Manually running the tests.

Author: hyukjinkwon 

Closes #17842 from HyukjinKwon/minor-test-fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13eb37c8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13eb37c8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13eb37c8

Branch: refs/heads/master
Commit: 13eb37c860c8f672d0e9d9065d0333f981db71e3
Parents: 6b9e49d
Author: hyukjinkwon 
Authored: Wed May 3 13:08:25 2017 -0700
Committer: Reynold Xin 
Committed: Wed May 3 13:08:25 2017 -0700

--
 .../spark/sql/ColumnExpressionSuite.scala   | 31 +---
 1 file changed, 14 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/13eb37c8/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
index b0f398d..bc708ca 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
@@ -39,6 +39,9 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
   StructType(Seq(StructField("a", BooleanType), StructField("b", 
BooleanType
   }
 
+  private lazy val nullData = Seq(
+(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, 
None)).toDF("a", "b")
+
   test("column names with space") {
 val df = Seq((1, "a")).toDF("name with space", "name.with.dot")
 
@@ -284,23 +287,6 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
   test("<=>") {
 checkAnswer(
-  testData2.filter($"a" === 1),
-  testData2.collect().toSeq.filter(r => r.getInt(0) == 1))
-
-checkAnswer(
-  testData2.filter($"a" === $"b"),
-  testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
-  }
-
-  test("=!=") {
-val nullData = spark.createDataFrame(sparkContext.parallelize(
-  Row(1, 1) ::
-  Row(1, 2) ::
-  Row(1, null) ::
-  Row(null, null) :: Nil),
-  StructType(Seq(StructField("a", IntegerType), StructField("b", 
IntegerType
-
-checkAnswer(
   nullData.filter($"b" <=> 1),
   Row(1, 1) :: Nil)
 
@@ -321,7 +307,18 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(
   nullData2.filter($"a" <=> null),
   Row(null) :: Nil)
+  }
 
+  test("=!=") {
+checkAnswer(
+  nullData.filter($"b" =!= 1),
+  Row(1, 2) :: Nil)
+
+checkAnswer(nullData.filter($"b" =!= null), Nil)
+
+checkAnswer(
+  nullData.filter($"a" =!= $"b"),
+  Row(1, 2) :: Nil)
   }
 
   test(">") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!=

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 36d807906 -> 2629e7c7a


[MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and 
add a test for =!=

## What changes were proposed in this pull request?

This PR proposes the following three changes:

- This test does not actually test `<=>` and is identical to the `===` test
above, so this PR removes it.

  ```diff
  -   test("<=>") {
  - checkAnswer(
  -  testData2.filter($"a" === 1),
  -  testData2.collect().toSeq.filter(r => r.getInt(0) == 1))
  -
  -checkAnswer(
  -  testData2.filter($"a" === $"b"),
  -  testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
  -   }
  ```

- Rename the test from `=!=` to `<=>`, since it actually tests `<=>`.

  ```diff
  +  private lazy val nullData = Seq(
  +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, 
None)).toDF("a", "b")
  +
...
  -  test("=!=") {
  +  test("<=>") {
  -val nullData = spark.createDataFrame(sparkContext.parallelize(
  -  Row(1, 1) ::
  -  Row(1, 2) ::
  -  Row(1, null) ::
  -  Row(null, null) :: Nil),
  -  StructType(Seq(StructField("a", IntegerType), StructField("b", 
IntegerType
  -
 checkAnswer(
   nullData.filter($"b" <=> 1),
...
  ```

- Add tests for `=!=`, which did not exist before.

  ```diff
  +  test("=!=") {
  +checkAnswer(
  +  nullData.filter($"b" =!= 1),
  +  Row(1, 2) :: Nil)
  +
  +checkAnswer(nullData.filter($"b" =!= null), Nil)
  +
  +checkAnswer(
  +  nullData.filter($"a" =!= $"b"),
  +  Row(1, 2) :: Nil)
  +  }
  ```

## How was this patch tested?

Manually running the tests.

Author: hyukjinkwon 

Closes #17842 from HyukjinKwon/minor-test-fix.

(cherry picked from commit 13eb37c860c8f672d0e9d9065d0333f981db71e3)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2629e7c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2629e7c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2629e7c7

Branch: refs/heads/branch-2.2
Commit: 2629e7c7a1dacfb267d866cf825fa8a078612462
Parents: 36d8079
Author: hyukjinkwon 
Authored: Wed May 3 13:08:25 2017 -0700
Committer: Reynold Xin 
Committed: Wed May 3 13:08:31 2017 -0700

--
 .../spark/sql/ColumnExpressionSuite.scala   | 31 +---
 1 file changed, 14 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2629e7c7/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
index b0f398d..bc708ca 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
@@ -39,6 +39,9 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
   StructType(Seq(StructField("a", BooleanType), StructField("b", 
BooleanType
   }
 
+  private lazy val nullData = Seq(
+(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, 
None)).toDF("a", "b")
+
   test("column names with space") {
 val df = Seq((1, "a")).toDF("name with space", "name.with.dot")
 
@@ -284,23 +287,6 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 
   test("<=>") {
 checkAnswer(
-  testData2.filter($"a" === 1),
-  testData2.collect().toSeq.filter(r => r.getInt(0) == 1))
-
-checkAnswer(
-  testData2.filter($"a" === $"b"),
-  testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1)))
-  }
-
-  test("=!=") {
-val nullData = spark.createDataFrame(sparkContext.parallelize(
-  Row(1, 1) ::
-  Row(1, 2) ::
-  Row(1, null) ::
-  Row(null, null) :: Nil),
-  StructType(Seq(StructField("a", IntegerType), StructField("b", 
IntegerType
-
-checkAnswer(
   nullData.filter($"b" <=> 1),
   Row(1, 1) :: Nil)
 
@@ -321,7 +307,18 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
 checkAnswer(
   nullData2.filter($"a" <=> null),
   Row(null) :: Nil)
+  }
 
+  test("=!=") {
+checkAnswer(
+  nullData.filter($"b" =!= 1),
+  Row(1, 2) :: Nil)
+
+checkAnswer(nullData.filter($"b" =!= null), Nil)
+
+checkAnswer(
+  nullData.filter($"a" =!= $"b"),
+  Row(1, 2) :: Nil)
   }
 
   test(">") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 b1a732fea -> f0e80aa2d


[SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame

## What changes were proposed in this pull request?
We allow users to specify hints (currently only "broadcast" is supported) in
SQL and DataFrame. However, while SQL has a standard hint format (/*+ ... */),
DataFrame doesn't have one, and users are sometimes confused because they can't
find a way to apply a broadcast hint. This ticket adds a generic hint function
on DataFrame that allows the same hints to be used on DataFrames as well as in
SQL.

As an example, after this patch, the following will apply a broadcast hint on a 
DataFrame using the new hint function:

```
df1.join(df2.hint("broadcast"))
```
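
A hedged usage sketch of the varargs form follows; the table names are
illustrative, not from the patch, and per the ResolveHints change below the
parameters are assumed to name relations (or aliases) inside the hinted subtree.

```scala
// Assumes a live SparkSession `spark` with these (illustrative) tables registered.
val small = spark.table("small_table")
val big   = spark.table("big_table")

// No parameters: the whole hinted subtree becomes a BroadcastHint.
val q1 = big.join(small.hint("broadcast"), "id")

// With parameters: only the named relation(s) inside the hinted subtree are
// marked for broadcast (matching by table name/alias is assumed here).
val q2 = big.join(small, "id").hint("broadcast", "small_table")
```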

## How was this patch tested?
Added a test case in DataFrameJoinSuite.

Author: Reynold Xin <r...@databricks.com>

Closes #17839 from rxin/SPARK-20576.

(cherry picked from commit 527fc5d0c990daaacad4740f62cfe6736609b77b)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f0e80aa2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f0e80aa2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f0e80aa2

Branch: refs/heads/branch-2.2
Commit: f0e80aa2ddee80819ef33ee24eb6a15a73bc02d5
Parents: b1a732f
Author: Reynold Xin <r...@databricks.com>
Authored: Wed May 3 09:22:25 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed May 3 09:22:41 2017 -0700

--
 .../sql/catalyst/analysis/ResolveHints.scala  |  8 +++-
 .../main/scala/org/apache/spark/sql/Dataset.scala | 16 
 .../org/apache/spark/sql/DataFrameJoinSuite.scala | 18 +-
 3 files changed, 40 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
index c4827b8..df688fa 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
@@ -86,7 +86,13 @@ object ResolveHints {
 
 def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
   case h: Hint if 
BROADCAST_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
-applyBroadcastHint(h.child, h.parameters.toSet)
+if (h.parameters.isEmpty) {
+  // If there is no table alias specified, turn the entire subtree 
into a BroadcastHint.
+  BroadcastHint(h.child)
+} else {
+  // Otherwise, find within the subtree query plans that should be 
broadcasted.
+  applyBroadcastHint(h.child, h.parameters.toSet)
+}
 }
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 06dd550..5f602dc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1074,6 +1074,22 @@ class Dataset[T] private[sql](
   def apply(colName: String): Column = col(colName)
 
   /**
+   * Specifies some hint on the current Dataset. As an example, the following 
code specifies
+   * that one of the plan can be broadcasted:
+   *
+   * {{{
+   *   df1.join(df2.hint("broadcast"))
+   * }}}
+   *
+   * @group basic
+   * @since 2.2.0
+   */
+  @scala.annotation.varargs
+  def hint(name: String, parameters: String*): Dataset[T] = withTypedPlan {
+Hint(name, parameters, logicalPlan)
+  }
+
+  /**
* Selects column based on the column name and return it as a [[Column]].
*
* @note The column name can also reference to a nested column like `a.b`.

http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 541ffb5..4a52af6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@

spark git commit: [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame

2017-05-03 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 27f543b15 -> 527fc5d0c


[SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame

## What changes were proposed in this pull request?
We allow users to specify hints (currently only "broadcast" is supported) in
SQL and DataFrame. However, while SQL has a standard hint format (/*+ ... */),
DataFrame doesn't have one, and users are sometimes confused because they can't
find a way to apply a broadcast hint. This ticket adds a generic hint function
on DataFrame that allows the same hints to be used on DataFrames as well as in
SQL.

As an example, after this patch, the following will apply a broadcast hint on a 
DataFrame using the new hint function:

```
df1.join(df2.hint("broadcast"))
```

## How was this patch tested?
Added a test case in DataFrameJoinSuite.

Author: Reynold Xin <r...@databricks.com>

Closes #17839 from rxin/SPARK-20576.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/527fc5d0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/527fc5d0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/527fc5d0

Branch: refs/heads/master
Commit: 527fc5d0c990daaacad4740f62cfe6736609b77b
Parents: 27f543b
Author: Reynold Xin <r...@databricks.com>
Authored: Wed May 3 09:22:25 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed May 3 09:22:25 2017 -0700

--
 .../sql/catalyst/analysis/ResolveHints.scala  |  8 +++-
 .../main/scala/org/apache/spark/sql/Dataset.scala | 16 
 .../org/apache/spark/sql/DataFrameJoinSuite.scala | 18 +-
 3 files changed, 40 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
index c4827b8..df688fa 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
@@ -86,7 +86,13 @@ object ResolveHints {
 
 def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
   case h: Hint if 
BROADCAST_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
-applyBroadcastHint(h.child, h.parameters.toSet)
+if (h.parameters.isEmpty) {
+  // If there is no table alias specified, turn the entire subtree 
into a BroadcastHint.
+  BroadcastHint(h.child)
+} else {
+  // Otherwise, find within the subtree query plans that should be 
broadcasted.
+  applyBroadcastHint(h.child, h.parameters.toSet)
+}
 }
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 147e765..620c8bd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1161,6 +1161,22 @@ class Dataset[T] private[sql](
   def apply(colName: String): Column = col(colName)
 
   /**
+   * Specifies some hint on the current Dataset. As an example, the following 
code specifies
+   * that one of the plan can be broadcasted:
+   *
+   * {{{
+   *   df1.join(df2.hint("broadcast"))
+   * }}}
+   *
+   * @group basic
+   * @since 2.2.0
+   */
+  @scala.annotation.varargs
+  def hint(name: String, parameters: String*): Dataset[T] = withTypedPlan {
+Hint(name, parameters, logicalPlan)
+  }
+
+  /**
* Selects column based on the column name and return it as a [[Column]].
*
* @note The column name can also reference to a nested column like `a.b`.

http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 541ffb5..4a52af6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -151,7 +151,7 @@ class DataFrameJoinSuite extends QueryTest with 
SharedSQLContext {
   Row(1, 1, 1, 1) :: Row(2, 1, 2, 2) :: Nil)
   }

spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation

2017-04-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 6709bcf6e -> e278876ba


[SPARK-20474] Fixing OnHeapColumnVector reallocation

## What changes were proposed in this pull request?
OnHeapColumnVector reallocation copies data to the new storage only up to
'elementsAppended'. That variable is only updated by the ColumnVector.appendX
API, while ColumnVector.putX is more commonly used, so values written with putX
past that index could be lost on reallocation; the copy now uses the full
'capacity' instead.
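
A minimal sketch in plain Scala (not the actual Java vector code) of why the copy
length matters:

```scala
// `elementsAppended` is only advanced by append-style writes, so a reserve()
// that copies only that many elements silently drops values written with put().
class IntColumn(private var capacity: Int) {
  private var data = new Array[Int](capacity)
  private var elementsAppended = 0                 // only maintained by append()

  def put(i: Int, v: Int): Unit = data(i) = v      // does NOT touch elementsAppended
  def append(v: Int): Unit = { data(elementsAppended) = v; elementsAppended += 1 }
  def get(i: Int): Int = data(i)

  def reserve(newCapacity: Int): Unit = {
    val newData = new Array[Int](newCapacity)
    // The buggy variant copied only `elementsAppended` ints; copying `capacity`,
    // as this patch does, also preserves values written via put().
    System.arraycopy(data, 0, newData, 0, capacity)
    data = newData
    capacity = newCapacity
  }
}

val col = new IntColumn(4)
(0 until 4).foreach(i => col.put(i, i * 10))       // elementsAppended stays 0
col.reserve(8)
assert(col.get(3) == 30)                           // fails with the old copy length
```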

## How was this patch tested?
Tested using existing unit tests.

Author: Michal Szafranski 

Closes #17773 from michal-databricks/spark-20474.

(cherry picked from commit a277ae80a2836e6533b338d2b9c4e59ed8a1daae)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e278876b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e278876b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e278876b

Branch: refs/heads/branch-2.2
Commit: e278876ba3d66d3fb249df59c3de8d78ca25c5f0
Parents: 6709bcf
Author: Michal Szafranski 
Authored: Wed Apr 26 12:47:37 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 26 12:47:50 2017 -0700

--
 .../vectorized/OnHeapColumnVector.java  | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e278876b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
index 9b410ba..94ed322 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
@@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends 
ColumnVector {
   int[] newLengths = new int[newCapacity];
   int[] newOffsets = new int[newCapacity];
   if (this.arrayLengths != null) {
-System.arraycopy(this.arrayLengths, 0, newLengths, 0, 
elementsAppended);
-System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, 
elementsAppended);
+System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
+System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity);
   }
   arrayLengths = newLengths;
   arrayOffsets = newOffsets;
 } else if (type instanceof BooleanType) {
   if (byteData == null || byteData.length < newCapacity) {
 byte[] newData = new byte[newCapacity];
-if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
elementsAppended);
+if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
capacity);
 byteData = newData;
   }
 } else if (type instanceof ByteType) {
   if (byteData == null || byteData.length < newCapacity) {
 byte[] newData = new byte[newCapacity];
-if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
elementsAppended);
+if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
capacity);
 byteData = newData;
   }
 } else if (type instanceof ShortType) {
   if (shortData == null || shortData.length < newCapacity) {
 short[] newData = new short[newCapacity];
-if (shortData != null) System.arraycopy(shortData, 0, newData, 0, 
elementsAppended);
+if (shortData != null) System.arraycopy(shortData, 0, newData, 0, 
capacity);
 shortData = newData;
   }
 } else if (type instanceof IntegerType || type instanceof DateType ||
   DecimalType.is32BitDecimalType(type)) {
   if (intData == null || intData.length < newCapacity) {
 int[] newData = new int[newCapacity];
-if (intData != null) System.arraycopy(intData, 0, newData, 0, 
elementsAppended);
+if (intData != null) System.arraycopy(intData, 0, newData, 0, 
capacity);
 intData = newData;
   }
 } else if (type instanceof LongType || type instanceof TimestampType ||
 DecimalType.is64BitDecimalType(type)) {
   if (longData == null || longData.length < newCapacity) {
 long[] newData = new long[newCapacity];
-if (longData != null) System.arraycopy(longData, 0, newData, 0, 
elementsAppended);
+if (longData != null) System.arraycopy(longData, 0, newData, 0, 
capacity);
 longData = newData;
   }
 } else if (type instanceof FloatType) {
   if (floatData == null || floatData.length < newCapacity) {
 float[] newData = new float[newCapacity];
-if (floatData != null) System.arraycopy(floatData, 0, 

spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation

2017-04-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 99c6cf9ef -> a277ae80a


[SPARK-20474] Fixing OnHeapColumnVector reallocation

## What changes were proposed in this pull request?
OnHeapColumnVector reallocation copies data to the new storage only up to
'elementsAppended'. That variable is only updated by the ColumnVector.appendX
API, while ColumnVector.putX is more commonly used, so values written with putX
past that index could be lost on reallocation; the copy now uses the full
'capacity' instead.

## How was this patch tested?
Tested using existing unit tests.

Author: Michal Szafranski 

Closes #17773 from michal-databricks/spark-20474.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a277ae80
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a277ae80
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a277ae80

Branch: refs/heads/master
Commit: a277ae80a2836e6533b338d2b9c4e59ed8a1daae
Parents: 99c6cf9
Author: Michal Szafranski 
Authored: Wed Apr 26 12:47:37 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 26 12:47:37 2017 -0700

--
 .../vectorized/OnHeapColumnVector.java  | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a277ae80/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
index 9b410ba..94ed322 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
@@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends 
ColumnVector {
   int[] newLengths = new int[newCapacity];
   int[] newOffsets = new int[newCapacity];
   if (this.arrayLengths != null) {
-System.arraycopy(this.arrayLengths, 0, newLengths, 0, 
elementsAppended);
-System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, 
elementsAppended);
+System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
+System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity);
   }
   arrayLengths = newLengths;
   arrayOffsets = newOffsets;
 } else if (type instanceof BooleanType) {
   if (byteData == null || byteData.length < newCapacity) {
 byte[] newData = new byte[newCapacity];
-if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
elementsAppended);
+if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
capacity);
 byteData = newData;
   }
 } else if (type instanceof ByteType) {
   if (byteData == null || byteData.length < newCapacity) {
 byte[] newData = new byte[newCapacity];
-if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
elementsAppended);
+if (byteData != null) System.arraycopy(byteData, 0, newData, 0, 
capacity);
 byteData = newData;
   }
 } else if (type instanceof ShortType) {
   if (shortData == null || shortData.length < newCapacity) {
 short[] newData = new short[newCapacity];
-if (shortData != null) System.arraycopy(shortData, 0, newData, 0, 
elementsAppended);
+if (shortData != null) System.arraycopy(shortData, 0, newData, 0, 
capacity);
 shortData = newData;
   }
 } else if (type instanceof IntegerType || type instanceof DateType ||
   DecimalType.is32BitDecimalType(type)) {
   if (intData == null || intData.length < newCapacity) {
 int[] newData = new int[newCapacity];
-if (intData != null) System.arraycopy(intData, 0, newData, 0, 
elementsAppended);
+if (intData != null) System.arraycopy(intData, 0, newData, 0, 
capacity);
 intData = newData;
   }
 } else if (type instanceof LongType || type instanceof TimestampType ||
 DecimalType.is64BitDecimalType(type)) {
   if (longData == null || longData.length < newCapacity) {
 long[] newData = new long[newCapacity];
-if (longData != null) System.arraycopy(longData, 0, newData, 0, 
elementsAppended);
+if (longData != null) System.arraycopy(longData, 0, newData, 0, 
capacity);
 longData = newData;
   }
 } else if (type instanceof FloatType) {
   if (floatData == null || floatData.length < newCapacity) {
 float[] newData = new float[newCapacity];
-if (floatData != null) System.arraycopy(floatData, 0, newData, 0, 
elementsAppended);
+if (floatData != null) System.arraycopy(floatData, 0, newData, 0, 
capacity);
 

spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array

2017-04-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 b65858bb3 -> 6709bcf6e


[SPARK-20473] Enabling missing types in ColumnVector.Array

## What changes were proposed in this pull request?
ColumnVector implementations originally did not support some Catalyst types
(float, short, and boolean). Now that they do, those types should also be added
to ColumnVector.Array.

## How was this patch tested?
Tested using existing unit tests.

Author: Michal Szafranski 

Closes #17772 from michal-databricks/spark-20473.

(cherry picked from commit 99c6cf9ef16bf8fae6edb23a62e46546a16bca80)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709bcf6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709bcf6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709bcf6

Branch: refs/heads/branch-2.2
Commit: 6709bcf6e66e99e17ba2a3b1482df2dba1a15716
Parents: b65858b
Author: Michal Szafranski 
Authored: Wed Apr 26 11:21:25 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 26 11:21:57 2017 -0700

--
 .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6709bcf6/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
index 354c878..b105e60 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
@@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public boolean getBoolean(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getBoolean(offset + ordinal);
 }
 
 @Override
@@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public short getShort(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getShort(offset + ordinal);
 }
 
 @Override
@@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public float getFloat(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getFloat(offset + ordinal);
 }
 
 @Override


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array

2017-04-26 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 66dd5b83f -> 99c6cf9ef


[SPARK-20473] Enabling missing types in ColumnVector.Array

## What changes were proposed in this pull request?
ColumnVector implementations originally did not support some Catalyst types
(float, short, and boolean). Now that they do, those types should also be added
to ColumnVector.Array.

## How was this patch tested?
Tested using existing unit tests.

Author: Michal Szafranski 

Closes #17772 from michal-databricks/spark-20473.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99c6cf9e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99c6cf9e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99c6cf9e

Branch: refs/heads/master
Commit: 99c6cf9ef16bf8fae6edb23a62e46546a16bca80
Parents: 66dd5b8
Author: Michal Szafranski 
Authored: Wed Apr 26 11:21:25 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 26 11:21:25 2017 -0700

--
 .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/99c6cf9e/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
--
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
index 354c878..b105e60 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
@@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public boolean getBoolean(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getBoolean(offset + ordinal);
 }
 
 @Override
@@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public short getShort(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getShort(offset + ordinal);
 }
 
 @Override
@@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable 
{
 
 @Override
 public float getFloat(int ordinal) {
-  throw new UnsupportedOperationException();
+  return data.getFloat(offset + ordinal);
 }
 
 @Override


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 5280d93e6 -> f44c8a843


[SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT

This patch bumps the master branch version to `2.3.0-SNAPSHOT`.

Author: Josh Rosen 

Closes #17753 from JoshRosen/SPARK-20453.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f44c8a84
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f44c8a84
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f44c8a84

Branch: refs/heads/master
Commit: f44c8a843ca512b319f099477415bc13eca2e373
Parents: 5280d93
Author: Josh Rosen 
Authored: Mon Apr 24 21:48:04 2017 -0700
Committer: Reynold Xin 
Committed: Mon Apr 24 21:48:04 2017 -0700

--
 assembly/pom.xml  | 2 +-
 common/network-common/pom.xml | 2 +-
 common/network-shuffle/pom.xml| 2 +-
 common/network-yarn/pom.xml   | 2 +-
 common/sketch/pom.xml | 2 +-
 common/tags/pom.xml   | 2 +-
 common/unsafe/pom.xml | 2 +-
 core/pom.xml  | 2 +-
 docs/_config.yml  | 4 ++--
 examples/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml   | 2 +-
 external/flume-sink/pom.xml   | 2 +-
 external/flume/pom.xml| 2 +-
 external/kafka-0-10-assembly/pom.xml  | 2 +-
 external/kafka-0-10-sql/pom.xml   | 2 +-
 external/kafka-0-10/pom.xml   | 2 +-
 external/kafka-0-8-assembly/pom.xml   | 2 +-
 external/kafka-0-8/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml | 2 +-
 external/kinesis-asl/pom.xml  | 2 +-
 external/spark-ganglia-lgpl/pom.xml   | 2 +-
 graphx/pom.xml| 2 +-
 launcher/pom.xml  | 2 +-
 mllib-local/pom.xml   | 2 +-
 mllib/pom.xml | 2 +-
 pom.xml   | 2 +-
 project/MimaExcludes.scala| 5 +
 repl/pom.xml  | 2 +-
 resource-managers/mesos/pom.xml   | 2 +-
 resource-managers/yarn/pom.xml| 2 +-
 sql/catalyst/pom.xml  | 2 +-
 sql/core/pom.xml  | 2 +-
 sql/hive-thriftserver/pom.xml | 2 +-
 sql/hive/pom.xml  | 2 +-
 streaming/pom.xml | 2 +-
 tools/pom.xml | 2 +-
 37 files changed, 42 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9d8607d..742a4a1 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.0-SNAPSHOT
+2.3.0-SNAPSHOT
 ../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8657af7..066970f 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.0-SNAPSHOT
+2.3.0-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 24c10fb..2de882a 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.0-SNAPSHOT
+2.3.0-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 5e5a80b..a8488d8 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.2.0-SNAPSHOT
+2.3.0-SNAPSHOT
 ../../pom.xml
   
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/sketch/pom.xml
--
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 1356c47..6b81fc2 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
  

spark git commit: [SPARK-20420][SQL] Add events to the external catalog

2017-04-21 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 48d760d02 -> e2b3d2367


[SPARK-20420][SQL] Add events to the external catalog

## What changes were proposed in this pull request?
It is often useful to be able to track changes to the `ExternalCatalog`. This 
PR makes the `ExternalCatalog` emit events when a catalog object is changed. 
Events are fired before and after the change.

The following events are fired per object:

- Database
  - CreateDatabasePreEvent: event fired before the database is created.
  - CreateDatabaseEvent: event fired after the database has been created.
  - DropDatabasePreEvent: event fired before the database is dropped.
  - DropDatabaseEvent: event fired after the database has been dropped.
- Table
  - CreateTablePreEvent: event fired before the table is created.
  - CreateTableEvent: event fired after the table has been created.
  - RenameTablePreEvent: event fired before the table is renamed.
  - RenameTableEvent: event fired after the table has been renamed.
  - DropTablePreEvent: event fired before the table is dropped.
  - DropTableEvent: event fired after the table has been dropped.
- Function
  - CreateFunctionPreEvent: event fired before the function is created.
  - CreateFunctionEvent: event fired after the function has been created.
  - RenameFunctionPreEvent: event fired before the function is renamed.
  - RenameFunctionEvent: event fired after the function has been renamed.
  - DropFunctionPreEvent: event fired before the function is dropped.
  - DropFunctionPreEvent: event fired after the function has been dropped.

The events currently only contain the names of the modified objects. More
events and more details can be added at a later point.

A user can monitor changes to the external catalog by adding a listener to the
Spark listener bus and checking for `ExternalCatalogEvent`s in the
`SparkListener.onOtherEvent` hook. A more direct approach is to add a listener
directly to the `ExternalCatalog`.
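
A hedged sketch of the listener-bus approach: only `SparkListener.onOtherEvent`
and the event types introduced by this patch are taken from the source; the
logger class itself is illustrative, and it deliberately prints whole events
since they currently carry only object names.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.catalyst.catalog.{CreateTableEvent, DropTableEvent, ExternalCatalogEvent}

class CatalogChangeLogger extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: CreateTableEvent     => println(s"table created: $e")
    case e: DropTableEvent       => println(s"table dropped: $e")
    case e: ExternalCatalogEvent => println(s"catalog changed: $e")
    case _                       => // not a catalog event, ignore
  }
}

// Register it on the listener bus, e.g.:
// spark.sparkContext.addSparkListener(new CatalogChangeLogger())
```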

## How was this patch tested?
Added the `ExternalCatalogEventSuite`.

Author: Herman van Hovell 

Closes #17710 from hvanhovell/SPARK-20420.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2b3d236
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2b3d236
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2b3d236

Branch: refs/heads/master
Commit: e2b3d2367a563d4600d8d87b5317e71135c362f0
Parents: 48d760d
Author: Herman van Hovell 
Authored: Fri Apr 21 00:05:03 2017 -0700
Committer: Reynold Xin 
Committed: Fri Apr 21 00:05:03 2017 -0700

--
 .../sql/catalyst/catalog/ExternalCatalog.scala  |  85 -
 .../sql/catalyst/catalog/InMemoryCatalog.scala  |  22 ++-
 .../spark/sql/catalyst/catalog/events.scala | 158 
 .../catalog/ExternalCatalogEventSuite.scala | 188 +++
 .../apache/spark/sql/internal/SharedState.scala |   7 +
 .../spark/sql/hive/HiveExternalCatalog.scala|  22 ++-
 6 files changed, 457 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e2b3d236/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index 08a01e8..974ef90 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.catalog
 import org.apache.spark.sql.catalyst.analysis.{FunctionAlreadyExistsException, 
NoSuchDatabaseException, NoSuchFunctionException, NoSuchTableException}
 import org.apache.spark.sql.catalyst.expressions.Expression
 import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.ListenerBus
 
 /**
  * Interface for the system catalog (of functions, partitions, tables, and 
databases).
@@ -30,7 +31,8 @@ import org.apache.spark.sql.types.StructType
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases 
don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog
+  extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] {
   import CatalogTypes.TablePartitionSpec
 
   protected def requireDbExists(db: String): Unit = {
@@ -61,9 +63,22 @@ abstract class ExternalCatalog {
   // Databases
   // --
 
-  def createDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: Boolean): 

spark git commit: [SPARK-20420][SQL] Add events to the external catalog

2017-04-21 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 6cd2f16b1 -> cddb4b7db


[SPARK-20420][SQL] Add events to the external catalog

## What changes were proposed in this pull request?
It is often useful to be able to track changes to the `ExternalCatalog`. This 
PR makes the `ExternalCatalog` emit events when a catalog object is changed. 
Events are fired before and after the change.

The following events are fired per object:

- Database
  - CreateDatabasePreEvent: event fired before the database is created.
  - CreateDatabaseEvent: event fired after the database has been created.
  - DropDatabasePreEvent: event fired before the database is dropped.
  - DropDatabaseEvent: event fired after the database has been dropped.
- Table
  - CreateTablePreEvent: event fired before the table is created.
  - CreateTableEvent: event fired after the table has been created.
  - RenameTablePreEvent: event fired before the table is renamed.
  - RenameTableEvent: event fired after the table has been renamed.
  - DropTablePreEvent: event fired before the table is dropped.
  - DropTableEvent: event fired after the table has been dropped.
- Function
  - CreateFunctionPreEvent: event fired before the function is created.
  - CreateFunctionEvent: event fired after the function has been created.
  - RenameFunctionPreEvent: event fired before the function is renamed.
  - RenameFunctionEvent: event fired after the function has been renamed.
  - DropFunctionPreEvent: event fired before the function is dropped.
  - DropFunctionPreEvent: event fired after the function has been dropped.

The events currently only contain the names of the modified objects. More
events and more details can be added at a later point.

A user can monitor changes to the external catalog by adding a listener to the
Spark listener bus and checking for `ExternalCatalogEvent`s in the
`SparkListener.onOtherEvent` hook. A more direct approach is to add a listener
directly to the `ExternalCatalog`.

## How was this patch tested?
Added the `ExternalCatalogEventSuite`.

Author: Herman van Hovell 

Closes #17710 from hvanhovell/SPARK-20420.

(cherry picked from commit e2b3d2367a563d4600d8d87b5317e71135c362f0)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cddb4b7d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cddb4b7d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cddb4b7d

Branch: refs/heads/branch-2.2
Commit: cddb4b7db81b01b4abf2ab683aba97e4eabb9769
Parents: 6cd2f16
Author: Herman van Hovell 
Authored: Fri Apr 21 00:05:03 2017 -0700
Committer: Reynold Xin 
Committed: Fri Apr 21 00:05:10 2017 -0700

--
 .../sql/catalyst/catalog/ExternalCatalog.scala  |  85 -
 .../sql/catalyst/catalog/InMemoryCatalog.scala  |  22 ++-
 .../spark/sql/catalyst/catalog/events.scala | 158 
 .../catalog/ExternalCatalogEventSuite.scala | 188 +++
 .../apache/spark/sql/internal/SharedState.scala |   7 +
 .../spark/sql/hive/HiveExternalCatalog.scala|  22 ++-
 6 files changed, 457 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cddb4b7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index 08a01e8..974ef90 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.catalog
 import org.apache.spark.sql.catalyst.analysis.{FunctionAlreadyExistsException, 
NoSuchDatabaseException, NoSuchFunctionException, NoSuchTableException}
 import org.apache.spark.sql.catalyst.expressions.Expression
 import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.ListenerBus
 
 /**
  * Interface for the system catalog (of functions, partitions, tables, and 
databases).
@@ -30,7 +31,8 @@ import org.apache.spark.sql.types.StructType
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases 
don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog
+  extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] {
   import CatalogTypes.TablePartitionSpec
 
   protected def requireDbExists(db: String): Unit = {
@@ -61,9 +63,22 @@ abstract class ExternalCatalog {
   // Databases
   // 

spark git commit: Fixed typos in docs

2017-04-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master dd6d55d5d -> bdc605691


Fixed typos in docs

## What changes were proposed in this pull request?

Typos in a couple of places in the docs.

## How was this patch tested?

build including docs

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: ymahajan 

Closes #17690 from ymahajan/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bdc60569
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bdc60569
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bdc60569

Branch: refs/heads/master
Commit: bdc60569196e9ae4e9086c3e514a406a9e8b23a6
Parents: dd6d55d
Author: ymahajan 
Authored: Wed Apr 19 20:08:31 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 19 20:08:31 2017 -0700

--
 docs/sql-programming-guide.md  | 2 +-
 docs/structured-streaming-programming-guide.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bdc60569/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 28942b6..490c1ce 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -571,7 +571,7 @@ be created by calling the `table` method on a 
`SparkSession` with the name of th
 For file-based data source, e.g. text, parquet, json, etc. you can specify a 
custom table path via the
 `path` option, e.g. `df.write.option("path", "/some/path").saveAsTable("t")`. 
When the table is dropped,
 the custom table path will not be removed and the table data is still there. 
If no custom table path is
-specifed, Spark will write data to a default table path under the warehouse 
directory. When the table is
+specified, Spark will write data to a default table path under the warehouse 
directory. When the table is
 dropped, the default table path will be removed too.
 
 Starting from Spark 2.1, persistent datasource tables have per-partition 
metadata stored in the Hive metastore. This brings several benefits:
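
As a rough, self-contained illustration of the table-path behaviour described in 
the hunk above (table names and the path are made up; an active `SparkSession` 
named `spark` is assumed):

```scala
// Assumes an active SparkSession named `spark`.
val df = spark.range(10).toDF("id")

// Default table path: data lands under the warehouse directory and is
// removed again when the table is dropped.
df.write.saveAsTable("managed_t")

// Custom table path: the files at /some/path survive DROP TABLE; only the
// catalog entry is removed.
df.write.option("path", "/some/path").saveAsTable("external_t")
```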

http://git-wip-us.apache.org/repos/asf/spark/blob/bdc60569/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 3cf7151..5b18cf2 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -778,7 +778,7 @@ windowedCounts = words \
 In this example, we are defining the watermark of the query on the value of 
the column "timestamp", 
 and also defining "10 minutes" as the threshold of how late is the data 
allowed to be. If this query 
 is run in Update output mode (discussed later in [Output Modes](#output-modes) 
section), 
-the engine will keep updating counts of a window in the Resule Table until the 
window is older 
+the engine will keep updating counts of a window in the Result Table until the 
window is older
 than the watermark, which lags behind the current event time in column 
"timestamp" by 10 minutes.
 Here is an illustration. 
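
A compact Scala sketch of the watermark setup the paragraph above describes, 
assuming a streaming Dataset `words` with `timestamp` and `word` columns (as in 
the guide's surrounding example):

```scala
import org.apache.spark.sql.functions.{col, window}

// Tolerate data arriving up to 10 minutes late, then count words per
// 10-minute event-time window; a window's count keeps being updated until
// the window falls behind the watermark.
val windowedCounts = words
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window(col("timestamp"), "10 minutes"), col("word"))
  .count()
```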
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: Fixed typos in docs

2017-04-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 e6bbdb0c5 -> 8d658b90b


Fixed typos in docs

## What changes were proposed in this pull request?

Typos at a couple of places in the docs.

## How was this patch tested?

build including docs

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: ymahajan 

Closes #17690 from ymahajan/master.

(cherry picked from commit bdc60569196e9ae4e9086c3e514a406a9e8b23a6)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d658b90
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d658b90
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d658b90

Branch: refs/heads/branch-2.2
Commit: 8d658b90b9f08ed4a3a899aad5d3ea77986b7302
Parents: e6bbdb0
Author: ymahajan 
Authored: Wed Apr 19 20:08:31 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 19 20:08:37 2017 -0700

--
 docs/sql-programming-guide.md  | 2 +-
 docs/structured-streaming-programming-guide.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8d658b90/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 28942b6..490c1ce 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -571,7 +571,7 @@ be created by calling the `table` method on a 
`SparkSession` with the name of th
 For file-based data source, e.g. text, parquet, json, etc. you can specify a 
custom table path via the
 `path` option, e.g. `df.write.option("path", "/some/path").saveAsTable("t")`. 
When the table is dropped,
 the custom table path will not be removed and the table data is still there. 
If no custom table path is
-specifed, Spark will write data to a default table path under the warehouse 
directory. When the table is
+specified, Spark will write data to a default table path under the warehouse 
directory. When the table is
 dropped, the default table path will be removed too.
 
 Starting from Spark 2.1, persistent datasource tables have per-partition 
metadata stored in the Hive metastore. This brings several benefits:

http://git-wip-us.apache.org/repos/asf/spark/blob/8d658b90/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 3cf7151..5b18cf2 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -778,7 +778,7 @@ windowedCounts = words \
 In this example, we are defining the watermark of the query on the value of 
the column "timestamp", 
 and also defining "10 minutes" as the threshold of how late is the data 
allowed to be. If this query 
 is run in Update output mode (discussed later in [Output Modes](#output-modes) 
section), 
-the engine will keep updating counts of a window in the Resule Table until the 
window is older 
+the engine will keep updating counts of a window in the Result Table until the 
window is older
 than the watermark, which lags behind the current event time in column 
"timestamp" by 10 minutes.
 Here is an illustration. 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20398][SQL] range() operator should include cancellation reason when killed

2017-04-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 af9f18c31 -> e6bbdb0c5


[SPARK-20398][SQL] range() operator should include cancellation reason when 
killed

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19820 adds a reason field for why 
tasks were killed. However, for backwards compatibility it left the old 
TaskKilledException constructor which defaults to "unknown reason".
The range() operator should use the constructor that fills in the reason rather 
than dropping it on task kill.
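
A tiny sketch of the constructor difference described above, assuming the 
`TaskKilledException(reason)` shape introduced by SPARK-19820 (the reason string 
here is made up):

```scala
import org.apache.spark.TaskKilledException

// Legacy no-arg constructor: the kill reason degrades to "unknown reason".
val legacy = new TaskKilledException()

// Constructor with a reason: the cancellation cause is preserved and can be
// surfaced in the UI and logs.
val withReason = new TaskKilledException("killed via cancelJobGroup")

println(s"${legacy.reason} vs ${withReason.reason}")
```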

## How was this patch tested?

Existing tests, and I tested this manually.

Author: Eric Liang 

Closes #17692 from ericl/fix-kill-reason-in-range.

(cherry picked from commit dd6d55d5de970662eccf024e5eae4e6821373d35)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e6bbdb0c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e6bbdb0c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e6bbdb0c

Branch: refs/heads/branch-2.2
Commit: e6bbdb0c50657190192933f29b92278ea8f37704
Parents: af9f18c
Author: Eric Liang 
Authored: Wed Apr 19 19:53:40 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 19 19:54:45 2017 -0700

--
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala  | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e6bbdb0c/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index 44278e3..233a105 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -463,9 +463,7 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
   | $number = $batchEnd;
   |   }
   |
-  |   if ($taskContext.isInterrupted()) {
-  | throw new TaskKilledException();
-  |   }
+  |   $taskContext.killTaskIfInterrupted();
   |
   |   long $nextBatchTodo;
   |   if ($numElementsTodo > ${batchSize}L) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20398][SQL] range() operator should include cancellation reason when killed

2017-04-19 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 39e303a8b -> dd6d55d5d


[SPARK-20398][SQL] range() operator should include cancellation reason when 
killed

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19820 adds a reason field for why 
tasks were killed. However, for backwards compatibility it left the old 
TaskKilledException constructor which defaults to "unknown reason".
The range() operator should use the constructor that fills in the reason rather 
than dropping it on task kill.

## How was this patch tested?

Existing tests, and I tested this manually.

Author: Eric Liang 

Closes #17692 from ericl/fix-kill-reason-in-range.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd6d55d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd6d55d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd6d55d5

Branch: refs/heads/master
Commit: dd6d55d5de970662eccf024e5eae4e6821373d35
Parents: 39e303a
Author: Eric Liang 
Authored: Wed Apr 19 19:53:40 2017 -0700
Committer: Reynold Xin 
Committed: Wed Apr 19 19:53:40 2017 -0700

--
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala  | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dd6d55d5/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index 44278e3..233a105 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -463,9 +463,7 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
   | $number = $batchEnd;
   |   }
   |
-  |   if ($taskContext.isInterrupted()) {
-  | throw new TaskKilledException();
-  |   }
+  |   $taskContext.killTaskIfInterrupted();
   |
   |   long $nextBatchTodo;
   |   if ($numElementsTodo > ${batchSize}L) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite

2017-04-17 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 0075562dd -> 33ea908af


[TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite

## What changes were proposed in this pull request?

Replace non-existent `repartitionBy` with `distribute` in 
`CollapseRepartitionSuite`.

## How was this patch tested?

local build and `catalyst/testOnly *CollapseRepartitionSuite`

Author: Jacek Laskowski 

Closes #17657 from jaceklaskowski/CollapseRepartitionSuite.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33ea908a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33ea908a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33ea908a

Branch: refs/heads/master
Commit: 33ea908af94152147e996a6dc8da41ada27d5af3
Parents: 0075562
Author: Jacek Laskowski 
Authored: Mon Apr 17 17:58:10 2017 -0700
Committer: Reynold Xin 
Committed: Mon Apr 17 17:58:10 2017 -0700

--
 .../optimizer/CollapseRepartitionSuite.scala| 21 ++--
 1 file changed, 10 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/33ea908a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
index 59d2dc4..8cc8dec 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
@@ -106,8 +106,8 @@ class CollapseRepartitionSuite extends PlanTest {
 comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartitionBy above repartition") {
-// Always respects the top repartitionBy amd removes useless repartition
+  test("distribute above repartition") {
+// Always respects the top distribute and removes useless repartition
 val query1 = testRelation
   .repartition(10)
   .distribute('a)(20)
@@ -123,8 +123,8 @@ class CollapseRepartitionSuite extends PlanTest {
 comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartitionBy above coalesce") {
-// Always respects the top repartitionBy amd removes useless coalesce 
below repartition
+  test("distribute above coalesce") {
+// Always respects the top distribute and removes useless coalesce below 
repartition
 val query1 = testRelation
   .coalesce(10)
   .distribute('a)(20)
@@ -140,8 +140,8 @@ class CollapseRepartitionSuite extends PlanTest {
 comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartition above repartitionBy") {
-// Always respects the top repartition amd removes useless distribute 
below repartition
+  test("repartition above distribute") {
+// Always respects the top repartition and removes useless distribute 
below repartition
 val query1 = testRelation
   .distribute('a)(10)
   .repartition(20)
@@ -155,11 +155,10 @@ class CollapseRepartitionSuite extends PlanTest {
 
 comparePlans(optimized1, correctAnswer)
 comparePlans(optimized2, correctAnswer)
-
   }
 
-  test("coalesce above repartitionBy") {
-// Remove useless coalesce above repartition
+  test("coalesce above distribute") {
+// Remove useless coalesce above distribute
 val query1 = testRelation
   .distribute('a)(10)
   .coalesce(20)
@@ -180,8 +179,8 @@ class CollapseRepartitionSuite extends PlanTest {
 comparePlans(optimized2, correctAnswer2)
   }
 
-  test("collapse two adjacent repartitionBys into one") {
-// Always respects the top repartitionBy
+  test("collapse two adjacent distributes into one") {
+// Always respects the top distribute
 val query1 = testRelation
   .distribute('b)(10)
   .distribute('a)(20)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate functions after using persistent functions

2017-04-17 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 622d7a8bf -> 3808b4728


[SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate functions 
after using persistent functions

Revert the changes of https://github.com/apache/spark/pull/17646 made in 
branch-2.1, because they break the build. The change needs the parser interface, 
but SessionCatalog in branch-2.1 does not have it.

### What changes were proposed in this pull request?

The session catalog caches some persistent functions in the `FunctionRegistry`, 
so there can be duplicates. Our Catalog API `listFunctions` does not handle it.

It would be better if the `SessionCatalog` API de-duplicated the records, 
instead of leaving that to each API caller. In `FunctionRegistry`, our functions 
are identified by their unquoted string names. Thus, this PR tries to parse them 
using our parser interface and then de-duplicate the names.
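
A hypothetical helper mirroring the approach just described (the `dedup` name and 
the injected `parse` function are illustrative, not the actual `SessionCatalog` 
code):

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.catalyst.FunctionIdentifier

// Registry names are unquoted strings; parse them back into identifiers and
// let `.distinct` remove the duplicates.
def dedup(names: Seq[String], parse: String => FunctionIdentifier): Seq[FunctionIdentifier] =
  names.map { name =>
    Try(parse(name)) match {
      case Success(ident) => ident
      case Failure(_)     => FunctionIdentifier(name) // e.g. built-ins like "%" are not parsable
    }
  }.distinct
```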

### How was this patch tested?
Added test cases.

Author: Xiao Li 

Closes #17661 from gatorsmile/compilationFix17646.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3808b472
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3808b472
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3808b472

Branch: refs/heads/branch-2.1
Commit: 3808b472813a2cdf560107787f6971e5202044a8
Parents: 622d7a8
Author: Xiao Li 
Authored: Mon Apr 17 17:57:20 2017 -0700
Committer: Reynold Xin 
Committed: Mon Apr 17 17:57:20 2017 -0700

--
 .../sql/catalyst/catalog/SessionCatalog.scala   | 21 +---
 .../spark/sql/execution/command/functions.scala |  4 +++-
 .../spark/sql/hive/execution/HiveUDFSuite.scala | 17 
 3 files changed, 8 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3808b472/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 6f302d3..a5cf719 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -20,7 +20,6 @@ package org.apache.spark.sql.catalyst.catalog
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.mutable
-import scala.util.{Failure, Success, Try}
 
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
@@ -1099,25 +1098,15 @@ class SessionCatalog(
   def listFunctions(db: String, pattern: String): Seq[(FunctionIdentifier, 
String)] = {
 val dbName = formatDatabaseName(db)
 requireDbExists(dbName)
-val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { f =>
-  FunctionIdentifier(f, Some(dbName)) }
-val loadedFunctions =
-  StringUtils.filterPattern(functionRegistry.listFunction(), pattern).map 
{ f =>
-// In functionRegistry, function names are stored as an unquoted 
format.
-Try(parser.parseFunctionIdentifier(f)) match {
-  case Success(e) => e
-  case Failure(_) =>
-// The names of some built-in functions are not parsable by our 
parser, e.g., %
-FunctionIdentifier(f)
-}
-  }
+val dbFunctions = externalCatalog.listFunctions(dbName, pattern)
+  .map { f => FunctionIdentifier(f, Some(dbName)) }
+val loadedFunctions = 
StringUtils.filterPattern(functionRegistry.listFunction(), pattern)
+  .map { f => FunctionIdentifier(f) }
 val functions = dbFunctions ++ loadedFunctions
-// The session catalog caches some persistent functions in the 
FunctionRegistry
-// so there can be duplicates.
 functions.map {
   case f if FunctionRegistry.functionSet.contains(f.funcName) => (f, 
"SYSTEM")
   case f => (f, "USER")
-}.distinct
+}
   }
 
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3808b472/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
index 75272d2..ea53987 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
@@ -208,6 +208,8 @@ case class ShowFunctionsCommand(
   case (f, "USER") if showUserFunctions => f.unquotedString
   case (f, 

spark git commit: Typo fix: distitrbuted -> distributed

2017-04-17 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e5fee3e4f -> 0075562dd


Typo fix: distitrbuted -> distributed

## What changes were proposed in this pull request?

Typo fix: distitrbuted -> distributed

## How was this patch tested?

Existing tests

Author: Andrew Ash 

Closes #17664 from ash211/patch-1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0075562d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0075562d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0075562d

Branch: refs/heads/master
Commit: 0075562dd2551a31c35ca26922d6bd73cdb78ea4
Parents: e5fee3e
Author: Andrew Ash 
Authored: Mon Apr 17 17:56:33 2017 -0700
Committer: Reynold Xin 
Committed: Mon Apr 17 17:56:33 2017 -0700

--
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0075562d/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 424bbca..b817570 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -577,7 +577,7 @@ private[spark] class Client(
 ).foreach { case (flist, resType, addToClasspath) =>
   flist.foreach { file =>
 val (_, localizedPath) = distribute(file, resType = resType)
-// If addToClassPath, we ignore adding jar multiple times to 
distitrbuted cache.
+// If addToClassPath, we ignore adding jar multiple times to 
distributed cache.
 if (addToClasspath) {
   if (localizedPath != null) {
 cachedSecondaryJarLinks += localizedPath


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [HOTFIX] Fix compilation.

2017-04-17 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 db9517c16 -> 622d7a8bf


[HOTFIX] Fix compilation.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/622d7a8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/622d7a8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/622d7a8b

Branch: refs/heads/branch-2.1
Commit: 622d7a8bf6be22e30db7ff38604ed86b44fcc87e
Parents: db9517c
Author: Reynold Xin 
Authored: Mon Apr 17 12:57:58 2017 -0700
Committer: Reynold Xin 
Committed: Mon Apr 17 12:57:58 2017 -0700

--
 .../apache/spark/sql/catalyst/expressions/regexpExpressions.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/622d7a8b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index ad12177..0325d0e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -92,7 +92,8 @@ trait StringRegexExpression extends ImplicitCastInputTypes {
 See also:
   Use RLIKE to match with standard regular expressions.
 """)
-case class Like(left: Expression, right: Expression) extends 
StringRegexExpression {
+case class Like(left: Expression, right: Expression)
+  extends BinaryExpression with StringRegexExpression {
 
   override def escape(v: String): String = StringUtils.escapeLikeRegex(v)
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


