[spark] branch master updated (8848af2 -> 6252c54)

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8848af2  [SPARK-28881][PYTHON][TESTS][FOLLOW-UP] Use 
SparkSession(SparkContext(...)) to prevent for Spark conf to affect other tests
 add 6252c54  [SPARK-23519][SQL] create view should work from query with 
duplicate output columns

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/command/views.scala | 18 +++---
 .../org/apache/spark/sql/execution/SQLViewSuite.scala  | 10 ++
 2 files changed, 21 insertions(+), 7 deletions(-)
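
For readers unfamiliar with the issue, here is a minimal, hypothetical PySpark sketch (not part of the patch) of the kind of query the subject refers to — a view whose defining query has duplicate output column names:

```
# Hypothetical sketch only: both output columns of the view's defining query are named `id`.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("spark-23519-sketch").getOrCreate()

spark.sql("CREATE VIEW v AS SELECT 1 AS id, 2 AS id")  # duplicate output columns in the view query
spark.sql("SELECT * FROM v").show()                    # works once the duplicates are handled

spark.stop()
```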


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (90b10b4 -> 8848af2)

2019-08-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 90b10b4  [HOT-FIX] fix compilation
 add 8848af2  [SPARK-28881][PYTHON][TESTS][FOLLOW-UP] Use 
SparkSession(SparkContext(...)) to prevent for Spark conf to affect other tests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_arrow.py | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)
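
As background, the pattern named in the subject — wrapping a fresh SparkContext in a SparkSession so that per-test static configuration does not leak into other tests — looks roughly like the following hypothetical sketch (the conf key below is only an example, not taken from the patch):

```
# Hypothetical sketch of the SparkSession(SparkContext(...)) pattern from the subject.
# spark.driver.maxResultSize is just an example of a static conf that must not leak.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().set("spark.driver.maxResultSize", "10k")
spark = SparkSession(SparkContext(conf=conf))
try:
    spark.range(10).collect()   # run the isolated test body here
finally:
    spark.stop()                # stops the session and its SparkContext so later tests start clean
```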


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r35414 - in /dev/spark/v2.4.4-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark

2019-08-27 Thread dongjoon
Author: dongjoon
Date: Tue Aug 27 22:21:20 2019
New Revision: 35414

Log:
Apache Spark v2.4.4-rc3 docs


[This commit notification would consist of 1479 parts, 
which exceeds the limit of 50, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r35413 - /dev/spark/v2.4.4-rc3-bin/

2019-08-27 Thread dongjoon
Author: dongjoon
Date: Tue Aug 27 22:01:59 2019
New Revision: 35413

Log:
Apache Spark v2.4.4-rc3

Added:
dev/spark/v2.4.4-rc3-bin/
dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz   (with props)
dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.asc
dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.sha512
dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz   (with props)
dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz.asc
dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz.sha512
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.6.tgz.asc
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.6.tgz.sha512
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.7.tgz.asc
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-hadoop2.7.tgz.sha512
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop-scala-2.12.tgz   
(with props)
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop-scala-2.12.tgz.asc

dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop-scala-2.12.tgz.sha512
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop.tgz   (with props)
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop.tgz.asc
dev/spark/v2.4.4-rc3-bin/spark-2.4.4-bin-without-hadoop.tgz.sha512
dev/spark/v2.4.4-rc3-bin/spark-2.4.4.tgz   (with props)
dev/spark/v2.4.4-rc3-bin/spark-2.4.4.tgz.asc
dev/spark/v2.4.4-rc3-bin/spark-2.4.4.tgz.sha512

Added: dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.asc
==
--- dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.asc (added)
+++ dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.asc Tue Aug 27 22:01:59 2019
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJdZaPnAAoJEO2gDOg08PxcM2IQAIKGpWBzrNMW3ZllP3O22BYb
+8iL52a+UOxFIi9h38FUtCg0mhebfrB0KGheemi8UovQyELCazoDeqX+QsOI7HGZY
+X4ADZuJcCTpxyD9pVNaLs0+p8CPFzttFm5YII3c1mkchADqNGoxOgh80DUGxSJU2
+0OjgCcs6K72Ajff0ki00v8APCVwMo5qH1GDbMlkiVbuoa1bdYCCmuYCDraux+1UE
+/L1A21eH06bd0aU5vdFttJ0q2vy0WwnzlimmvDpnpg07ZCeRTft01Es3304t2wZo
+mRIi3Oh1/GTmUebQOGGRDj0iZJM1uwl7eWctWpM//2gKmkQKdBHfWJHNn4OYPABc
+X2dZ5EeAIHJ2KfCNEGB42XuIiLdnY4HZilZGQDpYSQV/2A5VGWSPIZUmdTQWnaPJ
+KpxK9RhYPB5s0uXu3GIjNJgP6NNG1H6X8pSQhbYoEndlkniApvQeGNwnV5OtZhkF
+C61JR0SfqKW0fHtdTF0b4j4F6h4o4YvplNqoKIifP2L4hXzHV3tdnBqUVKTD3YCO
+QSVuSfcpoQLjd6/0L5eJ2KY0E69K2V8pQSVk16LTYSSHFpbP7dVkHt3yCyXHJ00Q
+6rSf8ggDAQILSz1+QME4xdsZxIPcGU9FZnbFZ5Vy5fsrz83/bSWTuoQ8AvCXswH4
+x2gcPu6NmSV0MBMH+WF0
+=apv3
+-END PGP SIGNATURE-

Added: dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.sha512
==
--- dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.sha512 (added)
+++ dev/spark/v2.4.4-rc3-bin/SparkR_2.4.4.tar.gz.sha512 Tue Aug 27 22:01:59 2019
@@ -0,0 +1,3 @@
+SparkR_2.4.4.tar.gz: 77EA90A7 24185E28 6802A7C2 BF4DD425 1A04F1F8 2241E0BD
+ DBAD4075 9EC1295A DDF715DE 994D9E9E B3AFD14D 5954F582
+ ADA25907 9D72B15D 30FA2ED0 03398007

Added: dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz.asc
==
--- dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz.asc (added)
+++ dev/spark/v2.4.4-rc3-bin/pyspark-2.4.4.tar.gz.asc Tue Aug 27 22:01:59 2019
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJdZaD2AAoJEO2gDOg08PxclPcP/186evMfYqEu8snSaTaQoRmA
+wEi4WJLVvF8KQAT7klqK2AYjbOD15VEfyJsA3kIDoXlkz/fEBoo4LnfS5zpxwCXJ
+0EUyZPIUb29vRJpZVLanDOnq5rDRwBc3k6XNm+YWc1x1avw6EAZU7vKNUKJMS9T7
+WYGbxe7G7OQ+GihNjWCuoJ6Y14pHl9MNQDbYA8MAI8aTxB0wpYqi/QOv94FcGqn/
+KKjdOpDniMdhQBvI7f5qaWF0IcmttpeBM+sBRlbwl8h2BKVUItOy9Vxs2ibss6Eu
+7FJVz2BZ+Zmb+ASNSJMZ/Dw0m3j53uwytfSHtLOUlyiP2eAiONPSj/NnacO9O0xH
+i18OUUGpqPdlnxi2mE0rRQVW7CQAYi4lq3UCQOHe9TNFGYzZtdV1pJKsvS8aGzZc
+06G8F/g4J/iJeVDdKS4KL66Qgj89Os9Vd0ND5ZPpzM9b2B9SZAv6MIIs8Jybiv3b
+O3u9wOl487yJOq0ESFEiGeG5F2+9rHNJLjKKUEp5qi8cUfiypILv8AuQyEULvM5d
+SDapZPXLjTa345lb/WmmwdhmcTg+BWmKtasknN0qi+TLv70oI9EG9vnZSqm8q859
+4gNztp7Y9KeRuoWNcmQ/81y99P+nTu66I/yZ2zJHCkOq4KysDLqVhY4gKqz3J8em
+9TNtPnhU+QESROBHXCbG
+=+XVq
+-END PGP SIGNATURE-

Added: 

[spark] branch branch-2.4 updated (c4bb486 -> 449f319)

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c4bb486  [SPARK-27992][SPARK-28881][PYTHON][2.4] Allow Python to join 
with connection thread to propagate errors
 add 7955b39  Preparing Spark release v2.4.4-rc3
 new 449f319  Preparing development version 2.4.5-SNAPSHOT

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v2.4.4-rc3 created (now 7955b39)

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v2.4.4-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 7955b39  (commit)
This tag includes the following new commits:

 new 7955b39  Preparing Spark release v2.4.4-rc3

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v2.4.4-rc3

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v2.4.4-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 7955b3962ac46b89564e0613db7bea98a1478bf2
Author: Dongjoon Hyun 
AuthorDate: Tue Aug 27 19:51:56 2019 +

Preparing Spark release v2.4.4-rc3
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 57a5d84..d13ec02 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.5
+Version: 2.4.4
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 432a388..1dc59fa 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.5-SNAPSHOT</version>
+    <version>2.4.4</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 2d3d5e3..a9599a8 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.5-SNAPSHOT</version>
+    <version>2.4.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index af5a9bd..0b43916 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.5-SNAPSHOT</version>
+    <version>2.4.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index c48f6e3..2db0e34 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.5-SNAPSHOT</version>
+    <version>2.4.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 689168d..561e74e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.5-SNAPSHOT</version>
+    <version>2.4.4</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index c6ebb89..d909d4a 

[spark] 01/01: Preparing development version 2.4.5-SNAPSHOT

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 449f319ad9a822fc7e0751bbbe4e86663e6731d4
Author: Dongjoon Hyun 
AuthorDate: Tue Aug 27 19:52:00 2019 +

Preparing development version 2.4.5-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index d13ec02..57a5d84 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.4
+Version: 2.4.5
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 1dc59fa..432a388 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.4</version>
+    <version>2.4.5-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index a9599a8..2d3d5e3 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.4</version>
+    <version>2.4.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 0b43916..af5a9bd 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.4</version>
+    <version>2.4.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2db0e34..c48f6e3 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.4</version>
+    <version>2.4.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 561e74e..689168d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.4</version>
+    <version>2.4.5-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 

[spark] branch branch-2.4 updated (0d0686e -> c4bb486)

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0d0686e  [SPARK-28642][SQL][TEST][FOLLOW-UP] Test 
spark.sql.redaction.options.regex with and without default values
 add c4bb486  [SPARK-27992][SPARK-28881][PYTHON][2.4] Allow Python to join 
with connection thread to propagate errors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/api/python/PythonRDD.scala| 37 ++
 python/pyspark/sql/dataframe.py| 10 --
 python/pyspark/sql/tests.py| 28 +++-
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  2 +-
 4 files changed, 72 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (2c13dc9 -> 0d0686e)

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2c13dc9  [SPARK-28871][MINOR][DOCS] WaterMark doc fix
 add 0d0686e  [SPARK-28642][SQL][TEST][FOLLOW-UP] Test 
spark.sql.redaction.options.regex with and without default values

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala  | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-28871][MINOR][DOCS] WaterMark doc fix

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2c13dc9  [SPARK-28871][MINOR][DOCS] WaterMark doc fix
2c13dc9 is described below

commit 2c13dc9526053766473ff1119a093b08f6a08395
Author: cyq89051127 
AuthorDate: Tue Aug 27 08:13:39 2019 -0500

[SPARK-28871][MINOR][DOCS] WaterMark doc fix

### What changes were proposed in this pull request?

Fix the code style of the example in the 'Policy for handling multiple watermarks' 
section of structured-streaming-programming-guide.md.

### Why are the changes needed?

Make it look friendlier to the user.

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?

cd docs
SKIP_API=1 jekyll build

Closes #25580 from cyq89051127/master.

Authored-by: cyq89051127 
Signed-off-by: Sean Owen 
(cherry picked from commit 4cf81285dac0a4e901179a44bfa62cb51e33bee6)
Signed-off-by: Dongjoon Hyun 
---
 docs/structured-streaming-programming-guide.md | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index fa5664d..abff126c 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1567,10 +1567,18 @@ be tolerated for stateful operations. You specify these 
thresholds using
 ``withWatermarks("eventTime", delay)`` on each of the input streams. For 
example, consider
 a query with stream-stream joins between `inputStream1` and `inputStream2`.
 
-  inputStream1.withWatermark("eventTime1", "1 hour")
-.join(
-  inputStream2.withWatermark("eventTime2", "2 hours"),
-  joinCondition)
+
+
+
+{% highlight scala %}
+inputStream1.withWatermark("eventTime1", "1 hour")
+  .join(
+inputStream2.withWatermark("eventTime2", "2 hours"),
+joinCondition)
+{% endhighlight %}
+
+
+
 
 While executing the query, Structured Streaming individually tracks the maximum
 event time seen in each input stream, calculates watermarks based on the 
corresponding delay,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [HOT-FIX] fix compilation

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 90b10b4  [HOT-FIX] fix compilation
90b10b4 is described below

commit 90b10b4f7a54caaf0962424a111fddbd42b107b1
Author: Wenchen Fan 
AuthorDate: Tue Aug 27 23:30:44 2019 +0800

[HOT-FIX] fix compilation

This is caused by 2 PRs that were merged at the same time:

https://github.com/apache/spark/commit/cb06209fc908bac6ce6a8f20653865489773cbc3

https://github.com/apache/spark/commit/2b24a71fec021755f43db99628a56bd4a01518eb

Closes #25597 from cloud-fan/hot-fix.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala   | 6 +++---
 .../org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
index 8f6c47c..0d236a4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
@@ -585,7 +585,7 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
 
   test("Throw exception on unsafe cast with ANSI casting policy") {
 withSQLConf(
-  SQLConf.USE_V1_SOURCE_WRITER_LIST.key -> "parquet",
+  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
   SQLConf.STORE_ASSIGNMENT_POLICY.key -> 
SQLConf.StoreAssignmentPolicy.ANSI.toString) {
   withTable("t") {
 sql("create table t(i int, d double) using parquet")
@@ -610,7 +610,7 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
 
   test("Allow on writing any numeric value to numeric type with ANSI policy") {
 withSQLConf(
-  SQLConf.USE_V1_SOURCE_WRITER_LIST.key -> "parquet",
+  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
   SQLConf.STORE_ASSIGNMENT_POLICY.key -> 
SQLConf.StoreAssignmentPolicy.ANSI.toString) {
   withTable("t") {
 sql("create table t(i int, d float) using parquet")
@@ -624,7 +624,7 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
 
   test("Allow on writing timestamp value to date type with ANSI policy") {
 withSQLConf(
-  SQLConf.USE_V1_SOURCE_WRITER_LIST.key -> "parquet",
+  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
   SQLConf.STORE_ASSIGNMENT_POLICY.key -> 
SQLConf.StoreAssignmentPolicy.ANSI.toString) {
   withTable("t") {
 sql("create table t(i date) using parquet")
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index 369feb5..b98626a 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -329,7 +329,7 @@ class DataFrameReaderWriterSuite extends QueryTest with 
SharedSparkSession with
 
   test("Throw exception on unsafe cast with ANSI casting policy") {
 withSQLConf(
-  SQLConf.USE_V1_SOURCE_WRITER_LIST.key -> "parquet",
+  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
   SQLConf.STORE_ASSIGNMENT_POLICY.key -> 
SQLConf.StoreAssignmentPolicy.ANSI.toString) {
   withTable("t") {
 sql("create table t(i int, d double) using parquet")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2b24a71  [SPARK-28495][SQL] Introduce ANSI store assignment policy for 
table insertion
2b24a71 is described below

commit 2b24a71fec021755f43db99628a56bd4a01518eb
Author: Gengliang Wang 
AuthorDate: Tue Aug 27 22:13:23 2019 +0800

[SPARK-28495][SQL] Introduce ANSI store assignment policy for table 
insertion

### What changes were proposed in this pull request?
 Introduce ANSI store assignment policy for table insertion.
With ANSI policy, Spark performs the type coercion of table insertion as 
per ANSI SQL.

### Why are the changes needed?
In Spark 2.4 and earlier, when inserting into a table, Spark casts the data type of 
the input query to the data type of the target table by coercion. This can be very 
confusing, e.g. a user makes a mistake and writes string values to an int column.

In data source V2, by default, only upcasting is allowed when inserting data into a 
table. E.g. int -> long and int -> string are allowed, while decimal -> double or 
long -> int are not. The rules of UpCast were originally created for Dataset type 
coercion; they are quite strict and differ from the behavior of all existing popular 
DBMSs. This is a breaking change, and existing queries may break after the 3.0 release.

Following the ANSI SQL standard makes Spark consistent with the table insertion 
behavior of popular DBMSs like PostgreSQL/Oracle/MySQL.
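
As a rough, hypothetical illustration of the behavior described above (a sketch, not 
taken from this commit; it assumes the conf key spark.sql.storeAssignmentPolicy, which 
corresponds to SQLConf.STORE_ASSIGNMENT_POLICY in the diff below, and the table and 
values are made up):

```
# Hedged sketch of ANSI store assignment; table name and literals are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("ansi-store-assignment-sketch").getOrCreate()
spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")

spark.sql("CREATE TABLE t (i INT) USING parquet")
spark.sql("INSERT INTO t VALUES (CAST(1 AS BIGINT))")   # numeric -> numeric: allowed under ANSI
try:
    spark.sql("INSERT INTO t VALUES ('abc')")            # string -> int: rejected at analysis time
except Exception as e:
    print(e)

spark.stop()
```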

### Does this PR introduce any user-facing change?
A new optional mode for table insertion.

### How was this patch tested?
Unit test

Closes #25581 from gengliangwang/ANSImode.

Authored-by: Gengliang Wang 
Signed-off-by: Wenchen Fan 
---
 .../catalyst/analysis/TableOutputResolver.scala|   5 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |  30 
 .../org/apache/spark/sql/internal/SQLConf.scala|   7 +-
 .../org/apache/spark/sql/types/DataType.scala  |  25 ++-
 .../analysis/DataSourceV2AnalysisSuite.scala   | 172 ---
 .../types/DataTypeWriteCompatibilitySuite.scala| 189 ++---
 .../org/apache/spark/sql/sources/InsertSuite.scala |  52 ++
 .../sql/test/DataFrameReaderWriterSuite.scala  |  22 +++
 8 files changed, 367 insertions(+), 135 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index f0991f1..6769773 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -108,10 +108,11 @@ object TableOutputResolver {
   case StoreAssignmentPolicy.LEGACY =>
 outputField
 
-  case StoreAssignmentPolicy.STRICT =>
+  case StoreAssignmentPolicy.STRICT | StoreAssignmentPolicy.ANSI =>
 // run the type check first to ensure type errors are present
 val canWrite = DataType.canWrite(
-  queryExpr.dataType, tableAttr.dataType, byName, conf.resolver, 
tableAttr.name, addError)
+  queryExpr.dataType, tableAttr.dataType, byName, conf.resolver, 
tableAttr.name,
+  storeAssignmentPolicy, addError)
 if (queryExpr.nullable && !tableAttr.nullable) {
   addError(s"Cannot write nullable values to non-null column 
'${tableAttr.name}'")
   None
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index baabf19..452f084 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -158,6 +158,36 @@ object Cast {
 case _ => false
   }
 
+  def canANSIStoreAssign(from: DataType, to: DataType): Boolean = (from, to) 
match {
+case _ if from == to => true
+case (_: NumericType, _: NumericType) => true
+case (_: AtomicType, StringType) => true
+case (_: CalendarIntervalType, StringType) => true
+case (DateType, TimestampType) => true
+case (TimestampType, DateType) => true
+// Spark supports casting between long and timestamp, please see 
`longToTimestamp` and
+// `timestampToLong` for details.
+case (TimestampType, LongType) => true
+case (LongType, TimestampType) => true
+
+case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
+  resolvableNullability(fn, tn) && canANSIStoreAssign(fromType, toType)
+
+case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) =>
+  

[spark] branch master updated: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-27 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 70f4bbc  [SPARK-28414][WEBUI] UI updates to show resource info in 
Standalone
70f4bbc is described below

commit 70f4bbccc511c60266511c752a1fa37b20be3f8d
Author: wuyi 
AuthorDate: Tue Aug 27 08:59:29 2019 -0500

[SPARK-28414][WEBUI] UI updates to show resource info in Standalone

## What changes were proposed in this pull request?

Since SPARK-27371 added support for GPU-aware resource scheduling in Standalone, 
this PR adds resource info to the Standalone UI.

## How was this patch tested?

Updated `JsonProtocolSuite` and tested manually.

Master page:


![masterpage](https://user-images.githubusercontent.com/16397174/62835958-b933c100-bc90-11e9-814f-22bae048303d.png)

Worker page


![workerpage](https://user-images.githubusercontent.com/16397174/63417947-d2790200-c434-11e9-8979-36b8f558afd3.png)

Application page


![applicationpage](https://user-images.githubusercontent.com/16397174/62835964-cbadfa80-bc90-11e9-99a2-26e05421619a.png)

Closes #25409 from Ngone51/SPARK-28414.

Authored-by: wuyi 
Signed-off-by: Thomas Graves 
---
 .../org/apache/spark/deploy/DeployMessage.scala|  4 +-
 .../org/apache/spark/deploy/JsonProtocol.scala | 39 +++-
 .../spark/deploy/StandaloneResourceUtils.scala | 74 ++
 .../org/apache/spark/deploy/master/Master.scala|  4 +-
 .../apache/spark/deploy/master/WorkerInfo.scala| 29 +++--
 .../spark/deploy/master/ui/ApplicationPage.scala   |  9 ++-
 .../apache/spark/deploy/master/ui/MasterPage.scala | 41 ++--
 .../org/apache/spark/deploy/worker/Worker.scala| 25 +++-
 .../apache/spark/deploy/worker/ui/WorkerPage.scala | 28 +++-
 .../apache/spark/resource/ResourceAllocator.scala  |  1 -
 .../spark/resource/ResourceInformation.scala   |  2 +
 .../org/apache/spark/deploy/DeployTestUtils.scala  | 40 ++--
 .../apache/spark/deploy/JsonProtocolSuite.scala| 37 +--
 13 files changed, 298 insertions(+), 35 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala 
b/core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala
index 3f1d1ae..fba371d 100644
--- a/core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala
@@ -238,7 +238,9 @@ private[deploy] object DeployMessages {
   case class WorkerStateResponse(host: String, port: Int, workerId: String,
 executors: List[ExecutorRunner], finishedExecutors: List[ExecutorRunner],
 drivers: List[DriverRunner], finishedDrivers: List[DriverRunner], 
masterUrl: String,
-cores: Int, memory: Int, coresUsed: Int, memoryUsed: Int, masterWebUiUrl: 
String) {
+cores: Int, memory: Int, coresUsed: Int, memoryUsed: Int, masterWebUiUrl: 
String,
+resources: Map[String, ResourceInformation] = Map.empty,
+resourcesUsed: Map[String, ResourceInformation] = Map.empty) {
 
 Utils.checkHost(host)
 assert (port > 0)
diff --git a/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
index 7212696..6c3276c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala
@@ -17,15 +17,29 @@
 
 package org.apache.spark.deploy
 
-import org.json4s.JsonAST.JObject
+import org.json4s.JsonAST._
 import org.json4s.JsonDSL._
 
 import org.apache.spark.deploy.DeployMessages.{MasterStateResponse, 
WorkerStateResponse}
 import org.apache.spark.deploy.master._
 import org.apache.spark.deploy.master.RecoveryState.MasterState
 import org.apache.spark.deploy.worker.ExecutorRunner
+import org.apache.spark.resource.{ResourceInformation, ResourceRequirement}
 
 private[deploy] object JsonProtocol {
+
+  private def writeResourcesInfo(info: Map[String, ResourceInformation]): 
JObject = {
+val jsonFields = info.map {
+  case (k, v) => JField(k, v.toJson)
+}
+JObject(jsonFields.toList)
+  }
+
+  private def writeResourceRequirement(req: ResourceRequirement): JObject = {
+("name" -> req.resourceName) ~
+("amount" -> req.amount)
+  }
+
   /**
* Export the [[WorkerInfo]] to a Json object. A [[WorkerInfo]] consists of 
the information of a
* worker.
@@ -41,6 +55,9 @@ private[deploy] object JsonProtocol {
* `memory` total memory of the worker
* `memoryused` allocated memory of the worker
* `memoryfree` free memory of the worker
+   * `resources` total resources of the worker
+   * `resourcesused` allocated resources of the worker
+   * `resourcesfree` free resources of the worker
 

[spark] branch master updated: [SPARK-11215][ML][FOLLOWUP] update the examples and suites using new api

2019-08-27 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7fe7506  [SPARK-11215][ML][FOLLOWUP] update the examples and suites 
using new api
7fe7506 is described below

commit 7fe750674e85d596aa7e788bf207c3b2073ea7cf
Author: zhengruifeng 
AuthorDate: Tue Aug 27 08:58:32 2019 -0500

[SPARK-11215][ML][FOLLOWUP] update the examples and suites using new api

## What changes were proposed in this pull request?
Since method `labels` is already deprecated, we should update the examples 
and suites to turn off warnings when compiling Spark:
```
[warn] 
/Users/zrf/Dev/OpenSource/spark/examples/src/main/scala/org/apache/spark/examples/ml/DecisionTreeClassificationExample.scala:65:
 method labels in class StringIndexerModel is deprecated (since 3.0.0): 
`labels` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
[warn]   .setLabels(labelIndexer.labels)
[warn]   ^
[warn] 
/Users/zrf/Dev/OpenSource/spark/examples/src/main/scala/org/apache/spark/examples/ml/GradientBoostedTreeClassifierExample.scala:68:
 method labels in class StringIndexerModel is deprecated (since 3.0.0): 
`labels` is deprecated and will be removed in 3.1.0. Use `labelsArray` instead.
[warn]   .setLabels(labelIndexer.labels)
[warn]   ^
```

## How was this patch tested?
existing suites

Closes #25428 from zhengruifeng/del_stringindexer_labels_usage.

Authored-by: zhengruifeng 
Signed-off-by: Sean Owen 
---
 .../spark/examples/ml/JavaDecisionTreeClassificationExample.java  | 2 +-
 .../spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java   | 2 +-
 .../apache/spark/examples/ml/JavaRandomForestClassifierExample.java   | 2 +-
 .../apache/spark/examples/ml/DecisionTreeClassificationExample.scala  | 2 +-
 .../spark/examples/ml/GradientBoostedTreeClassifierExample.scala  | 2 +-
 .../org/apache/spark/examples/ml/RandomForestClassifierExample.scala  | 2 +-
 .../test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala   | 4 ++--
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
index a9c6e7f..f7e1144 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
@@ -71,7 +71,7 @@ public class JavaDecisionTreeClassificationExample {
 IndexToString labelConverter = new IndexToString()
   .setInputCol("prediction")
   .setOutputCol("predictedLabel")
-  .setLabels(labelIndexer.labels());
+  .setLabels(labelIndexer.labelsArray()[0]);
 
 // Chain indexers and tree in a Pipeline.
 Pipeline pipeline = new Pipeline()
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
index 3e9eb99..807027a 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaGradientBoostedTreeClassifierExample.java
@@ -73,7 +73,7 @@ public class JavaGradientBoostedTreeClassifierExample {
 IndexToString labelConverter = new IndexToString()
   .setInputCol("prediction")
   .setOutputCol("predictedLabel")
-  .setLabels(labelIndexer.labels());
+  .setLabels(labelIndexer.labelsArray()[0]);
 
 // Chain indexers and GBT in a Pipeline.
 Pipeline pipeline = new Pipeline()
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java
index da2633e..8863f18 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java
@@ -69,7 +69,7 @@ public class JavaRandomForestClassifierExample {
 IndexToString labelConverter = new IndexToString()
   .setInputCol("prediction")
   .setOutputCol("predictedLabel")
-  .setLabels(labelIndexer.labels());
+  .setLabels(labelIndexer.labelsArray()[0]);
 
 // Chain indexers and forest in a Pipeline
 Pipeline pipeline = new Pipeline()
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/ml/DecisionTreeClassificationExample.scala
 

[spark] branch master updated: [SPARK-28621][SQL] Make spark.sql.crossJoin.enabled default value true

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7f605f5  [SPARK-28621][SQL] Make spark.sql.crossJoin.enabled default 
value true
7f605f5 is described below

commit 7f605f5559a6508acfa90ca4f3875c430f585770
Author: WeichenXu 
AuthorDate: Tue Aug 27 21:53:37 2019 +0800

[SPARK-28621][SQL] Make spark.sql.crossJoin.enabled default value true

### What changes were proposed in this pull request?

Make `spark.sql.crossJoin.enabled` default value true

### Why are the changes needed?

For an implicit cross join, we can set up a watchdog to cancel it if it runs for a 
long time. When "spark.sql.crossJoin.enabled" is false, `CheckCartesianProducts` is 
implemented at the logical plan stage and may generate mismatching errors that 
confuse end users:
* It is done in the logical phase, so we may fail queries that could be executed 
via broadcast join, which is very fast.
* If we move the check to the physical phase, a query may succeed at first and then 
begin to fail as the table grows larger (other people insert data into the table). 
This can be quite confusing.
* The CROSS JOIN syntax doesn't work well when join reordering happens.
* Some non-equi-joins generate a plan using a cartesian product, but 
`CheckCartesianProducts` does not detect them and raise an error.

To address this in a simpler way, we turn off this cross-join error by default.

For reference, here are some cases that raise the mismatching error:
Providing:
```
spark.range(2).createOrReplaceTempView("sm1") // can be broadcast
spark.range(5000).createOrReplaceTempView("bg1") // cannot be broadcast
spark.range(6000).createOrReplaceTempView("bg2") // cannot be broadcast
```
1) Some joins could be converted to broadcast nested loop joins, but 
CheckCartesianProducts raises an error, e.g.
```
select sm1.id, bg1.id from bg1 join sm1 where sm1.id < bg1.id
```
2) Some joins will run as a CartesianJoin, but CheckCartesianProducts does NOT 
raise an error, e.g.
```
select bg1.id, bg2.id from bg1 join bg2 where bg1.id < bg2.id
```

### Does this PR introduce any user-facing change?

### How was this patch tested?

Closes #25520 from WeichenXu123/SPARK-28621.

Authored-by: WeichenXu 
Signed-off-by: Wenchen Fan 
---
 R/pkg/tests/fulltests/test_sparkSQL.R  | 18 ++
 docs/sql-migration-guide-upgrade.md|  2 ++
 python/pyspark/sql/tests/test_dataframe.py |  9 +
 python/pyspark/sql/tests/test_udf.py   |  5 +++--
 .../scala/org/apache/spark/sql/internal/SQLConf.scala  |  3 ++-
 .../ExtractPythonUDFFromJoinConditionSuite.scala   |  8 +---
 .../test/scala/org/apache/spark/sql/JoinSuite.scala|  6 --
 7 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index fdc7474..035525a 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2352,10 +2352,20 @@ test_that("join(), crossJoin() and merge() on a 
DataFrame", {
 
   # inner join, not cartesian join
   expect_equal(count(where(join(df, df2), df$name == df2$name)), 3)
-  # cartesian join
-  expect_error(tryCatch(count(join(df, df2)), error = function(e) { stop(e) }),
-   paste0(".*(org.apache.spark.sql.AnalysisException: Detected 
implicit cartesian",
-  " product for INNER join between logical plans).*"))
+
+  conf <- callJMethod(sparkSession, "conf")
+  crossJoinEnabled <- callJMethod(conf, "get", "spark.sql.crossJoin.enabled")
+  callJMethod(conf, "set", "spark.sql.crossJoin.enabled", "false")
+  tryCatch({
+# cartesian join
+expect_error(tryCatch(count(join(df, df2)), error = function(e) { stop(e) 
}),
+ paste0(".*(org.apache.spark.sql.AnalysisException: Detected 
implicit cartesian",
+" product for INNER join between logical plans).*"))
+  },
+  finally = {
+# Resetting the conf back to default value
+callJMethod(conf, "set", "spark.sql.crossJoin.enabled", crossJoinEnabled)
+  })
 
   joined <- crossJoin(df, df2)
   expect_equal(names(joined), c("age", "name", "name", "test"))
diff --git a/docs/sql-migration-guide-upgrade.md 
b/docs/sql-migration-guide-upgrade.md
index a643a84..cc3ef1e 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -23,6 +23,8 @@ license: |
 {:toc}
 
 ## Upgrading From Spark SQL 2.4 to 3.0
+  - Since Spark 3.0, configuration `spark.sql.crossJoin.enabled` become 
internal configuration, and is true by default, so by default spark won't raise 
exception on 

[spark] branch master updated (4cf8128 -> e12da8b)

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4cf8128  [SPARK-28871][MINOR][DOCS] WaterMark doc fix
 add e12da8b  [SPARK-28876][SQL] fallBackToHdfs should not support Hive 
partitioned table

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/HiveStrategies.scala | 16 +++-
 .../apache/spark/sql/hive/StatisticsSuite.scala| 30 ++
 2 files changed, 39 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-28871][MINOR][DOCS] WaterMark doc fix

2019-08-27 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4cf8128  [SPARK-28871][MINOR][DOCS] WaterMark doc fix
4cf8128 is described below

commit 4cf81285dac0a4e901179a44bfa62cb51e33bee6
Author: cyq89051127 
AuthorDate: Tue Aug 27 08:13:39 2019 -0500

[SPARK-28871][MINOR][DOCS] WaterMark doc fix

### What changes were proposed in this pull request?

Fix the code style of the example in the 'Policy for handling multiple watermarks' 
section of structured-streaming-programming-guide.md.

### Why are the changes needed?

Make it look friendlier to the user.

### Does this PR introduce any user-facing change?
NO

### How was this patch tested?

cd docs
SKIP_API=1 jekyll build

Closes #25580 from cyq89051127/master.

Authored-by: cyq89051127 
Signed-off-by: Sean Owen 
---
 docs/structured-streaming-programming-guide.md | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index b0d3e16..deaf262 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1583,10 +1583,18 @@ be tolerated for stateful operations. You specify these 
thresholds using
 ``withWatermarks("eventTime", delay)`` on each of the input streams. For 
example, consider
 a query with stream-stream joins between `inputStream1` and `inputStream2`.
 
-  inputStream1.withWatermark("eventTime1", "1 hour")
-.join(
-  inputStream2.withWatermark("eventTime2", "2 hours"),
-  joinCondition)
+
+
+
+{% highlight scala %}
+inputStream1.withWatermark("eventTime1", "1 hour")
+  .join(
+inputStream2.withWatermark("eventTime2", "2 hours"),
+joinCondition)
+{% endhighlight %}
+
+
+
 
 While executing the query, Structured Streaming individually tracks the maximum
 event time seen in each input stream, calculates watermarks based on the 
corresponding delay,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cb06209 -> 9617973)

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cb06209  [SPARK-28747][SQL] merge the two data source v2 fallback 
configs
 add 9617973  [SPARK-27592][SQL][TEST][FOLLOW-UP] Test set the partitioned 
bucketed data source table SerDe correctly

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/hive/HiveMetastoreCatalogSuite.scala | 38 ++
 1 file changed, 38 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c02c86e -> cb06209)

2019-08-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c02c86e  [SPARK-28691][EXAMPLES] Add Java/Scala 
DirectKerberizedKafkaWordCount examples
 add cb06209  [SPARK-28747][SQL] merge the two data source v2 fallback 
configs

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/avro/AvroLogicalTypeSuite.scala  |  6 +--
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  6 +--
 .../kafka010/KafkaDontFailOnDataLossSuite.scala|  4 +-
 .../spark/sql/kafka010/KafkaRelationSuite.scala|  6 +--
 .../apache/spark/sql/kafka010/KafkaSinkSuite.scala |  4 +-
 .../org/apache/spark/sql/internal/SQLConf.scala| 20 ++-
 .../org/apache/spark/sql/DataFrameReader.scala | 16 +-
 .../org/apache/spark/sql/DataFrameWriter.scala | 54 ++-
 .../sql/execution/datasources/DataSource.scala | 19 +++
 .../datasources/DataSourceResolution.scala | 32 
 .../org/apache/spark/sql/DataFrameSuite.scala  |  2 +-
 .../spark/sql/FileBasedDataSourceSuite.scala   | 17 +++---
 .../org/apache/spark/sql/MetadataCacheSuite.scala  |  4 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  4 +-
 .../DataSourceScanExecRedactionSuite.scala |  4 +-
 .../execution/OptimizeMetadataOnlyQuerySuite.scala |  2 +-
 .../apache/spark/sql/execution/PlannerSuite.scala  |  2 +-
 .../spark/sql/execution/SameResultSuite.scala  |  6 +--
 .../spark/sql/execution/SparkPlanSuite.scala   |  6 +--
 .../OrcNestedSchemaPruningBenchmark.scala  |  3 +-
 .../columnar/InMemoryColumnarQuerySuite.scala  |  2 +-
 .../execution/command/PlanResolutionSuite.scala| 12 ++---
 .../datasources/FileSourceStrategySuite.scala  |  2 +-
 .../sql/execution/datasources/csv/CSVSuite.scala   |  2 +-
 .../orc/OrcPartitionDiscoverySuite.scala   |  3 +-
 .../execution/datasources/orc/OrcQuerySuite.scala  |  3 +-
 .../datasources/orc/OrcV1FilterSuite.scala |  3 +-
 .../datasources/orc/OrcV1SchemaPruningSuite.scala  |  3 +-
 .../datasources/orc/OrcV2SchemaPruningSuite.scala  |  2 +-
 .../datasources/parquet/ParquetFilterSuite.scala   |  5 +-
 .../parquet/ParquetPartitionDiscoverySuite.scala   |  5 +-
 .../datasources/parquet/ParquetQuerySuite.scala|  5 +-
 .../parquet/ParquetSchemaPruningSuite.scala|  5 +-
 .../sql/execution/metric/SQLMetricsSuite.scala |  2 +-
 .../execution/python/ExtractPythonUDFsSuite.scala  |  4 +-
 .../org/apache/spark/sql/sources/InsertSuite.scala |  2 +-
 .../sql/sources/v2/DataSourceV2SQLSuite.scala  | 37 +++--
 .../sources/v2/FileDataSourceV2FallBackSuite.scala | 61 ++
 .../spark/sql/streaming/FileStreamSinkSuite.scala  |  6 +--
 .../apache/spark/sql/streaming/StreamSuite.scala   |  4 +-
 .../sql/test/DataFrameReaderWriterSuite.scala  |  2 +-
 .../spark/sql/hive/execution/SQLQuerySuite.scala   |  2 +-
 .../spark/sql/sources/HadoopFsRelationTest.scala   |  2 +-
 43 files changed, 159 insertions(+), 232 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (00cb2f9 -> c02c86e)

2019-08-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 00cb2f9  [SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas 
with Arrow optimization throws an exception per maxResultSize
 add c02c86e  [SPARK-28691][EXAMPLES] Add Java/Scala 
DirectKerberizedKafkaWordCount examples

No new revisions were added by this update.

Summary of changes:
 ...ava => JavaDirectKerberizedKafkaWordCount.java} | 66 ++
 scala => DirectKerberizedKafkaWordCount.scala} | 54 +++---
 2 files changed, 101 insertions(+), 19 deletions(-)
 copy 
examples/src/main/java/org/apache/spark/examples/streaming/{JavaDirectKafkaWordCount.java
 => JavaDirectKerberizedKafkaWordCount.java} (54%)
 copy 
examples/src/main/scala/org/apache/spark/examples/streaming/{DirectKafkaWordCount.scala
 => DirectKerberizedKafkaWordCount.scala} (54%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas with Arrow optimization throws an exception per maxResultSize

2019-08-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00cb2f9  [SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas 
with Arrow optimization throws an exception per maxResultSize
00cb2f9 is described below

commit 00cb2f99ccbd7c0fdba19ba63c4ec73ca97dea66
Author: HyukjinKwon 
AuthorDate: Tue Aug 27 17:30:06 2019 +0900

[SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas with Arrow 
optimization throws an exception per maxResultSize

### What changes were proposed in this pull request?
This PR proposes to add a test case for:

```bash
./bin/pyspark --conf spark.driver.maxResultSize=1m
spark.conf.set("spark.sql.execution.arrow.enabled",True)
```

```python
spark.range(1000).toPandas()
```

```
Empty DataFrame
Columns: [id]
Index: []
```

which can result in partial results (see 
https://github.com/apache/spark/pull/25593#issuecomment-525153808). This 
regression was found between Spark 2.3 and Spark 2.4, and accidentally fixed.

### Why are the changes needed?
To prevent the same regression in the future.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Test was added.

Closes #25594 from HyukjinKwon/SPARK-28881.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/tests/test_arrow.py | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_arrow.py 
b/python/pyspark/sql/tests/test_arrow.py
index f533083..50c82b0 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -22,7 +22,7 @@ import time
 import unittest
 import warnings
 
-from pyspark.sql import Row
+from pyspark.sql import Row, SparkSession
 from pyspark.sql.functions import udf
 from pyspark.sql.types import *
 from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, 
have_pyarrow, \
@@ -421,6 +421,35 @@ class ArrowTests(ReusedSQLTestCase):
 run_test(*case)
 
 
+@unittest.skipIf(
+not have_pandas or not have_pyarrow,
+pandas_requirement_message or pyarrow_requirement_message)
+class MaxResultArrowTests(unittest.TestCase):
+# These tests are separate as 'spark.driver.maxResultSize' configuration
+# is a static configuration to Spark context.
+
+@classmethod
+def setUpClass(cls):
+cls.spark = SparkSession.builder \
+.master("local[4]") \
+.appName(cls.__name__) \
+.config("spark.driver.maxResultSize", "10k") \
+.getOrCreate()
+
+# Explicitly enable Arrow and disable fallback.
+cls.spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
+
cls.spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", 
"false")
+
+@classmethod
+def tearDownClass(cls):
+if hasattr(cls, "spark"):
+cls.spark.stop()
+
+def test_exception_by_max_results(self):
+with self.assertRaisesRegexp(Exception, "is bigger than"):
+self.spark.range(0, 1, 1, 100).toPandas()
+
+
 class EncryptionArrowTests(ArrowTests):
 
 @classmethod


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7701d29 -> ab1819d)

2019-08-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7701d29  [SPARK-28877][PYSPARK][test-hadoop3.2][test-java11] Make 
jaxb-runtime compile-time dependency
 add ab1819d  [SPARK-28527][SQL][TEST][FOLLOW-UP] Ignores Thrift server 
ThriftServerQueryTestSuite

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e31aec9 -> 7701d29)

2019-08-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e31aec9  [SPARK-28667][SQL] Support InsertInto through the 
V2SessionCatalog
 add 7701d29  [SPARK-28877][PYSPARK][test-hadoop3.2][test-java11] Make 
jaxb-runtime compile-time dependency

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org