[spark] branch master updated: [SPARK-25351][PYTHON][TEST][FOLLOWUP] Fix test assertions to be consistent

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8bbb666  [SPARK-25351][PYTHON][TEST][FOLLOWUP] Fix test assertions to be consistent
8bbb666 is described below

commit 8bbb22e042c1533da294ac7b504b6aaa694a
Author: Bryan Cutler 
AuthorDate: Thu May 28 10:27:15 2020 +0900

[SPARK-25351][PYTHON][TEST][FOLLOWUP] Fix test assertions to be consistent

### What changes were proposed in this pull request?
Followup to make the assertions from a recent test consistent with the rest of the module

### Why are the changes needed?

Better to use assertions from `unittest` and be consistent

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #28659 from BryanCutler/arrow-category-test-fix-SPARK-25351.

Authored-by: Bryan Cutler 
Signed-off-by: HyukjinKwon 
---
 python/pyspark/sql/tests/test_arrow.py             | 9 +++++----
 python/pyspark/sql/tests/test_pandas_udf_scalar.py | 9 +++------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py
index c3c9fb0..c59765d 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -435,11 +435,12 @@ class ArrowTests(ReusedSQLTestCase):
         assert_frame_equal(result_spark, result_arrow)
 
         # ensure original category elements are string
-        assert isinstance(category_first_element, str)
+        self.assertIsInstance(category_first_element, str)
         # spark data frame and arrow execution mode enabled data frame type must match pandas
-        assert spark_type == arrow_type == 'string'
-        assert isinstance(arrow_first_category_element, str)
-        assert isinstance(spark_first_category_element, str)
+        self.assertEqual(spark_type, 'string')
+        self.assertEqual(arrow_type, 'string')
+        self.assertIsInstance(arrow_first_category_element, str)
+        self.assertIsInstance(spark_first_category_element, str)
 
 
 @unittest.skipIf(
diff --git a/python/pyspark/sql/tests/test_pandas_udf_scalar.py b/python/pyspark/sql/tests/test_pandas_udf_scalar.py
index ae6b8d5..2d38efd 100644
--- a/python/pyspark/sql/tests/test_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/test_pandas_udf_scalar.py
@@ -910,13 +910,10 @@ class ScalarPandasUDFTests(ReusedSQLTestCase):
 
         spark_type = df.dtypes[1][1]
         # spark data frame and arrow execution mode enabled data frame type must match pandas
-        assert spark_type == 'string'
+        self.assertEqual(spark_type, 'string')
 
-        # Check result value of column 'B' must be equal to column 'A'
-        for i in range(0, len(result_spark["A"])):
-            assert result_spark["A"][i] == result_spark["B"][i]
-            assert isinstance(result_spark["A"][i], str)
-            assert isinstance(result_spark["B"][i], str)
+        # Check result of column 'B' must be equal to column 'A' in type and values
+        pd.testing.assert_series_equal(result_spark["A"], result_spark["B"], check_names=False)
 
     @unittest.skipIf(sys.version_info[:2] < (3, 5), "Type hints are supported from Python 3.5.")
     def test_type_annotation(self):

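For reference, a minimal standalone sketch (plain unittest plus pandas, outside the actual PySpark test harness) of the two assertion styles the patch standardizes on: the unittest methods report both operands on failure, and pandas' assert_series_equal replaces an element-by-element loop by checking values and dtype in one call.

import unittest

import pandas as pd


class AssertionStyleExample(unittest.TestCase):

    def test_unittest_assertions(self):
        spark_type = 'string'
        # Unlike a bare `assert`, these report both operands when they fail.
        self.assertEqual(spark_type, 'string')
        self.assertIsInstance(spark_type, str)

    def test_series_equal(self):
        a = pd.Series([u"a", u"b"], name="A")
        b = pd.Series([u"a", u"b"], name="B")
        # Checks values and dtype in one call; check_names=False ignores the
        # differing Series names, as in the patched test above.
        pd.testing.assert_series_equal(a, b, check_names=False)


if __name__ == "__main__":
    unittest.main()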

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-31839][TESTS] Delete duplicate code in castsuit

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 8bde6ed  [SPARK-31839][TESTS] Delete duplicate code in castsuit
8bde6ed is described below

commit 8bde6ed4563985a3b474abd67c171b11867e755d
Author: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
AuthorDate: Thu May 28 09:57:11 2020 +0900

[SPARK-31839][TESTS] Delete duplicate code in castsuit

### What changes were proposed in this pull request?
Delete duplicate code castsuit

### Why are the changes needed?
keep spark code clean

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
no need

Closes #28655 from GuoPhilipse/delete-duplicate-code-castsuit.

Lead-authored-by: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
Co-authored-by: GuoPhilipse 
Signed-off-by: HyukjinKwon 
(cherry picked from commit dfbc5edf20040e8163ee3beef61f2743a948c508)
Signed-off-by: HyukjinKwon 
---
 .../test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 7616645..3f25f06 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -286,7 +286,6 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkCast(1.5, "1.5")
 
     checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
-    checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
   }
 
   test("cast from string") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31839][TESTS] Delete duplicate code in castsuit

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 11bce50  [SPARK-31839][TESTS] Delete duplicate code in castsuit
11bce50 is described below

commit 11bce50debe2e47a5456f565c04960bd17d0e778
Author: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
AuthorDate: Thu May 28 09:57:11 2020 +0900

[SPARK-31839][TESTS] Delete duplicate code in castsuit

### What changes were proposed in this pull request?
Delete duplicate code castsuit

### Why are the changes needed?
keep spark code clean

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
no need

Closes #28655 from GuoPhilipse/delete-duplicate-code-castsuit.

Lead-authored-by: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
Co-authored-by: GuoPhilipse 
Signed-off-by: HyukjinKwon 
(cherry picked from commit dfbc5edf20040e8163ee3beef61f2743a948c508)
Signed-off-by: HyukjinKwon 
---
 .../test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index fd0ba67..f563720 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -240,7 +240,6 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
     checkCast(1.5, "1.5")
 
     checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
-    checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
   }
 
   test("cast from string") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2f92ea0 -> dfbc5ed)

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2f92ea0  [SPARK-31763][PYSPARK] Add `inputFiles` method in PySpark DataFrame Class
 add dfbc5ed  [SPARK-31839][TESTS] Delete duplicate code in castsuit

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 1 -
 1 file changed, 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31839][TESTS] Delete duplicate code in castsuit

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dfbc5ed  [SPARK-31839][TESTS] Delete duplicate code in castsuit
dfbc5ed is described below

commit dfbc5edf20040e8163ee3beef61f2743a948c508
Author: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
AuthorDate: Thu May 28 09:57:11 2020 +0900

[SPARK-31839][TESTS] Delete duplicate code in castsuit

### What changes were proposed in this pull request?
Delete duplicate code castsuit

### Why are the changes needed?
keep spark code clean

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
no need

Closes #28655 from GuoPhilipse/delete-duplicate-code-castsuit.

Lead-authored-by: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
Co-authored-by: GuoPhilipse 
Signed-off-by: HyukjinKwon 
---
 .../test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index e5bff7f..6af995c 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -240,7 +240,6 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
     checkCast(1.5, "1.5")
 
     checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
-    checkEvaluation(cast(cast(1.toDouble, TimestampType), DoubleType), 1.toDouble)
   }
 
   test("cast from string") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (339b0eca -> 2f92ea0)

2020-05-27 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 339b0eca [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow
 add 2f92ea0  [SPARK-31763][PYSPARK] Add `inputFiles` method in PySpark DataFrame Class

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py            | 14 ++++++++++++++
 python/pyspark/sql/tests/test_dataframe.py | 18 ++++++++++++++++++
 2 files changed, 32 insertions(+)

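For context, a minimal usage sketch of the new DataFrame.inputFiles() method (the SparkSession setup and Parquet path are illustrative assumptions): it returns a best-effort list of the files backing the DataFrame, mirroring the existing Scala/Java API.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/tmp/example_parquet")  # illustrative path
# Best-effort snapshot of the underlying files; order is not guaranteed.
for path in df.inputFiles():
    print(path)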

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow

2020-05-27 Thread cutlerb
This is an automated email from the ASF dual-hosted git repository.

cutlerb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 339b0eca [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow
339b0eca is described below

commit 339b0ecadb9c66ec8a62fd1f8e5a7a266b465aef
Author: Jalpan Randeri 
AuthorDate: Wed May 27 17:27:29 2020 -0700

[SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow

Handle Pandas category type while converting from Python with Arrow enabled. The category column will be converted to whatever type the category elements are, as is the case with Arrow disabled.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New unit tests were added for `createDataFrame` and scalar `pandas_udf`

Closes #26585 from jalpan-randeri/feature-pyarrow-dictionary-type.

Authored-by: Jalpan Randeri 
Signed-off-by: Bryan Cutler 
---
 python/pyspark/sql/pandas/serializers.py           |  3 +++
 python/pyspark/sql/pandas/types.py                 |  2 ++
 python/pyspark/sql/tests/test_arrow.py             | 26 ++++++++++++++++++++++++++
 python/pyspark/sql/tests/test_pandas_udf_scalar.py | 21 +++++++++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py
index 4dd15d1..ff0b10a 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -154,6 +154,9 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer):
             # Ensure timestamp series are in expected form for Spark internal representation
             if t is not None and pa.types.is_timestamp(t):
                 s = _check_series_convert_timestamps_internal(s, self._timezone)
+            elif type(s.dtype) == pd.CategoricalDtype:
+                # Note: This can be removed once minimum pyarrow version is >= 0.16.1
+                s = s.astype(s.dtypes.categories.dtype)
             try:
                 array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
             except pa.ArrowException as e:
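
As a pandas-only illustration of the new branch (no Spark required), a categorical Series is cast back to the dtype of its category elements before being handed to Arrow:

import pandas as pd

s = pd.Series([u"a", u"b", u"a"]).astype('category')
# The same cast the serializer applies: category -> dtype of the elements.
converted = s.astype(s.dtypes.categories.dtype)
print(s.dtype)          # category
print(converted.dtype)  # object (the element dtype)
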
diff --git a/python/pyspark/sql/pandas/types.py b/python/pyspark/sql/pandas/types.py
index d1edf3f..4b70c8a 100644
--- a/python/pyspark/sql/pandas/types.py
+++ b/python/pyspark/sql/pandas/types.py
@@ -114,6 +114,8 @@ def from_arrow_type(at):
         return StructType(
             [StructField(field.name, from_arrow_type(field.type), nullable=field.nullable)
              for field in at])
+    elif types.is_dictionary(at):
+        spark_type = from_arrow_type(at.value_type)
     else:
         raise TypeError("Unsupported type in conversion from Arrow: " + str(at))
     return spark_type
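
On the Arrow side, a pandas categorical arrives as a dictionary-encoded type; the from_arrow_type change maps such a type to the Spark type of its values. A small pyarrow sketch (assuming pyarrow is installed):

import pyarrow as pa

# Dictionary type as produced for a pandas categorical of strings.
at = pa.dictionary(pa.int8(), pa.string())
print(pa.types.is_dictionary(at))  # True
print(at.value_type)               # string, which maps to Spark's StringType
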
diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py
index 004c79f..c3c9fb0 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -415,6 +415,32 @@ class ArrowTests(ReusedSQLTestCase):
         for case in cases:
             run_test(*case)
 
+    def test_createDateFrame_with_category_type(self):
+        pdf = pd.DataFrame({"A": [u"a", u"b", u"c", u"a"]})
+        pdf["B"] = pdf["A"].astype('category')
+        category_first_element = dict(enumerate(pdf['B'].cat.categories))[0]
+
+        with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": True}):
+            arrow_df = self.spark.createDataFrame(pdf)
+            arrow_type = arrow_df.dtypes[1][1]
+            result_arrow = arrow_df.toPandas()
+            arrow_first_category_element = result_arrow["B"][0]
+
+        with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": False}):
+            df = self.spark.createDataFrame(pdf)
+            spark_type = df.dtypes[1][1]
+            result_spark = df.toPandas()
+            spark_first_category_element = result_spark["B"][0]
+
+        assert_frame_equal(result_spark, result_arrow)
+
+        # ensure original category elements are string
+        assert isinstance(category_first_element, str)
+        # spark data frame and arrow execution mode enabled data frame type must match pandas
+        assert spark_type == arrow_type == 'string'
+        assert isinstance(arrow_first_category_element, str)
+        assert isinstance(spark_first_category_element, str)
+
 
 @unittest.skipIf(
     not have_pandas or not have_pyarrow,
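
From a user's point of view, a sketch of the behavior the test above exercises (the SparkSession setup is an illustrative assumption; a category column is accepted by createDataFrame and comes out as the type of its elements):

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

pdf = pd.DataFrame({"A": [u"a", u"b", u"c", u"a"]})
pdf["B"] = pdf["A"].astype('category')

df = spark.createDataFrame(pdf)
print(df.dtypes)              # [('A', 'string'), ('B', 'string')]
print(df.toPandas()["B"][0])  # 'a' -- a plain string, not a category code
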
diff --git a/python/pyspark/sql/tests/test_pandas_udf_scalar.py b/python/pyspark/sql/tests/test_pandas_udf_scalar.py
index 7260e80..ae6b8d5 100644
--- a/python/pyspark/sql/tests/test_pandas_udf_scalar.py
+++ b/python/pyspark/sql/tests/test_pandas_udf_scalar.py
@@ -897,6 +897,27 @@ class ScalarPandasUDFTests(ReusedSQLTestCase):
 result = df.withColumn('time', foo_udf(df.

[spark] branch branch-3.0 updated: Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite"

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e7b88e8  Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite"
e7b88e8 is described below

commit e7b88e82ec5cee0a738f96127b106358cc97cb4f
Author: Xingbo Jiang 
AuthorDate: Wed May 27 17:21:10 2020 -0700

Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite"

This reverts commit cb817bb1cf6b074e075c02880001ec96f2f39de7.
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 54899bf..6191e41 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,7 +25,6 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
-import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -38,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
-TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  test("global sync by barrier() call") {
+  // TODO (SPARK-31730): re-enable it
+  ignore("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -58,7 +57,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -76,7 +78,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -95,7 +100,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -121,7 +129,8 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  test("support multiple barrier() call within a single task") {
+  // TODO (SPARK-31730): re-enable it
+  ignore("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -276,9 +285,6 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
-// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
-// could get scheduled at ANY level.
-sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with 2 tasks and both tasks prefer executor 0 
(only 1 core) for


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

Wait until all the executors have started before submitting any job. This avoids the flakiness caused by jobs being submitted while executors are still coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with 2 tasks and both tasks prefer executor 0 
(only 1 core) for


--

[spark] branch master updated (d19b173 -> efe7fd2)

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
 add efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with 2 tasks and both tasks prefer executor 0 
(only 1 core) for


--

[spark] branch master updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
efe7fd2 is described below

commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with 2 tasks and both tasks prefer executor 0 
(only 1 core) for


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
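A quick illustration of the pattern the SPARK-31730 fix above relies on (not the committed suite code): block until every executor of the local cluster has registered before submitting a barrier job, so a barrier stage never launches with fewer slots than tasks. It assumes `TestUtils.waitUntilExecutorsUp(sc, numExecutors, timeoutMillis)` as used in the diff; the 60000 ms timeout is an illustrative value, not necessarily the one in the commit.

```scala
package org.apache.spark.scheduler

import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext, SparkFunSuite, TestUtils}

// Sketch only: bring up a local-cluster SparkContext and wait for all
// executors, which is what removes the flakiness described above.
class BarrierStartupSketchSuite extends SparkFunSuite {

  private def initLocalClusterSparkContext(numWorker: Int = 4): SparkContext = {
    val conf = new SparkConf()
      // local-cluster[workers, coresPerWorker, memoryPerWorkerMB]
      .setMaster(s"local-cluster[$numWorker, 1, 1024]")
      .setAppName("test-cluster")
    val sc = new SparkContext(conf)
    // Assumed 60 s timeout: fail fast here instead of flaking later.
    TestUtils.waitUntilExecutorsUp(sc, numWorker, 60000)
    sc
  }

  test("barrier job runs only after all executors are up (sketch)") {
    val sc = initLocalClusterSparkContext()
    try {
      val partitions = sc.makeRDD(1 to 10, 4).barrier().mapPartitions { it =>
        val context = BarrierTaskContext.get()
        context.barrier()                  // global sync across the 4 tasks
        Iterator(context.partitionId())
      }.collect()
      assert(partitions.sorted.toSeq === Seq(0, 1, 2, 3))
    } finally {
      sc.stop()
    }
  }
}
```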

[spark] branch master updated (1528fbc -> d19b173)

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1528fbc  [SPARK-31827][SQL] fail datetime parsing/formatting if detect 
the Java 8 bug of stand-alone form
 add d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-27 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
d19b173 is described below

commit d19b173b47af04fe6f03e2b21b60eb317aeaae4f
Author: Kousuke Saruta 
AuthorDate: Wed May 27 14:36:12 2020 -0700

[SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

### What changes were proposed in this pull request?

This PR changes JsonProtocol to write RDDInfos#isBarrier.

### Why are the changes needed?

JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I added a testcase.

Closes #28583 from sarutak/SPARK-31764.

Authored-by: Kousuke Saruta 
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 26bbff5..844d9b7 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -487,6 +487,7 @@ private[spark] object JsonProtocol {
 ("Callsite" -> rddInfo.callSite) ~
 ("Parent IDs" -> parentIds) ~
 ("Storage Level" -> storageLevel) ~
+("Barrier" -> rddInfo.isBarrier) ~
 ("Number of Partitions" -> rddInfo.numPartitions) ~
 ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~
 ("Memory Size" -> rddInfo.memSize) ~
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 61ea21f..7c23e44 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, 
SingleEventLogFileWr
 import org.apache.spark.deploy.history.EventLogTestHelper._
 import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED}
 import org.apache.spark.io._
 import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem}
 import org.apache.spark.resource.ResourceProfile
@@ -100,6 +101,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 testStageExecutorMetricsEventLogging()
   }
 
+  test("SPARK-31764: isBarrier should be logged in event log") {
+val conf = new SparkConf()
+conf.set(EVENT_LOG_ENABLED, true)
+conf.set(EVENT_LOG_DIR, testDirPath.toString)
+val sc = new SparkContext("local", "test-SPARK-31764", conf)
+val appId = sc.applicationId
+
+sc.parallelize(1 to 10)
+  .barrier()
+  .mapPartitions(_.map(elem => (elem, elem)))
+  .filter(elem => elem._1 % 2 == 0)
+  .reduceByKey(_ + _)
+  .collect
+sc.stop()
+
+val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, 
appId), fileSystem)
+val events = readLines(eventLogStream).map(line => 
JsonProtocol.sparkEventFromJson(parse(line)))
+val jobStartEvents = events
+  .filter(event => event.isInstanceOf[SparkListenerJobStart])
+  .map(_.asInstanceOf[SparkListenerJobStart])
+
+assert(jobStartEvents.size === 1)
+val stageInfos = jobStartEvents.head.stageInfos
+assert(stageInfos.size === 2)
+
+val stage0 = stageInfos(0)
+val rddInfosInStage0 = stage0.rddInfos
+assert(rddInfosInStage0.size === 3)
+val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name)
+assert(sortedRddInfosInStage0(0).scope.get.name === "filter")
+assert(sortedRddInfosInStage0(0).isBarrier === true)
+assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions")
+assert(sortedRddInfosInStage0(1).isBarrier === true)
+assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize")
+assert(sortedRddInfosInStage0(2).isBarrier === false)
+
+val stage1 = stageInfos(1)
+val rddInfosInStage1 = stage1.rddInfos
+assert(rddInfosInStage1.size === 1)
+assert(rddInfosInStage1(0).scope.get.name === "reduceByKey")
+assert(rddInfosInStage1(0).isBarrier === false) // reduceByKey
+  }
+
   /* - *
* Actual test logic *
* - */
diff --git a/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala 
b/core/src/test/
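To make the intent of the JsonProtocol change above concrete, here is a minimal round-trip sketch. It pokes Spark internals directly (`JsonProtocol` and `RDDInfo` are `private[spark]`, hence the package declaration), so treat it as an illustration rather than the committed test.

```scala
package org.apache.spark.util

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.RDDInfo

// Sketch: serialize an RDDInfo for a barrier RDD and read it back; with the
// fix above a "Barrier" field is written, so the flag survives the trip.
object IsBarrierRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("isBarrier-roundtrip"))
    try {
      val barrierRdd = sc.parallelize(1 to 4, 2).barrier().mapPartitions(it => it)
      val info = RDDInfo.fromRdd(barrierRdd)  // isBarrier should be true for this RDD
      val restored = JsonProtocol.rddInfoFromJson(JsonProtocol.rddInfoToJson(info))
      assert(restored.isBarrier == info.isBarrier, "isBarrier lost in the JSON round trip")
    } finally {
      sc.stop()
    }
  }
}
```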

[spark] branch branch-3.0 updated: [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0383d1e  [SPARK-31827][SQL] fail datetime parsing/formatting if detect 
the Java 8 bug of stand-alone form
0383d1e is described below

commit 0383d1efe7a7ada8a202fd411bf32b3ed80c9ce4
Author: Wenchen Fan 
AuthorDate: Wed May 27 18:53:19 2020 +

[SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 
bug of stand-alone form

If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK 
in use has a bug for the stand-alone form (see 
https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a 
clear error message.

to keep backward compatibility with Spark 2.4

Yes

Spark 2.4
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 11
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 8
```
// before this PR
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                            1|
+---------------------------------------------+
// after this PR
scala> sql("select date_format('1990-1-1', 'LLL')").show
org.apache.spark.SparkUpgradeException
```

manual test with java 8 and 11

Closes #28646 from cloud-fan/format.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 docs/sql-ref-datetime-pattern.md   |  7 ---
 .../sql/catalyst/util/DateTimeFormatterHelper.scala| 18 +-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/docs/sql-ref-datetime-pattern.md b/docs/sql-ref-datetime-pattern.md
index 0e00e7b..48e85b4 100644
--- a/docs/sql-ref-datetime-pattern.md
+++ b/docs/sql-ref-datetime-pattern.md
@@ -76,7 +76,8 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which 
padding is used. If the count of letters is two, then a reduced two digit form 
is used. For printing, this outputs the rightmost two digits. For parsing, this 
will parse using the base value of 2000, resulting in a year within the range 
2000 to 2099 inclusive. If the count of letters is less than four (but not 
two), then the sign is only output for negative years. Otherwise, the sign is 
output if the pad width is [...]
 
-- Month: If the number of pattern letters is 3 or more, the month is 
interpreted as text; otherwise, it is interpreted as a number. The text form is 
depend on letters - 'M' denotes the 'standard' form, and 'L' is for 
'stand-alone' form. The difference between the 'standard' and 'stand-alone' 
forms is trickier to describe as there is no difference in English. However, in 
other languages there is a difference in the word used when the text is used 
alone, as opposed to in a complete date. F [...]
+- Month: It follows the rule of Number/Text. The text form is depend on 
letters - 'M' denotes the 'standard' form, and 'L' is for 'stand-alone' form. 
These two forms are different only in some certain languages. For example, in 
Russian, 'Июль' is the stand-alone form of July, and 'Июля' is the standard 
form. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no 
difference between 'M' and 'L'. Month from 1 to 9 are printed without padding.
 ```sql
 spark-sql> select date_format(date '1970-01-01', "M");
 1
@@ -106,8 +107,8 @@ The count of pattern letters determines the format.
 ```
  - `'MMMM'`: full textual month representation in the standard form. It is 
used for parsing/formatting months as a part of dates/timestamps.
 ```sql
-spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
-January 1970
+spark-sql> select date_format(date '1970-01-01', "d MMMM");
+1 January
 spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'd MMMM', 'locale', 'RU'));
 1 января
 ```
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
 
b/sql/catalyst/src/main/s
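The JDK bug being guarded against is easy to reproduce outside Spark. The sketch below is one way to probe the running JDK for the stand-alone-form problem; it is not the detection code added to `DateTimeFormatterHelper`, just an illustration of the symptom shown in the commit message above.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Sketch: on a JDK affected by JDK-8114833, the stand-alone month pattern
// 'LLL' falls back to the numeric month ("1") instead of the short name ("Jan").
object StandAloneMonthProbe {
  def main(args: Array[String]): Unit = {
    val formatted = DateTimeFormatter
      .ofPattern("LLL", Locale.US)
      .format(LocalDate.of(1990, 1, 1))

    if (formatted == "1") {
      // Spark 3.0 raises a SparkUpgradeException with a clear message in this case.
      println(s"Buggy stand-alone form: got '$formatted', expected 'Jan'; avoid 'LLL'/'qqq' here.")
    } else {
      println(s"Stand-alone month formats correctly: '$formatted'")
    }
  }
}
```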

[spark] branch master updated (b5eb093 -> 1528fbc)

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b5eb093  [SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in 
legacy fractional formatter
 add 1528fbc  [SPARK-31827][SQL] fail datetime parsing/formatting if detect 
the Java 8 bug of stand-alone form

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-datetime-pattern.md|  6 +++---
 .../sql/catalyst/util/DateTimeFormatterHelper.scala | 17 -
 2 files changed, 19 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (311fe6a -> b5eb093)

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 311fe6a  [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests 
in DateExpressionsSuite
 add b5eb093  [SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in 
legacy fractional formatter

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/TimestampFormatter.scala | 35 +-
 .../spark/sql/util/TimestampFormatterSuite.scala   |  3 ++
 2 files changed, 31 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4611d21  [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests 
in DateExpressionsSuite
4611d21 is described below

commit 4611d2124cf756471675eefa2760d23ae5818b94
Author: Kent Yao 
AuthorDate: Wed May 27 17:26:07 2020 +

[SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in 
DateExpressionsSuite

### What changes were proposed in this pull request?

This PR modifies some codegen related tests to test escape characters for 
datetime functions which are time zone aware. If the timezone is absent, the 
formatter could result in `null` caused by `java.util.NoSuchElementException: 
None.get` and bypassing the real intention of those test cases.

### Why are the changes needed?

fix tests

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing the modified test cases.

Closes #28653 from yaooqinn/SPARK-31835.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 311fe6a880f371c20ca5156ca6eb7dec5a15eff6)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/expressions/DateExpressionsSuite.scala   | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
index 5b5c85f..e00d65f 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@@ -793,7 +793,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 // Test escaping of format
-GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), 
Literal("\"quote")) :: Nil)
+GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), 
Literal("\"quote"), UTC_OPT) :: Nil)
   }
 
   test("unix_timestamp") {
@@ -863,7 +863,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
 // Test escaping of format
 GenerateUnsafeProjection.generate(
-  UnixTimestamp(Literal("2015-07-24"), Literal("\"quote")) :: Nil)
+  UnixTimestamp(Literal("2015-07-24"), Literal("\"quote"), UTC_OPT) :: Nil)
   }
 
   test("to_unix_timestamp") {
@@ -941,7 +941,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
 // Test escaping of format
 GenerateUnsafeProjection.generate(
-  ToUnixTimestamp(Literal("2015-07-24"), Literal("\"quote")) :: Nil)
+  ToUnixTimestamp(Literal("2015-07-24"), Literal("\"quote"), UTC_OPT) :: 
Nil)
   }
 
   test("datediff") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f6f1e51 -> 311fe6a)

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f6f1e51  [SPARK-31719][SQL] Refactor JoinSelection
 add 311fe6a  [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests 
in DateExpressionsSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/DateExpressionsSuite.scala   | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 311fe6a  [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests 
in DateExpressionsSuite
311fe6a is described below

commit 311fe6a880f371c20ca5156ca6eb7dec5a15eff6
Author: Kent Yao 
AuthorDate: Wed May 27 17:26:07 2020 +

[SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in 
DateExpressionsSuite

### What changes were proposed in this pull request?

This PR modifies some codegen related tests to test escape characters for 
datetime functions which are time zone aware. If the timezone is absent, the 
formatter could result in `null` caused by `java.util.NoSuchElementException: 
None.get` and bypassing the real intention of those test cases.

### Why are the changes needed?

fix tests

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing the modified test cases.

Closes #28653 from yaooqinn/SPARK-31835.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/expressions/DateExpressionsSuite.scala   | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
index 02d6d84..1ca7380 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@@ -792,7 +792,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 // Test escaping of format
-GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), 
Literal("\"quote")) :: Nil)
+GenerateUnsafeProjection.generate(FromUnixTime(Literal(0L), 
Literal("\"quote"), UTC_OPT) :: Nil)
   }
 
   test("unix_timestamp") {
@@ -862,7 +862,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
 // Test escaping of format
 GenerateUnsafeProjection.generate(
-  UnixTimestamp(Literal("2015-07-24"), Literal("\"quote")) :: Nil)
+  UnixTimestamp(Literal("2015-07-24"), Literal("\"quote"), UTC_OPT) :: Nil)
   }
 
   test("to_unix_timestamp") {
@@ -940,7 +940,7 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
 // Test escaping of format
 GenerateUnsafeProjection.generate(
-  ToUnixTimestamp(Literal("2015-07-24"), Literal("\"quote")) :: Nil)
+  ToUnixTimestamp(Literal("2015-07-24"), Literal("\"quote"), UTC_OPT) :: 
Nil)
   }
 
   test("datediff") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
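As a pointer for why the explicit `UTC_OPT` argument matters in the tests above: time-zone-aware Catalyst expressions resolve their formatter from an `Option[String]` zone id, and an absent zone id surfaces as the `None.get` failure named in the commit message. A rough sketch follows; it uses Catalyst internals directly and its behavior claims are taken from the commit message above, so read it as an illustration only.

```scala
import org.apache.spark.sql.catalyst.expressions.{FromUnixTime, Literal}

// Sketch: the same expression with and without an explicit time zone id.
object ZoneIdMattersSketch {
  def main(args: Array[String]): Unit = {
    // With an explicit zone id the formatter can be built and eval succeeds.
    val withZone = FromUnixTime(Literal(0L), Literal("yyyy-MM-dd HH:mm:ss"), Some("UTC"))
    println(withZone.eval())      // 1970-01-01 00:00:00

    // Without a zone id, building the formatter hits None.get; per the commit
    // message this is swallowed and the expression quietly yields null, which
    // is why the old tests stopped exercising what they were meant to.
    val withoutZone = FromUnixTime(Literal(0L), Literal("yyyy-MM-dd HH:mm:ss"))
    println(withoutZone.eval())   // expected: null rather than a formatted string
  }
}
```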



[spark] branch branch-2.4 updated (6b055a4 -> e1adbac)

2020-05-27 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6b055a4  Preparing development version 2.4.7-SNAPSHOT
 add f5962ca  Preparing Spark release v2.4.6-rc5
 new e1adbac  Preparing development version 2.4.7-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v2.4.6-rc5 created (now f5962ca)

2020-05-27 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a change to tag v2.4.6-rc5
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at f5962ca  (commit)
This tag includes the following new commits:

 new f5962ca  Preparing Spark release v2.4.6-rc5

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v2.4.6-rc5

2020-05-27 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to tag v2.4.6-rc5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit f5962ca743c65d89f0f539b4b61518bce19a5af1
Author: Holden Karau 
AuthorDate: Wed May 27 16:38:33 2020 +

Preparing Spark release v2.4.6-rc5
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index b70014d..c913a38 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.7
+Version: 2.4.6
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 712cc7f..de59f40 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.7-SNAPSHOT
+2.4.6
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 825d771..b3a2265 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.7-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9dd26b3..d3db527 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.7-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 386782b..34872fd 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.7-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8496a68..5059ee0 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.7-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 9a584b3..7d46020 1006

[spark] 01/01: Preparing development version 2.4.7-SNAPSHOT

2020-05-27 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit e1adbac033f5be3fb2843788a4dfc6e1943fcc2c
Author: Holden Karau 
AuthorDate: Wed May 27 16:38:38 2020 +

Preparing development version 2.4.7-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index c913a38..b70014d 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.6
+Version: 2.4.7
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index de59f40..712cc7f 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index b3a2265..825d771 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index d3db527..9dd26b3 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 34872fd..386782b 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 5059ee0..8496a68 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7d46020.

[spark] branch master updated (50492c0 -> f6f1e51)

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 50492c0  [SPARK-31803][ML] Make sure instance weight is not negative
 add f6f1e51  [SPARK-31719][SQL] Refactor JoinSelection

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/joins.scala   | 158 +
 .../optimizer/JoinSelectionHelperSuite.scala   | 186 +
 .../spark/sql/execution/SparkStrategies.scala  | 153 -
 .../adaptive/LogicalQueryStageStrategy.scala   |   3 +-
 .../adaptive/OptimizeLocalShuffleReader.scala  |   3 +-
 .../dynamicpruning/PlanDynamicPruningFilters.scala |   1 +
 .../execution/joins/BroadcastHashJoinExec.scala|   1 +
 .../joins/BroadcastNestedLoopJoinExec.scala|   1 +
 .../spark/sql/execution/joins/HashJoin.scala   |   1 +
 .../sql/execution/joins/ShuffledHashJoinExec.scala |   1 +
 .../apache/spark/sql/execution/joins/package.scala |  31 
 .../scala/org/apache/spark/sql/JoinHintSuite.scala |   2 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |   3 +-
 .../sql/execution/joins/BroadcastJoinSuite.scala   |   1 +
 .../sql/execution/joins/ExistenceJoinSuite.scala   |   1 +
 .../spark/sql/execution/joins/InnerJoinSuite.scala |   9 +-
 .../spark/sql/execution/joins/OuterJoinSuite.scala |   1 +
 17 files changed, 395 insertions(+), 161 deletions(-)
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinSelectionHelperSuite.scala
 delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/joins/package.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31719][SQL] Refactor JoinSelection

2020-05-27 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f6f1e51  [SPARK-31719][SQL] Refactor JoinSelection
f6f1e51 is described below

commit f6f1e51072d6d7cb67486257f4c86447d959718f
Author: Ali Afroozeh 
AuthorDate: Wed May 27 15:49:08 2020 +

[SPARK-31719][SQL] Refactor JoinSelection

### What changes were proposed in this pull request?
This PR extracts the logic for selecting the planned join type out of the 
`JoinSelection` rule and moves it to `JoinSelectionHelper` in Catalyst.

### Why are the changes needed?
This change both cleans up the code in `JoinSelection` and puts the selection logic in one place, so it can be reused by other rules that need to make decisions based on the join type before planning time.

### Does this PR introduce _any_ user-facing change?
`BuildSide`, `BuildLeft`, and `BuildRight` are moved from 
`org.apache.spark.sql.execution` to Catalyst in 
`org.apache.spark.sql.catalyst.optimizer`.

### How was this patch tested?
This is a refactoring; it passes the existing tests.

Closes #28540 from dbaliafroozeh/RefactorJoinSelection.

Authored-by: Ali Afroozeh 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/optimizer/joins.scala   | 158 +
 .../optimizer/JoinSelectionHelperSuite.scala   | 186 +
 .../spark/sql/execution/SparkStrategies.scala  | 153 -
 .../adaptive/LogicalQueryStageStrategy.scala   |   3 +-
 .../adaptive/OptimizeLocalShuffleReader.scala  |   3 +-
 .../dynamicpruning/PlanDynamicPruningFilters.scala |   1 +
 .../execution/joins/BroadcastHashJoinExec.scala|   1 +
 .../joins/BroadcastNestedLoopJoinExec.scala|   1 +
 .../spark/sql/execution/joins/HashJoin.scala   |   1 +
 .../sql/execution/joins/ShuffledHashJoinExec.scala |   1 +
 .../apache/spark/sql/execution/joins/package.scala |  31 
 .../scala/org/apache/spark/sql/JoinHintSuite.scala |   2 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |   3 +-
 .../sql/execution/joins/BroadcastJoinSuite.scala   |   1 +
 .../sql/execution/joins/ExistenceJoinSuite.scala   |   1 +
 .../spark/sql/execution/joins/InnerJoinSuite.scala |   9 +-
 .../spark/sql/execution/joins/OuterJoinSuite.scala |   1 +
 17 files changed, 395 insertions(+), 161 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
index b65221c..85c6600 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
@@ -208,3 +208,161 @@ object ExtractPythonUDFFromJoinCondition extends Rule[LogicalPlan] with Predicat
   }
   }
 }
+
+sealed abstract class BuildSide
+
+case object BuildRight extends BuildSide
+
+case object BuildLeft extends BuildSide
+
+trait JoinSelectionHelper {
+
+  def getBroadcastBuildSide(
+  left: LogicalPlan,
+  right: LogicalPlan,
+  joinType: JoinType,
+  hint: JoinHint,
+  hintOnly: Boolean,
+  conf: SQLConf): Option[BuildSide] = {
+val buildLeft = if (hintOnly) {
+  hintToBroadcastLeft(hint)
+} else {
+  canBroadcastBySize(left, conf) && !hintToNotBroadcastLeft(hint)
+}
+val buildRight = if (hintOnly) {
+  hintToBroadcastRight(hint)
+} else {
+  canBroadcastBySize(right, conf) && !hintToNotBroadcastRight(hint)
+}
+getBuildSide(
+  canBuildLeft(joinType) && buildLeft,
+  canBuildRight(joinType) && buildRight,
+  left,
+  right
+)
+  }
+
+  def getShuffleHashJoinBuildSide(
+  left: LogicalPlan,
+  right: LogicalPlan,
+  joinType: JoinType,
+  hint: JoinHint,
+  hintOnly: Boolean,
+  conf: SQLConf): Option[BuildSide] = {
+val buildLeft = if (hintOnly) {
+  hintToShuffleHashJoinLeft(hint)
+} else {
+  canBuildLocalHashMapBySize(left, conf) && muchSmaller(left, right)
+}
+val buildRight = if (hintOnly) {
+  hintToShuffleHashJoinRight(hint)
+} else {
+  canBuildLocalHashMapBySize(right, conf) && muchSmaller(right, left)
+}
+getBuildSide(
+  canBuildLeft(joinType) && buildLeft,
+  canBuildRight(joinType) && buildRight,
+  left,
+  right
+)
+  }
+
+  def getSmallerSide(left: LogicalPlan, right: LogicalPlan): BuildSide = {
+    if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft
+  }
+
+  /**
+   * Matches a plan whose output should be small enough to be used in broadcast join.
+   */
+  def canBroadcastBySize(plan: LogicalPlan, conf: SQLConf): Boolean = {
+plan.stats.sizeInBytes >= 0 && plan.stats.sizeInBytes <= 
conf.autoBroadcast
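
For readers skimming the (truncated) diff above, here is a simplified, standalone sketch of the build-side selection pattern it introduces. `PlanStats` is a plain stand-in for `LogicalPlan.stats`, and the shape of `chooseBuildSide` is an assumption (the definition of `getBuildSide` is not shown in this excerpt); the real `JoinSelectionHelper` additionally consults join hints, join-type eligibility, and the size thresholds in `SQLConf`:

```scala
// Simplified sketch of the build-side selection pattern (not Spark's actual code).
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Stand-in for LogicalPlan.stats.
final case class PlanStats(sizeInBytes: BigInt)

object BuildSideSketch {
  // Mirrors getSmallerSide: prefer building the hash table on the smaller plan.
  def smallerSide(left: PlanStats, right: PlanStats): BuildSide =
    if (right.sizeInBytes <= left.sizeInBytes) BuildRight else BuildLeft

  // Assumed shape of the final decision: combine per-side eligibility, fall back to size.
  def chooseBuildSide(
      canBuildLeft: Boolean,
      canBuildRight: Boolean,
      left: PlanStats,
      right: PlanStats): Option[BuildSide] =
    (canBuildLeft, canBuildRight) match {
      case (true, true)   => Some(smallerSide(left, right))
      case (true, false)  => Some(BuildLeft)
      case (false, true)  => Some(BuildRight)
      case (false, false) => None
    }

  def main(args: Array[String]): Unit = {
    val small = PlanStats(BigInt(1) << 20)  // ~1 MiB
    val big   = PlanStats(BigInt(1) << 30)  // ~1 GiB
    // Both sides eligible: the smaller (right) side is chosen.
    println(chooseBuildSide(canBuildLeft = true, canBuildRight = true, left = big, right = small))
  }
}
```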

[spark] branch master updated (765105b -> 50492c0)

2020-05-27 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 765105b  [SPARK-31638][WEBUI] Clean Pagination code for all webUI pages
 add 50492c0  [SPARK-31803][ML] Make sure instance weight is not negative

No new revisions were added by this update.

Summary of changes:
 mllib/src/main/scala/org/apache/spark/ml/Predictor.scala   | 3 ++-
 .../main/scala/org/apache/spark/ml/classification/NaiveBayes.scala | 5 +++--
 .../scala/org/apache/spark/ml/clustering/BisectingKMeans.scala | 3 ++-
 .../scala/org/apache/spark/ml/clustering/GaussianMixture.scala | 3 ++-
 mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala   | 3 ++-
 .../apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala | 3 ++-
 .../scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala | 3 ++-
 .../scala/org/apache/spark/ml/evaluation/ClusteringMetrics.scala   | 2 --
 .../spark/ml/evaluation/MulticlassClassificationEvaluator.scala| 3 ++-
 .../scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala | 4 +++-
 mllib/src/main/scala/org/apache/spark/ml/functions.scala   | 6 ++
 .../apache/spark/ml/regression/GeneralizedLinearRegression.scala   | 3 ++-
 .../scala/org/apache/spark/ml/regression/IsotonicRegression.scala  | 7 ---
 13 files changed, 32 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8f2b6f3 -> 765105b)

2020-05-27 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8f2b6f3  [SPARK-31393][SQL][FOLLOW-UP] Show the correct alias in 
schema for expression
 add 765105b  [SPARK-31638][WEBUI] Clean Pagination code for all webUI pages

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ui/jobs/AllJobsPage.scala |  23 +--
 .../scala/org/apache/spark/ui/jobs/StagePage.scala |  14 +-
 .../org/apache/spark/ui/jobs/StageTable.scala  |  21 +--
 .../scala/org/apache/spark/ui/StagePageSuite.scala |   1 -
 .../spark/sql/execution/ui/AllExecutionsPage.scala |  29 +---
 .../hive/thriftserver/ui/ThriftServerPage.scala| 164 +
 6 files changed, 93 insertions(+), 159 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


