[spark] branch master updated (c28a6fa -> db47c6e)

2020-07-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
 add db47c6e  [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

No new revisions were added by this update.

Summary of changes:
 .../api/v1/{StageStatus.java => TaskStatus.java}   |  14 +-
 .../org/apache/spark/status/AppStatusStore.scala   |  16 +-
 .../spark/status/api/v1/StagesResource.scala   |   5 +-
 ...st_w__status___offset___length_expectation.json |  99 
 ...__sortBy_short_names__runtime_expectation.json} |   0
 .../stage_task_list_w__status_expectation.json | 531 +
 .../spark/deploy/history/HistoryServerSuite.scala  |   6 +
 docs/monitoring.md |   3 +-
 8 files changed, 660 insertions(+), 14 deletions(-)
 copy core/src/main/java/org/apache/spark/status/api/v1/{StageStatus.java => TaskStatus.java} (83%)
 create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status___offset___length_expectation.json
 copy core/src/test/resources/HistoryServerExpectations/{stage_task_list_w__sortBy_short_names__runtime_expectation.json => stage_task_list_w__status___sortBy_short_names__runtime_expectation.json} (100%)
 create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status_expectation.json
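Per the change above, the stage `taskList` endpoint of the History Server REST API gains a `status` filter alongside its existing `offset`, `length`, and `sortBy` query parameters. A minimal sketch of building such a request URL follows; the host, application ID, and stage/attempt IDs are invented placeholders, and the exact parameter set should be checked against docs/monitoring.md:

```python
from urllib.parse import urlencode

# Hypothetical History Server host, application ID and stage IDs --
# placeholders only, not taken from this commit.
history_server = "http://localhost:18080/api/v1"
app_id, stage_id, attempt_id = "app-20200715000000-0000", 0, 0

# SPARK-32125 adds a `status` filter next to the existing `offset`,
# `length` and `sortBy` parameters of the stage taskList endpoint.
query = urlencode({"status": "FAILED", "offset": 0, "length": 100})
url = (f"{history_server}/applications/{app_id}"
       f"/stages/{stage_id}/{attempt_id}/taskList?{query}")
print(url)  # fetch with e.g. urllib.request.urlopen(url) against a live SHS
```

The same filter applies to both a live application's UI endpoint and the Spark History Server, since both serve the v1 REST API.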


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4ef535ff Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"
4ef535ff is described below

commit 4ef535fffbc1cacbacb035b2b1ac1dffcc0352b4
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 17:43:23 2020 -0700

Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"

This reverts commit 785ec2ee6c2473f54b7ca6c01f446cc8bdf883fa.
---
 .../org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala |  6 --
 sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala  | 12 
 2 files changed, 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
index 2706e4d..58a9f68 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
@@ -57,12 +57,6 @@ case class ScalaUDF(
 
   override def toString: String = s"${udfName.getOrElse("UDF")}(${children.mkString(", ")})"
 
-  override lazy val canonicalized: Expression = {
-    // SPARK-32307: `ExpressionEncoder` can't be canonicalized, and technically we don't
-    // need it to identify a `ScalaUDF`.
-    Canonicalize.execute(copy(children = children.map(_.canonicalized), inputEncoders = Nil))
-  }
-
   /**
    * The analyzer should be aware of Scala primitive types so as to make the
    * UDF return null if there is any null input value of these types. On the
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
index 2ab14d5..91e9f1d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
@@ -609,16 +609,4 @@ class UDFSuite extends QueryTest with SharedSparkSession {
     }
 
     assert(e2.getMessage.contains("UDFSuite$MalformedClassObject$MalformedPrimitiveFunction"))
   }
-
-  test("SPARK-32307: Aggression that use map type input UDF as group expression") {
-    spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
-    Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
-    checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)
-  }
-
-  test("SPARK-32307: Aggression that use array type input UDF as group expression") {
-    spark.udf.register("key", udf((m: Array[Int]) => m.head))
-    Seq(Array(1)).toDF("a").createOrReplaceTempView("t")
-    checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)
-  }
 }





[spark] branch master updated (b05f309 -> c28a6fa)

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel
 add c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

No new revisions were added by this update.

Summary of changes:
 .../spark/examples/ml/JavaTokenizerExample.java|  4 ++--
 .../org/apache/spark/examples/SparkKMeans.scala|  8 ++-
 .../apache/spark/sql/avro/SchemaConverters.scala   |  4 ++--
 .../spark/sql/kafka010/KafkaOffsetReader.scala |  2 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |  4 ++--
 .../main/scala/org/apache/spark/ml/Estimator.scala |  2 +-
 .../spark/ml/clustering/GaussianMixture.scala  | 28 +++---
 .../org/apache/spark/ml/feature/RobustScaler.scala |  4 ++--
 .../org/apache/spark/ml/feature/Word2Vec.scala |  2 +-
 .../scala/org/apache/spark/ml/param/params.scala   |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala|  8 +++
 .../spark/mllib/clustering/BisectingKMeans.scala   |  2 +-
 .../spark/mllib/clustering/GaussianMixture.scala   | 10 
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala|  2 +-
 .../org/apache/spark/mllib/rdd/SlidingRDD.scala|  2 +-
 .../apache/spark/mllib/tree/impurity/Entropy.scala |  2 +-
 .../apache/spark/mllib/tree/impurity/Gini.scala|  2 +-
 .../spark/mllib/tree/impurity/Variance.scala   |  2 +-
 .../apache/spark/mllib/util/NumericParser.scala|  8 +++
 .../spark/ml/clustering/BisectingKMeansSuite.scala |  4 ++--
 .../apache/spark/ml/clustering/KMeansSuite.scala   | 12 +-
 .../ml/evaluation/ClusteringEvaluatorSuite.scala   |  2 +-
 .../apache/spark/ml/feature/NormalizerSuite.scala  | 12 +-
 .../apache/spark/ml/recommendation/ALSSuite.scala  | 12 +-
 .../spark/sql/hive/HiveExternalCatalog.scala   |  8 +++
 .../org/apache/spark/sql/hive/HiveInspectors.scala |  4 ++--
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  4 ++--
 .../org/apache/spark/sql/hive/HiveUtils.scala  |  4 ++--
 .../spark/sql/hive/client/HiveClientImpl.scala | 24 +--
 .../apache/spark/sql/hive/client/HiveShim.scala| 10 
 .../spark/sql/hive/execution/HiveOptions.scala |  2 +-
 .../sql/hive/execution/HiveTableScanExec.scala |  2 +-
 .../scala/org/apache/spark/sql/hive/hiveUDFs.scala |  4 ++--
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  |  2 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala|  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  2 +-
 36 files changed, 106 insertions(+), 102 deletions(-)





[spark] branch master updated (b05f309 -> c28a6fa)

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to 
FMClassificationModel
 add c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, 
ML, etc) for Scala 2.13 compilation

No new revisions were added by this update.

Summary of changes:
 .../spark/examples/ml/JavaTokenizerExample.java|  4 ++--
 .../org/apache/spark/examples/SparkKMeans.scala|  8 ++-
 .../apache/spark/sql/avro/SchemaConverters.scala   |  4 ++--
 .../spark/sql/kafka010/KafkaOffsetReader.scala |  2 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |  4 ++--
 .../main/scala/org/apache/spark/ml/Estimator.scala |  2 +-
 .../spark/ml/clustering/GaussianMixture.scala  | 28 +++---
 .../org/apache/spark/ml/feature/RobustScaler.scala |  4 ++--
 .../org/apache/spark/ml/feature/Word2Vec.scala |  2 +-
 .../scala/org/apache/spark/ml/param/params.scala   |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala|  8 +++
 .../spark/mllib/clustering/BisectingKMeans.scala   |  2 +-
 .../spark/mllib/clustering/GaussianMixture.scala   | 10 
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala|  2 +-
 .../org/apache/spark/mllib/rdd/SlidingRDD.scala|  2 +-
 .../apache/spark/mllib/tree/impurity/Entropy.scala |  2 +-
 .../apache/spark/mllib/tree/impurity/Gini.scala|  2 +-
 .../spark/mllib/tree/impurity/Variance.scala   |  2 +-
 .../apache/spark/mllib/util/NumericParser.scala|  8 +++
 .../spark/ml/clustering/BisectingKMeansSuite.scala |  4 ++--
 .../apache/spark/ml/clustering/KMeansSuite.scala   | 12 +-
 .../ml/evaluation/ClusteringEvaluatorSuite.scala   |  2 +-
 .../apache/spark/ml/feature/NormalizerSuite.scala  | 12 +-
 .../apache/spark/ml/recommendation/ALSSuite.scala  | 12 +-
 .../spark/sql/hive/HiveExternalCatalog.scala   |  8 +++
 .../org/apache/spark/sql/hive/HiveInspectors.scala |  4 ++--
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  4 ++--
 .../org/apache/spark/sql/hive/HiveUtils.scala  |  4 ++--
 .../spark/sql/hive/client/HiveClientImpl.scala | 24 +--
 .../apache/spark/sql/hive/client/HiveShim.scala| 10 
 .../spark/sql/hive/execution/HiveOptions.scala |  2 +-
 .../sql/hive/execution/HiveTableScanExec.scala |  2 +-
 .../scala/org/apache/spark/sql/hive/hiveUDFs.scala |  4 ++--
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  |  2 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala|  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  2 +-
 36 files changed, 106 insertions(+), 102 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b05f309 -> c28a6fa)

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to 
FMClassificationModel
 add c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, 
ML, etc) for Scala 2.13 compilation

No new revisions were added by this update.

Summary of changes:
 .../spark/examples/ml/JavaTokenizerExample.java|  4 ++--
 .../org/apache/spark/examples/SparkKMeans.scala|  8 ++-
 .../apache/spark/sql/avro/SchemaConverters.scala   |  4 ++--
 .../spark/sql/kafka010/KafkaOffsetReader.scala |  2 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |  4 ++--
 .../main/scala/org/apache/spark/ml/Estimator.scala |  2 +-
 .../spark/ml/clustering/GaussianMixture.scala  | 28 +++---
 .../org/apache/spark/ml/feature/RobustScaler.scala |  4 ++--
 .../org/apache/spark/ml/feature/Word2Vec.scala |  2 +-
 .../scala/org/apache/spark/ml/param/params.scala   |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala|  8 +++
 .../spark/mllib/clustering/BisectingKMeans.scala   |  2 +-
 .../spark/mllib/clustering/GaussianMixture.scala   | 10 
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala|  2 +-
 .../org/apache/spark/mllib/rdd/SlidingRDD.scala|  2 +-
 .../apache/spark/mllib/tree/impurity/Entropy.scala |  2 +-
 .../apache/spark/mllib/tree/impurity/Gini.scala|  2 +-
 .../spark/mllib/tree/impurity/Variance.scala   |  2 +-
 .../apache/spark/mllib/util/NumericParser.scala|  8 +++
 .../spark/ml/clustering/BisectingKMeansSuite.scala |  4 ++--
 .../apache/spark/ml/clustering/KMeansSuite.scala   | 12 +-
 .../ml/evaluation/ClusteringEvaluatorSuite.scala   |  2 +-
 .../apache/spark/ml/feature/NormalizerSuite.scala  | 12 +-
 .../apache/spark/ml/recommendation/ALSSuite.scala  | 12 +-
 .../spark/sql/hive/HiveExternalCatalog.scala   |  8 +++
 .../org/apache/spark/sql/hive/HiveInspectors.scala |  4 ++--
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  4 ++--
 .../org/apache/spark/sql/hive/HiveUtils.scala  |  4 ++--
 .../spark/sql/hive/client/HiveClientImpl.scala | 24 +--
 .../apache/spark/sql/hive/client/HiveShim.scala| 10 
 .../spark/sql/hive/execution/HiveOptions.scala |  2 +-
 .../sql/hive/execution/HiveTableScanExec.scala |  2 +-
 .../scala/org/apache/spark/sql/hive/hiveUDFs.scala |  4 ++--
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  |  2 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala|  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  2 +-
 36 files changed, 106 insertions(+), 102 deletions(-)





[spark] branch master updated (cf22d94 -> b05f309)

2020-07-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf22d94  [SPARK-32036] Replace references to blacklist/whitelist 
language with more appropriate terminology, excluding the blacklisting feature
 add b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to 
FMClassificationModel

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/classification/FMClassifier.scala | 100 -
 .../apache/spark/ml/regression/FMRegressor.scala   |  10 +--
 .../spark/mllib/optimization/GradientDescent.scala |  45 ++
 .../apache/spark/mllib/optimization/LBFGS.scala|  11 ++-
 .../ml/classification/FMClassifierSuite.scala  |  26 ++
 python/pyspark/ml/classification.py|  48 +-
 python/pyspark/ml/tests/test_training_summary.py   |  49 +-
 7 files changed, 257 insertions(+), 32 deletions(-)





svn commit: r40495 - in /release/spark: spark-2.3.4/ spark-2.4.5/ spark-3.0.0-preview2/

2020-07-15 Thread srowen
Author: srowen
Date: Wed Jul 15 17:12:28 2020
New Revision: 40495

Log:
Remove non-current Spark 2.3, 2.4, 3.0 releases

Removed:
release/spark/spark-2.3.4/
release/spark/spark-2.4.5/
release/spark/spark-3.0.0-preview2/





[spark] branch master updated: [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature

2020-07-15 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cf22d94  [SPARK-32036] Replace references to blacklist/whitelist 
language with more appropriate terminology, excluding the blacklisting feature
cf22d94 is described below

commit cf22d947fb8f37aa4d394b6633d6f08dbbf6dc1c
Author: Erik Krogen 
AuthorDate: Wed Jul 15 11:40:55 2020 -0500

[SPARK-32036] Replace references to blacklist/whitelist language with more 
appropriate terminology, excluding the blacklisting feature

### What changes were proposed in this pull request?

This PR will remove references to these "blacklist" and "whitelist" terms 
besides the blacklisting feature as a whole, which can be handled in a separate 
JIRA/PR.

This touches quite a few files, but the changes are straightforward 
(variable/method/etc. name changes) and most are quite self-contained.

### Why are the changes needed?

As per discussion on the Spark dev list, it will be beneficial to remove 
references to problematic language that can alienate potential community 
members. One such reference is "blacklist" and "whitelist". While it seems to 
me that there is some valid debate as to whether these terms have racist 
origins, the cultural connotations are inescapable in today's world.

### Does this PR introduce _any_ user-facing change?

In the test file `HiveQueryFileTest`, a developer has the ability to 
specify the system property `spark.hive.whitelist` to specify a list of Hive 
query files that should be tested. This system property has been renamed to 
`spark.hive.includelist`. The old property has been kept for compatibility, but 
will log a warning if used. I am open to feedback from others on whether 
keeping a deprecated property here is unnecessary given that this is just for 
developers running tests.
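
The deprecated-property fallback described above could be sketched roughly as follows. This is a minimal illustration, not the actual Spark code: the object name, helper, and warning wording are all assumptions; only the two property names come from the commit message.

```scala
// Sketch of a backwards-compatible property lookup (hypothetical helper,
// not Spark's implementation). Property names are from the commit message.
object IncludelistCompat {
  val NewKey = "spark.hive.includelist"
  val OldKey = "spark.hive.whitelist" // deprecated, kept for compatibility

  /** Prefer the new property; fall back to the old one with a warning. */
  def resolve(props: Map[String, String]): Option[String] =
    props.get(NewKey).orElse {
      props.get(OldKey).map { v =>
        // Log a warning when the deprecated name is still in use.
        Console.err.println(
          s"Warning: '$OldKey' is deprecated; use '$NewKey' instead.")
        v
      }
    }
}
```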

### How was this patch tested?

Existing tests should be suitable since no behavior changes are expected as 
a result of this PR.

Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists.

Authored-by: Erik Krogen 
Signed-off-by: Thomas Graves 
---
 R/pkg/tests/fulltests/test_context.R   |  2 +-
 R/pkg/tests/fulltests/test_sparkSQL.R  |  8 ++--
 R/pkg/tests/run-all.R  |  4 +-
 .../java/org/apache/spark/network/crypto/README.md |  2 +-
 .../spark/deploy/history/FsHistoryProvider.scala   | 29 +++--
 .../spark/deploy/rest/RestSubmissionClient.scala   |  4 +-
 .../spark/scheduler/OutputCommitCoordinator.scala  |  2 +-
 .../scala/org/apache/spark/util/JsonProtocol.scala |  4 +-
 .../test/scala/org/apache/spark/ThreadAudit.scala  |  4 +-
 .../org/apache/spark/deploy/SparkSubmitSuite.scala | 22 +-
 .../deploy/history/FsHistoryProviderSuite.scala|  8 ++--
 .../org/apache/spark/ui/UISeleniumSuite.scala  | 14 +++---
 dev/sparktestsupport/modules.py| 10 ++---
 docs/streaming-programming-guide.md| 50 +++---
 .../streaming/JavaRecoverableNetworkWordCount.java | 20 -
 .../streaming/recoverable_network_wordcount.py | 16 +++
 .../streaming/RecoverableNetworkWordCount.scala| 16 +++
 .../scala/org/apache/spark/util/DockerUtils.scala  |  6 +--
 project/SparkBuild.scala   |  4 +-
 python/pylintrc|  2 +-
 python/pyspark/cloudpickle.py  |  6 +--
 python/pyspark/sql/functions.py|  4 +-
 python/pyspark/sql/pandas/typehints.py |  4 +-
 python/run-tests.py|  2 +-
 .../cluster/mesos/MesosClusterScheduler.scala  |  4 +-
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala|  2 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala  |  2 +-
 .../spark/sql/catalyst/json/JSONOptions.scala  | 10 ++---
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 34 +++
 .../spark/sql/catalyst/optimizer/expressions.scala |  2 +-
 .../plans/logical/basicLogicalOperators.scala  |  2 +-
 .../spark/sql/catalyst/rules/RuleExecutor.scala|  6 +--
 .../catalyst/optimizer/FilterPushdownSuite.scala   |  2 +-
 .../PullupCorrelatedPredicatesSuite.scala  |  2 +-
 .../datasources/json/JsonOutputWriter.scala|  2 +-
 .../inputs/{blacklist.sql => ignored.sql}  |  2 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  6 +--
 .../org/apache/spark/sql/TPCDSQuerySuite.scala |  4 +-
 .../sql/execution/datasources/json/JsonSuite.scala |  2 +-
 .../thriftserver/ThriftServerQueryTestSuite.scala  |  4 +-
 .../hive/execution/HiveCompatibilitySuite.scala| 16 +++
 .../execution/HiveWindowFunctionQuerySuite.scala   |  8 ++--
 .../clientpositive/add_partition_no_includelist.q  |  

[spark] branch branch-2.4 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 9aeeb0f  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
9aeeb0f is described below

commit 9aeeb0f5932550c8025b6804235a50fc203da3a1
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER 
BY in DISTRIBUTE BY

This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case which is using ORDER BY in DISTRIBUTE BY statement.

```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```

The following was found during SPARK-32276. If Spark optimizer removes the 
inner `ORDER BY`, the file size increases.
```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```
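
The size gap in the two listings above is a compression effect: ORC, like most columnar formats, encodes runs of similar adjacent values far more compactly, so losing the inner sort inflates the files. The principle can be shown with a small standalone sketch using plain DEFLATE on strings. This is an illustration only (the function name and sizes are assumptions, and it does not use ORC or Spark):

```scala
import java.util.zip.Deflater

// Illustrative sketch: the same values deflate much smaller when sorted
// than when shuffled. Plain DEFLATE on a comma-joined string, not ORC;
// it only demonstrates why dropping the inner ORDER BY grows the files.
def deflatedSize(values: Seq[Int]): Int = {
  val bytes = values.mkString(",").getBytes("UTF-8")
  val deflater = new Deflater()
  deflater.setInput(bytes)
  deflater.finish()
  // Worst-case DEFLATE output only slightly exceeds the input size.
  val buf = new Array[Byte](bytes.length + 64)
  val n = deflater.deflate(buf)
  deflater.end()
  n
}

val data = 1 to 10000
val sortedSize = deflatedSize(data)
val shuffledSize = deflatedSize(scala.util.Random.shuffle(data.toList))
// sortedSize should come out substantially smaller than shuffledSize.
```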

No. This only improves the test coverage.

Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index e318f36..5d4f99a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -83,4 +83,13 @@ class EliminateSortsSuite extends PlanTest {
 
 comparePlans(optimized, correctAnswer)
   }
+
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
 }




[spark] branch branch-3.0 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 74c910a  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
74c910a is described below

commit 74c910afb2101ac1335176a0824b508e9fd9e43f
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER 
BY in DISTRIBUTE BY

### What changes were proposed in this pull request?

This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case which is using ORDER BY in DISTRIBUTE BY statement.

### Why are the changes needed?

```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```

The following was found during SPARK-32276. If Spark optimizer removes the 
inner `ORDER BY`, the file size increases.
```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```

### Does this PR introduce _any_ user-facing change?

No. This only improves the test coverage.

### How was this patch tested?

Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = 

[spark] branch branch-2.4 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 9aeeb0f  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
9aeeb0f is described below

commit 9aeeb0f5932550c8025b6804235a50fc203da3a1
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER 
BY in DISTRIBUTE BY

### What changes were proposed in this pull request?

This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case which is using ORDER BY in DISTRIBUTE BY statement.

### Why are the changes needed?

```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/master")

$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root  wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 
.part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 
part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 
part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 
part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```

The following was found during SPARK-32276. If Spark optimizer removes the 
inner `ORDER BY`, the file size increases.
```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/SPARK-32276")

$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel 320 Jul 14 22:08 ./
drwxrwxrwt  14 root  wheel 448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel   8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel  12 Jul 14 22:08 
.part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel  1188 Jul 14 22:08 
.part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel  1188 Jul 14 22:08 
.part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel 119 Jul 14 22:08 
part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 
part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 
part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```
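The listings above show why the inner ORDER BY matters: with the sort kept, each ORC part file is under 1 KB, while the unsorted rewrite of the same rows takes ~150 KB, because columnar codecs compress long runs of sorted values far better than shuffled ones. A self-contained sketch of that effect on the JVM, using plain `Deflater` as a stand-in for ORC's codecs (`SortedCompressionDemo` is a hypothetical name, not a Spark API):

```scala
import java.nio.ByteBuffer
import java.util.zip.Deflater
import scala.util.Random

object SortedCompressionDemo {
  // Encode the ints as fixed-width big-endian bytes, deflate them,
  // and return the compressed size in bytes.
  def compressedSize(xs: Seq[Int]): Int = {
    val bb = ByteBuffer.allocate(xs.length * 4)
    xs.foreach(bb.putInt)
    val input = bb.array()
    val deflater = new Deflater()
    deflater.setInput(input)
    deflater.finish()
    // Output buffer comfortably larger than the input, so one call finishes.
    val out = new Array[Byte](input.length + input.length / 2 + 64)
    val n = deflater.deflate(out)
    deflater.end()
    n
  }

  def main(args: Array[String]): Unit = {
    val values   = (1 to 100000).map(_ % 1000) // many repeated values
    val sorted   = values.sorted               // long runs of equal ints
    val shuffled = Random.shuffle(values)
    println(s"sorted: ${compressedSize(sorted)} bytes, " +
      s"shuffled: ${compressedSize(shuffled)} bytes")
  }
}
```

Running it prints two sizes with the sorted sequence far smaller; the exact numbers depend on the JVM's zlib, but the gap mirrors the ORC file sizes above.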

### Does this PR introduce _any_ user-facing change?

No. This only improves the test coverage.

### How was this patch tested?

Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index e318f36..5d4f99a 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -83,4 +83,13 @@ class EliminateSortsSuite extends PlanTest {
 
 comparePlans(optimized, correctAnswer)
   }
+
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+val projectPlan = testRelation.select('a, 'b)
+val orderByPlan = projectPlan.orderBy('b.desc)
+val distributedPlan = orderByPlan.distribute('a)(1)
+val optimized = Optimize.execute(distributedPlan.analyze)
+val correctAnswer = distributedPlan.analyze
+comparePlans(optimized, correctAnswer)
+  }
 }
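The diff above encodes the contract as `comparePlans(optimized, correctAnswer)` with `correctAnswer = distributedPlan.analyze`, i.e. the optimizer must leave the plan untouched. The shape of such a guard can be sketched with a toy plan tree (hypothetical case classes and rule, not Catalyst's real nodes or EliminateSorts itself):

```scala
// Toy logical plan; hypothetical stand-ins for Catalyst nodes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(child: Plan) extends Plan
case class Repartition(child: Plan) extends Plan

object EliminateSortsToy {
  // Drop a sort only when a later sort makes it redundant. A sort that
  // feeds a repartition (ORDER BY inside DISTRIBUTE BY) is kept, because
  // the per-partition row order it produces is observable downstream.
  def optimize(plan: Plan): Plan = plan match {
    case Sort(Sort(child))  => optimize(Sort(child)) // inner sort is overwritten
    case Sort(child)        => Sort(optimize(child))
    case Repartition(child) => Repartition(optimize(child))
    case r: Relation        => r
  }
}
```

With this rule a `Sort` under a `Repartition` survives optimization unchanged, while `Sort(Sort(...))` still collapses, which is exactly the distinction the new test protects.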


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 74c910a  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
74c910a is described below

commit 74c910afb2101ac1335176a0824b508e9fd9e43f
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER 
BY in DISTRIBUTE BY

### What changes were proposed in this pull request?

This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case which is using ORDER BY in DISTRIBUTE BY statement.

### Why are the changes needed?

```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/master")

$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root  wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 
.part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 
part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 
part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 
part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```

The following was found during SPARK-32276. If Spark optimizer removes the 
inner `ORDER BY`, the file size increases.
```scala
scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/SPARK-32276")

$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel 320 Jul 14 22:08 ./
drwxrwxrwt  14 root  wheel 448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel   8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel  12 Jul 14 22:08 
.part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel  1188 Jul 14 22:08 
.part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel  1188 Jul 14 22:08 
.part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel 119 Jul 14 22:08 
part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 
part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 
part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```

### Does this PR introduce _any_ user-facing change?

No. This only improves the test coverage.

### How was this patch tested?

Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+val projectPlan = testRelation.select('a, 'b)
+val orderByPlan = projectPlan.orderBy('b.desc)
+val distributedPlan = orderByPlan.distribute('a)(1)
+val optimized = Optimize.execute(distributedPlan.analyze)
+val correctAnswer = distributedPlan.analyze
+comparePlans(optimized, correctAnswer)
+  }
 }

[spark] branch master updated (e449993 -> 8950dcb)

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e449993  [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for 
DSV2's Scan Node
 add 8950dcb  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8950dcb  [SPARK-32318][SQL][TESTS] Add a test case to 
EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
8950dcb is described below

commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4
Author: Dongjoon Hyun 
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER 
BY in DISTRIBUTE BY

### What changes were proposed in this pull request?

This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case which is using ORDER BY in DISTRIBUTE BY statement.
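The behavior being protected can be sketched with a toy analogue of the optimizer rule. The plan nodes and the `EliminateRedundantSorts` object below are illustrative stand-ins (not Spark's actual Catalyst API): the rule removes a sort only when it is provably redundant, and a sort sitting under a repartition is preserved, mirroring the test added here.

```scala
// Toy sketch of the scenario, assuming no Spark on the classpath.
// Plan nodes and the rule are hypothetical stand-ins for Catalyst's.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(child: Plan) extends Plan
case class RepartitionByExpr(child: Plan) extends Plan

object EliminateRedundantSorts {
  // Remove a Sort only when its effect is provably discarded
  // (here: a Sort directly on top of another Sort). A Sort under a
  // repartition must survive, which is the SPARK-32318 use case.
  def apply(plan: Plan): Plan = plan match {
    case Sort(Sort(child))        => apply(Sort(child))
    case Sort(child)              => Sort(apply(child))
    case RepartitionByExpr(child) => RepartitionByExpr(apply(child))
    case leaf                     => leaf
  }
}
```

With this sketch, `EliminateRedundantSorts(RepartitionByExpr(Sort(Relation("t"))))` returns the plan unchanged, while a double sort collapses to a single one.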

### Why are the changes needed?

```scala
scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/master")

$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root  wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 
.part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 
.part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 
part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 
part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 
part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```

The following was found during SPARK-32276. If Spark optimizer removes the 
inner `ORDER BY`, the file size increases.
```scala
scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, 
x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by 
a").write.orc("/tmp/SPARK-32276")

$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel 320 Jul 14 22:08 ./
drwxrwxrwt  14 root  wheel 448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel   8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel  12 Jul 14 22:08 
.part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel1188 Jul 14 22:08 
.part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel1188 Jul 14 22:08 
.part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel 119 Jul 14 22:08 
part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 
part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 
part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```
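A plausible reason the parts grow when the inner `ORDER BY` is removed is that sorted data produces long runs of similar values, which compress far better under codecs like Snappy or zlib. The self-contained Scala sketch below (no Spark required; `SortedCompression` and `compressedSize` are hypothetical names introduced for illustration) compares the zlib-compressed size of the same data sorted versus shuffled:

```scala
import java.util.zip.Deflater

// Illustrative only: shows that a sorted copy of the same values
// compresses much better than a shuffled one, mimicking why the
// snappy-compressed ORC parts grew once the inner ORDER BY was dropped.
object SortedCompression {
  def compressedSize(values: Seq[Int]): Int = {
    val bytes = values.map(_.toString).mkString(",").getBytes("UTF-8")
    val deflater = new Deflater(Deflater.BEST_COMPRESSION)
    deflater.setInput(bytes)
    deflater.finish()
    // Buffer is large enough for this input, so one deflate call suffices.
    val buf = new Array[Byte](bytes.length + 64)
    val n = deflater.deflate(buf)
    deflater.end()
    n
  }

  def main(args: Array[String]): Unit = {
    val data = scala.util.Random.shuffle((1 to 100000).map(_ % 1000))
    val shuffledSize = compressedSize(data)
    val sortedSize   = compressedSize(data.sorted)
    // Sorted input yields long runs of identical values,
    // so its compressed size is far smaller.
    println(s"shuffled=$shuffledSize sorted=$sortedSize")
  }
}
```

The exact sizes depend on the data, but the sorted copy should come out markedly smaller, consistent with the file listings above.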

### Does this PR introduce _any_ user-facing change?

No. This only improves the test coverage.

### How was this patch tested?

Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala   | 9 +
 1 file changed, 9 insertions(+)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+val projectPlan = testRelation.select('a, 'b)
+val orderByPlan = projectPlan.orderBy('b.desc)
+val distributedPlan = orderByPlan.distribute('a)(1)
+val optimized = Optimize.execute(distributedPlan.analyze)
+val correctAnswer = distributedPlan.analyze
+comparePlans(optimized, correctAnswer)
+  }
+
   test("should not remove orderBy in 

[spark] branch master updated (2527fbc -> e449993)

2020-07-15 Thread dkbiswal
This is an automated email from the ASF dual-hosted git repository.

dkbiswal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2527fbc  Revert "[SPARK-32276][SQL] Remove redundant sorts before 
repartition nodes"
 add e449993  [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for 
DSV2's Scan Node

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSuite.scala  | 33 +-
 ...treamingUpdate.scala => SupportsMetadata.scala} | 14 +++---
 .../datasources/v2/DataSourceV2ScanExecBase.scala  | 30 -
 .../sql/execution/datasources/v2/FileScan.scala| 27 +++-
 .../sql/execution/datasources/v2/csv/CSVScan.scala |  4 ++
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  4 ++
 .../datasources/v2/parquet/ParquetScan.scala   |  4 ++
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 50 +-
 8 files changed, 144 insertions(+), 22 deletions(-)
 copy 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/{SupportsStreamingUpdate.scala
 => SupportsMetadata.scala} (75%)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


