[spark] branch master updated (186477c -> b1493d8)

2021-05-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 186477c  [SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code
 add b1493d8  [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/codegen/CodeGenerator.scala | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)




[GitHub] [spark-website] dongjoon-hyun commented on pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


dongjoon-hyun commented on pull request #343:
URL: https://github.com/apache/spark-website/pull/343#issuecomment-843725836


   Thank you, @viirya and @maropu!








[spark] branch master updated: [SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code

2021-05-18 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 186477c  [SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code
186477c is described below

commit 186477c60e9cad71434b15fd9e08789740425d59
Author: Erik Krogen 
AuthorDate: Tue May 18 22:37:47 2021 -0500

[SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code

### What changes were proposed in this pull request?
Introduce new shared methods to `ShuffleBlockFetcherIteratorSuite` to 
replace copy-pasted code. Use modern, Scala-like Mockito `Answer` syntax.

### Why are the changes needed?
`ShuffleBlockFetcherIteratorSuite` has tons of duplicate code, like https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala#L172-L185. It's challenging to tell what the interesting parts are vs. what is just being set to some default/unused value.

Similarly but not as bad, there are many calls like the following
```
verify(transfer, times(1)).fetchBlocks(any(), any(), any(), any(), any(), any())
when(transfer.fetchBlocks(any(), any(), any(), any(), any(), any())).thenAnswer ...
```
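
For illustration, the consolidation works because Mockito's `Answer` is a single-abstract-method interface, so Scala 2.12+ can pass a plain lambda wherever an `Answer` is expected. A minimal, self-contained sketch of the pattern (the `Fetcher` trait here is invented for the example, not Spark's `BlockTransferService`):
```scala
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, times, verify, when}
import org.mockito.stubbing.Answer

// Invented stand-in for BlockTransferService, just for this sketch.
trait Fetcher { def fetch(host: String, blockId: String, cb: String => Unit): Unit }

object AnswerSyntaxSketch extends App {
  val fetcher = mock(classOf[Fetcher])

  // The stub/verify boilerplate lives in one place...
  def answerFetch(answer: Answer[Unit]): Unit =
    when(fetcher.fetch(any(), any(), any())).thenAnswer(answer)
  def verifyFetchCount(expected: Int): Unit =
    verify(fetcher, times(expected)).fetch(any(), any(), any())

  // ...so call sites pass only the interesting behavior, as a SAM lambda.
  answerFetch { invocation =>
    invocation.getArgument[String => Unit](2).apply("bytes")
  }

  var received: String = null
  fetcher.fetch("host1", "shuffle_0_0_0", s => received = s)
  verifyFetchCount(1)
  assert(received == "bytes")
}
```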

These changes result in about 10% reduction in both lines and characters in 
the file:
```bash
# Before
> wc core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
 1063  3950 43201 core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala

# After
> wc core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
  928  3609 39053 core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
```

It also helps readability, e.g.:
```
val iterator = createShuffleBlockIteratorWithDefaults(
  transfer,
  blocksByAddress,
  maxBytesInFlight = 1000L
)
```
Now I can clearly tell that `maxBytesInFlight` is the main parameter we're 
interested in here.
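
The readability gain comes from Scala default parameter values: every knob a test doesn't care about stays at its default, so only the parameter under test gets named. A small illustrative sketch of the technique (names invented here; the suite's actual signature may differ):
```scala
object DefaultsSketch extends App {
  // Default arguments hide the uninteresting knobs, in the same spirit as
  // createShuffleBlockIteratorWithDefaults in the suite.
  final case class FetchLimits(
      maxBytesInFlight: Long = Long.MaxValue,
      maxReqsInFlight: Int = Int.MaxValue,
      detectCorrupt: Boolean = true)

  // Only the parameter under test is named; everything else stays default.
  val limits = FetchLimits(maxBytesInFlight = 1000L)
  assert(limits.maxBytesInFlight == 1000L)
  assert(limits.maxReqsInFlight == Int.MaxValue)
}
```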

### Does this PR introduce _any_ user-facing change?
No, test only. There aren't even any behavior changes, just refactoring.

### How was this patch tested?
Unit tests pass.

Closes #32389 from xkrogen/xkrogen-spark-35263-refactor-shuffleblockfetcheriteratorsuite.

Authored-by: Erik Krogen 
Signed-off-by: Mridul Muralidharan
---
 .../storage/ShuffleBlockFetcherIteratorSuite.scala | 689 -
 1 file changed, 245 insertions(+), 444 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala b/core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
index 99c43b1..4be5fae 100644
--- a/core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
@@ -27,7 +27,7 @@ import scala.concurrent.Future
 
 import org.mockito.ArgumentMatchers.{any, eq => meq}
 import org.mockito.Mockito.{mock, times, verify, when}
-import org.mockito.invocation.InvocationOnMock
+import org.mockito.stubbing.Answer
 import org.scalatest.PrivateMethodTester
 
 import org.apache.spark.{SparkFunSuite, TaskContext}
@@ -35,35 +35,44 @@ import org.apache.spark.network._
 import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer}
 import org.apache.spark.network.shuffle.{BlockFetchingListener, DownloadFileManager, ExternalBlockStoreClient}
 import org.apache.spark.network.util.LimitedInputStream
-import org.apache.spark.shuffle.FetchFailedException
+import org.apache.spark.shuffle.{FetchFailedException, ShuffleReadMetricsReporter}
 import org.apache.spark.storage.ShuffleBlockFetcherIterator.FetchBlockInfo
 import org.apache.spark.util.Utils
 
 
 class ShuffleBlockFetcherIteratorSuite extends SparkFunSuite with PrivateMethodTester {
 
+  private var transfer: BlockTransferService = _
+
+  override def beforeEach(): Unit = {
+    transfer = mock(classOf[BlockTransferService])
+  }
+
   private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*)
 
+  private def answerFetchBlocks(answer: Answer[Unit]): Unit =
+    when(transfer.fetchBlocks(any(), any(), any(), any(), any(), any())).thenAnswer(answer)
+
+  private def verifyFetchBlocksInvocationCount(expectedCount: Int): Unit =
+    verify(transfer, times(expectedCount)).fetchBlocks(any(), any(), any(), any(), any(), any())
+
   // Some of the tests are quite tricky because we are testing the cleanup behavior
   // in the presence of faults.
 
-  /** Creates a mock 

[spark-website] branch asf-site updated: Make 2.4.8 as EOL release

2021-05-18 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new db70b52  Make 2.4.8 as EOL release
db70b52 is described below

commit db70b525ce89fae2596339fcdf132ce547d6d502
Author: Dongjoon Hyun 
AuthorDate: Tue May 18 14:48:00 2021 -0700

Make 2.4.8 as EOL release

We finished a long journey for the 2.4 line. This makes the EOL of the 2.4 line official on our website.

Author: Dongjoon Hyun 

Closes #343 from dongjoon-hyun/eol.
---
 site/versioning-policy.html | 18 +-----------------
 versioning-policy.md        |  8 +-------
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index 4fe6fb4..05a8c59 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -360,24 +360,8 @@ For example, branch 2.3.x is no longer considered maintained as of September 201
 of 2.3.0 in February 2018. No more 2.3.x releases should be expected after that point, even for bug fixes.
 The last minor release within a major a release will typically be maintained for longer as an LTS release.
-For example, 2.4.0 was released in November 2nd 2018 and has been maintained for 29 months as of March 2021. 2.4.8 will be the last release and no more 2.4.x releases should be expected after that, even for bug fixes.
+For example, 2.4.0 was released in November 2nd 2018 and had been maintained for 31 months until 2.4.8 was released on May 2021. 2.4.8 is the last release and no more 2.4.x releases should be expected even for bug fixes.
 
-<h3>Spark 2.4 LTS Release Window</h3>
-
-<table>
-  <thead>
-    <tr>
-      <th>Date</th>
-      <th>Event</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>Mar 2021</td>
-      <td>Release 2.4.8</td>
-    </tr>
-  </tbody>
-</table>
 
   
 
diff --git a/versioning-policy.md b/versioning-policy.md
index 80949a0..2d9d570 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -118,11 +118,5 @@ For example, branch 2.3.x is no longer considered maintained as of September 201
 of 2.3.0 in February 2018. No more 2.3.x releases should be expected after that point, even for bug fixes.
 
 The last minor release within a major a release will typically be maintained for longer as an "LTS" release.
-For example, 2.4.0 was released in November 2nd 2018 and has been maintained for 29 months as of March 2021. 2.4.8 will be the last release and no more 2.4.x releases should be expected after that, even for bug fixes.
+For example, 2.4.0 was released in November 2nd 2018 and had been maintained for 31 months until 2.4.8 was released on May 2021. 2.4.8 is the last release and no more 2.4.x releases should be expected even for bug fixes.
 
-
-Spark 2.4 LTS Release Window
-
-| Date | Event |
-|  | - |
-| Mar 2021 | Release 2.4.8 |




[GitHub] [spark-website] viirya closed pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


viirya closed pull request #343:
URL: https://github.com/apache/spark-website/pull/343


   








[GitHub] [spark-website] viirya commented on pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


viirya commented on pull request #343:
URL: https://github.com/apache/spark-website/pull/343#issuecomment-843585610


   Thanks @dongjoon-hyun and @maropu! Merging to asf-site!








[GitHub] [spark-website] maropu commented on pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


maropu commented on pull request #343:
URL: https://github.com/apache/spark-website/pull/343#issuecomment-843583586


   LGTM








svn commit: r47796 - /dev/spark/v2.4.8-rc4-docs/

2021-05-18 Thread viirya
Author: viirya
Date: Tue May 18 19:51:28 2021
New Revision: 47796

Log:
Removing RC artifacts.

Removed:
dev/spark/v2.4.8-rc4-docs/





svn commit: r47794 - in /dev/spark: v2.4.8-rc1-bin/ v2.4.8-rc1-docs/ v2.4.8-rc2-bin/ v2.4.8-rc2-docs/ v2.4.8-rc3-bin/ v2.4.8-rc3-docs/

2021-05-18 Thread viirya
Author: viirya
Date: Tue May 18 19:45:22 2021
New Revision: 47794

Log:
Removing RC artifacts.

Removed:
dev/spark/v2.4.8-rc1-bin/
dev/spark/v2.4.8-rc1-docs/
dev/spark/v2.4.8-rc2-bin/
dev/spark/v2.4.8-rc2-docs/
dev/spark/v2.4.8-rc3-bin/
dev/spark/v2.4.8-rc3-docs/





[GitHub] [spark-website] dongjoon-hyun commented on pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


dongjoon-hyun commented on pull request #343:
URL: https://github.com/apache/spark-website/pull/343#issuecomment-843469636


   cc @srowen, @HyukjinKwon, @viirya, @maropu, @cloud-fan








[GitHub] [spark-website] dongjoon-hyun opened a new pull request #343: Make 2.4.8 as EOL release

2021-05-18 Thread GitBox


dongjoon-hyun opened a new pull request #343:
URL: https://github.com/apache/spark-website/pull/343


   We finished a long journey for the 2.4 line. This makes the EOL of the 2.4 line official on our website.








[spark] branch branch-3.0 updated: [SPARK-35425][BUILD][3.0] Pin jinja2 in spark-rm/Dockerfile and add as a required dependency in the release README.md

2021-05-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 180fae8  [SPARK-35425][BUILD][3.0] Pin jinja2 in spark-rm/Dockerfile and add as a required dependency in the release README.md
180fae8 is described below

commit 180fae86fb9e116706f0e6c94e1e179d5bdd8147
Author: Kousuke Saruta 
AuthorDate: Tue May 18 09:39:02 2021 -0700

[SPARK-35425][BUILD][3.0] Pin jinja2 in spark-rm/Dockerfile and add as a required dependency in the release README.md

### What changes were proposed in this pull request?

This PR backports SPARK-35425 (#32573).

The following two things are done in this PR.

* Add note about Jinja2 as a required dependency for document build.
* Add Jinja2 dependency for the document build to `spark-rm/Dockerfile`

### Why are the changes needed?

SPARK-35375 (#32509) confined the version of Jinja2 to <3.0.0, so it's good to note that in `docs/README.md` and add the dependency to `spark-rm/Dockerfile`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that `make html` succeeds under `python/docs` with the dependencies installed by both of the following commands.
```
pip install sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1 jinja2==2.11.3
pip install 'sphinx<3.5.0' mkdocs numpy 'jinja2<3.0.0'
```

Closes #32579 from sarutak/backport-SPARK-35425-branch-3.0.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 4 +++-
 docs/README.md                         | 7 ++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index ff6af6f..2fad573 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -33,7 +33,9 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true
 # These arguments are just for reuse and not really meant to be customized.
 ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 
-ARG PIP_PKGS="sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1"
+# TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
+#   See also https://issues.apache.org/jira/browse/SPARK-35375.
+ARG PIP_PKGS="sphinx==2.3.1 mkdocs==1.0.4 numpy==1.18.1 jinja2==2.11.3"
 ARG GEM_PKGS="jekyll:4.0.0 jekyll-redirect-from:0.16.0 rouge:3.15.0"
 
 # Install extra needed repos and refresh.
diff --git a/docs/README.md b/docs/README.md
index 984ef8e..1d31fd1 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -63,8 +63,13 @@ Note: Other versions of roxygen2 might work in SparkR documentation generation b
 
 To generate API docs for any language, you'll need to install these libraries:
 
+<!--
+TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
+  See also https://issues.apache.org/jira/browse/SPARK-35375.
+-->
+
 ```sh
-$ sudo pip install 'sphinx<3.5.0' mkdocs numpy
+$ sudo pip install 'sphinx<3.5.0' mkdocs numpy 'jinja2<3.0.0'
 ```
 
 ## Generating the Documentation HTML




[spark] branch branch-3.1 updated (38808c2 -> 3699a67)

2021-05-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 38808c2  [SPARK-35411][SQL] Add essential information while serializing TreeNode to json
 add 3699a67  [SPARK-35425][BUILD][3.1] Pin jinja2 in spark-rm/Dockerfile and add as a required dependency in the release README.md

No new revisions were added by this update.

Summary of changes:
 dev/create-release/spark-rm/Dockerfile | 4 +++-
 docs/README.md                         | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)




[spark] branch master updated (9804f07 -> 8c70c17)

2021-05-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9804f07  [SPARK-35411][SQL] Add essential information while serializing TreeNode to json
 add 8c70c17  [SPARK-35434][BUILD] Upgrade scalatestplus artifacts to 3.2.9.0

No new revisions were added by this update.

Summary of changes:
 pom.xml | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)




[spark] branch branch-3.1 updated: [SPARK-35411][SQL] Add essential information while serializing TreeNode to json

2021-05-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 38808c2  [SPARK-35411][SQL] Add essential information while serializing TreeNode to json
38808c2 is described below

commit 38808c2ca5b05f2d3471187eada3d670f4fbcd68
Author: Tengfei Huang 
AuthorDate: Tue May 18 23:20:12 2021 +0800

[SPARK-35411][SQL] Add essential information while serializing TreeNode to json

### What changes were proposed in this pull request?
Write out Seq of Product objects that contain TreeNode, to avoid the cases described in https://issues.apache.org/jira/browse/SPARK-35411 where essential information is ignored and written out as null values. This information is necessary to understand the query plans.

### Why are the changes needed?
Information like cteRelations in the With node and branches in the CaseWhen expression is necessary to understand the query plans, so it should be written out to the resulting JSON string.
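
For illustration, the crux is a predicate over `Product` values; a minimal sketch with a toy node class standing in for Spark's `TreeNode` (names invented for the example):
```scala
object TreeNodeJsonSketch extends App {
  // Toy stand-in for Spark's TreeNode, just to illustrate the new rule.
  final case class JsonTestNode(arg: String)

  // The patch's added case in shouldConvertToJson: a Product (e.g. a Tuple2)
  // whose fields contain a tree node is descended into instead of nulled out.
  def containsTreeNode(p: Product): Boolean =
    p.productIterator.exists(_.isInstanceOf[JsonTestNode])

  // CaseWhen branches and With.cteRelations are Seqs of exactly such tuples.
  val branches = Seq(("a", JsonTestNode("0")), ("b", JsonTestNode("1")))
  assert(branches.forall(containsTreeNode))
}
```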

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT case added.

Closes #32557 from ivoson/plan-json-fix.

Authored-by: Tengfei Huang 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 9804f07c17af6d8e789f729d5872b85740cc3186)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 10 +++++++---
 .../spark/sql/catalyst/trees/TreeNodeSuite.scala   | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
index 5b7beb3..d6da04e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
@@ -800,9 +800,10 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
       ("deserialized" -> s.deserialized) ~ ("replication" -> s.replication)
     case n: TreeNode[_] => n.jsonValue
     case o: Option[_] => o.map(parseToJson)
-    // Recursive scan Seq[TreeNode], Seq[Partitioning], Seq[DataType]
-    case t: Seq[_] if t.forall(_.isInstanceOf[TreeNode[_]]) ||
-      t.forall(_.isInstanceOf[Partitioning]) || t.forall(_.isInstanceOf[DataType]) =>
+    // Recursive scan Seq[Partitioning], Seq[DataType], Seq[Product]
+    case t: Seq[_] if t.forall(_.isInstanceOf[Partitioning]) ||
+      t.forall(_.isInstanceOf[DataType]) ||
+      t.forall(_.isInstanceOf[Product]) =>
       JArray(t.map(parseToJson).toList)
     case t: Seq[_] if t.length > 0 && t.head.isInstanceOf[String] =>
       JString(truncatedString(t, "[", ", ", "]", SQLConf.get.maxToStringFields))
@@ -840,6 +841,9 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
     case broadcast: BroadcastMode => true
     case table: CatalogTableType => true
     case storage: CatalogStorageFormat => true
+    // Write out product that contains TreeNode, since there are some Tuples such as cteRelations
+    // in With, branches in CaseWhen which are essential to understand the plan.
+    case p if p.productIterator.exists(_.isInstanceOf[TreeNode[_]]) => true
     case _ => false
   }
 }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
index 4ad8475..d837af7 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
@@ -594,6 +594,27 @@ class TreeNodeSuite extends SparkFunSuite with SQLHelper {
       "class" -> classOf[JsonTestTreeNode].getName,
       "num-children" -> 0,
       "arg" -> "1")))
+
+    // Convert Seq of Product contains TreeNode to JSON.
+    assertJSON(
+      Seq(("a", JsonTestTreeNode("0")), ("b", JsonTestTreeNode("1"))),
+      List(
+        JObject(
+          "product-class" -> "scala.Tuple2",
+          "_1" -> "a",
+          "_2" -> List(JObject(
+            "class" -> classOf[JsonTestTreeNode].getName,
+            "num-children" -> 0,
+            "arg" -> "0"
+          ))),
+        JObject(
+          "product-class" -> "scala.Tuple2",
+          "_1" -> "b",
+          "_2" -> List(JObject(
+            "class" -> classOf[JsonTestTreeNode].getName,
+            "num-children" -> 0,
+            "arg" -> "1"
+          )))))
   }
 
   test("toJSON should not throws java.lang.StackOverflowError") {


[spark] branch master updated (746d80d -> 9804f07)

2021-05-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 746d80d  [SPARK-35422][SQL] Fix plan-printing issues to pass the TPCDS plan stability tests in Scala v2.13
 add 9804f07  [SPARK-35411][SQL] Add essential information while serializing TreeNode to json

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/trees/TreeNode.scala  | 10 +++---
 .../spark/sql/catalyst/trees/TreeNodeSuite.scala| 21 +
 2 files changed, 28 insertions(+), 3 deletions(-)




[spark] branch master updated: [SPARK-35389][SQL] V2 ScalarFunction should support magic method with null arguments

2021-05-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44d762a  [SPARK-35389][SQL] V2 ScalarFunction should support magic method with null arguments
44d762a is described below

commit 44d762abc6395570f1f493a145fd5d1cbdf0b49e
Author: Chao Sun 
AuthorDate: Tue May 18 08:45:55 2021 +

[SPARK-35389][SQL] V2 ScalarFunction should support magic method with null arguments

### What changes were proposed in this pull request?

When creating `Invoke` and `StaticInvoke` for `ScalarFunction`'s magic method, set `propagateNull` to false.

### Why are the changes needed?

When `propagateNull` is true (the default), `Invoke` and `StaticInvoke` return null if any of the arguments is null. For a scalar function this is incorrect; that logic should be left to the function implementation instead.
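
A simplified model of what the `propagateNull` flag changes (a sketch of the semantics, not Spark's actual `Invoke` implementation):
```scala
object PropagateNullSketch extends App {
  // Simplified model of Invoke's propagateNull flag: when true, any null
  // argument short-circuits to null and the function is never called.
  def invoke(args: Seq[Any], propagateNull: Boolean)(f: Seq[Any] => Any): Any =
    if (propagateNull && args.contains(null)) null else f(args)

  // A UDF that deliberately gives null an interpretation (length -1).
  val strLen: Seq[Any] => Any = {
    case Seq(s: String) => s.length
    case Seq(null)      => -1
  }

  assert(invoke(Seq(null), propagateNull = true)(strLen) == null) // before: UDF never ran
  assert(invoke(Seq(null), propagateNull = false)(strLen) == -1)  // after: UDF decides
}
```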

### Does this PR introduce _any_ user-facing change?

Yes. Null arguments are now properly handled by the magic method.

### How was this patch tested?

Added new tests.

Closes #32553 from sunchao/SPARK-35389.

Authored-by: Chao Sun 
Signed-off-by: Wenchen Fan 
---
 .../catalog/functions/ScalarFunction.java  | 19 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala |  5 +--
 .../sql/catalyst/expressions/objects/objects.scala | 26 +++
 .../connector/catalog/functions/JavaStrLen.java| 19 +++
 .../sql/connector/DataSourceV2FunctionSuite.scala  | 37 +-
 5 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
index 858ab92..d261a24 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
@@ -31,6 +31,7 @@ import org.apache.spark.sql.types.DataType;
  * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
  * The mapping between {@link DataType} and the corresponding JVM type is defined below.
  * 
+ *  Magic method 
  * IMPORTANT: the default implementation of {@link #produceResult} throws
  * {@link UnsupportedOperationException}. Users must choose to either override this method, or
  * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes individual parameters
@@ -82,6 +83,24 @@ import org.apache.spark.sql.types.DataType;
  * following the mapping defined below, and then checking if there is a matching method from all the
  * declared methods in the UDF class, using method name and the Java types.
  * 
+ *  Handling of nullable primitive arguments 
+ * The handling of null primitive arguments is different between the magic method approach and
+ * the {@link #produceResult} approach. With the former, whenever any of the method arguments meet
+ * the following conditions:
+ * 
+ *   the argument is of primitive type
+ *   the argument is nullable
+ *   the value of the argument is null
+ * 
+ * Spark will return null directly instead of calling the magic method. On the other hand, Spark
+ * will pass null primitive arguments to {@link #produceResult} and it is user's responsibility to
+ * handle them in the function implementation.
+ * 
+ * Because of the difference, if Spark users want to implement special handling of nulls for
+ * nullable primitive arguments, they should override the {@link #produceResult} method instead
+ * of using the magic method approach.
+ * 
+ *  Spark data type to Java type mapping 
  * The following are the mapping from {@link DataType SQL data type} to Java type which is used
  * by Spark to infer parameter types for the magic methods as well as return value type for
  * {@link #produceResult}:
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 9954ca0..3f2e93a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2204,11 +2204,12 @@ class Analyzer(override val catalogManager: CatalogManager)
     findMethod(scalarFunc, MAGIC_METHOD_NAME, argClasses) match {
       case Some(m) if Modifier.isStatic(m.getModifiers) =>
         StaticInvoke(scalarFunc.getClass, scalarFunc.resultType(),
-          MAGIC_METHOD_NAME, arguments, returnNullable = scalarFunc.isResultNullable)
+          MAGIC_METHOD_NAME, arguments, 

[spark] branch master updated (7b942d5 -> cce0048)

2021-05-18 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7b942d5  [SPARK-35425][BUILD] Pin jinja2 in `spark-rm/Dockerfile` and add as a required dependency in the release README.md
 add cce0048  [SPARK-35351][SQL] Add code-gen for left anti sort merge join

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/joins/SortMergeJoinExec.scala|  97 ++
 .../approved-plans-v1_4/q16.sf100/explain.txt  |   4 +-
 .../approved-plans-v1_4/q16.sf100/simplified.txt   |   5 +-
 .../approved-plans-v1_4/q16/explain.txt|   4 +-
 .../approved-plans-v1_4/q16/simplified.txt |   5 +-
 .../approved-plans-v1_4/q69.sf100/explain.txt  |  36 +++
 .../approved-plans-v1_4/q69.sf100/simplified.txt   | 110 +++--
 .../approved-plans-v1_4/q87.sf100/explain.txt  |   8 +-
 .../approved-plans-v1_4/q87.sf100/simplified.txt   |  10 +-
 .../approved-plans-v1_4/q94.sf100/explain.txt  |   4 +-
 .../approved-plans-v1_4/q94.sf100/simplified.txt   |   5 +-
 .../approved-plans-v1_4/q94/explain.txt|   4 +-
 .../approved-plans-v1_4/q94/simplified.txt |   5 +-
 .../sql/execution/WholeStageCodegenSuite.scala |  22 +
 .../sql/execution/metric/SQLMetricsSuite.scala |   4 +-
 15 files changed, 208 insertions(+), 115 deletions(-)




[spark] branch master updated (3b859a1 -> 7b942d5)

2021-05-18 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3b859a1  [SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite
 add 7b942d5  [SPARK-35425][BUILD] Pin jinja2 in `spark-rm/Dockerfile` and add as a required dependency in the release README.md

No new revisions were added by this update.

Summary of changes:
 dev/create-release/spark-rm/Dockerfile | 4 +++-
 docs/README.md                         | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)




[spark] branch branch-3.1 updated: [SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite

2021-05-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 9dba27c  [SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite
9dba27c is described below

commit 9dba27c61a5451124afc7a4293986457e5d95177
Author: Takeshi Yamamuro 
AuthorDate: Mon May 17 22:51:32 2021 -0700

[SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite

### What changes were proposed in this pull request?

To pass `subquery/scalar-subquery/scalar-subquery-select.sql` (`SQLQueryTestSuite`) in Scala v2.13, this PR proposes to change the aggregate expr of a test query in the file from `collect_set(...)` to `sort_array(collect_set(...))` because `collect_set` depends on the `mutable.HashSet` implementation and elements in the set are printed in a different order in Scala v2.12/v2.13.
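
The nondeterminism is easy to reproduce outside Spark: hash-set iteration order is an implementation detail that changed between Scala 2.12 and 2.13, and sorting restores a stable rendering. A plain-Scala sketch of the same idea:
```scala
import scala.collection.mutable

object CollectSetOrderSketch extends App {
  // Hash-set iteration order is an implementation detail; it is what made the
  // expected output differ across Scala versions (e.g. [219,19] vs [19,219]).
  val set = mutable.HashSet(219, 19)

  // Sorting before rendering, as sort_array(collect_set(...)) does, is stable.
  assert(set.toSeq.sorted == Seq(19, 219))
}
```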

### Why are the changes needed?

To pass the test in Scala v2.13.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually checked.

Closes #32578 from maropu/FixSQLTestIssueInScala213.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 3b859a16c03fe0caaf8683d9cbc1d1c65551105a)
Signed-off-by: Dongjoon Hyun 
---
 .../inputs/subquery/scalar-subquery/scalar-subquery-select.sql| 2 +-
 .../results/subquery/scalar-subquery/scalar-subquery-select.sql.out   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql b/sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
index 81712bf..936da959 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
@@ -135,6 +135,6 @@ SELECT t1a,
 (SELECT count_if(t2d > 0) FROM t2 WHERE t2a = t1a) count_if_t2,
 (SELECT approx_count_distinct(t2d) FROM t2 WHERE t2a = t1a) approx_count_distinct_t2,
 (SELECT collect_list(t2d) FROM t2 WHERE t2a = t1a) collect_list_t2,
-(SELECT collect_set(t2d) FROM t2 WHERE t2a = t1a) collect_set_t2,
+(SELECT sort_array(collect_set(t2d)) FROM t2 WHERE t2a = t1a) collect_set_t2,
 (SELECT hex(count_min_sketch(t2d, 0.5d, 0.5d, 1)) FROM t2 WHERE t2a = t1a) collect_set_t2
 FROM t1;
\ No newline at end of file
diff --git a/sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out b/sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out
index 16570c6..68aad89 100644
--- a/sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out
@@ -204,7 +204,7 @@ SELECT t1a,
 (SELECT count_if(t2d > 0) FROM t2 WHERE t2a = t1a) count_if_t2,
 (SELECT approx_count_distinct(t2d) FROM t2 WHERE t2a = t1a) approx_count_distinct_t2,
 (SELECT collect_list(t2d) FROM t2 WHERE t2a = t1a) collect_list_t2,
-(SELECT collect_set(t2d) FROM t2 WHERE t2a = t1a) collect_set_t2,
+(SELECT sort_array(collect_set(t2d)) FROM t2 WHERE t2a = t1a) collect_set_t2,
 (SELECT hex(count_min_sketch(t2d, 0.5d, 0.5d, 1)) FROM t2 WHERE t2a = t1a) collect_set_t2
 FROM t1
 -- !query schema
@@ -215,7 +215,7 @@ val1a  0   0   0   []  []  0001000100045D8D6AB900
 val1a  0   0   0   []  []  0001000100045D8D6AB9
 val1a  0   0   0   []  []  0001000100045D8D6AB9
 val1b  6   6   3   [19,119,319,19,19,19]   [19,119,319]    00010006000100045D8D6AB9000400010001
-val1c  2   2   2   [219,19]    [219,19]    00010002000100045D8D6AB900010001
+val1c  2   2   2   [219,19]    [19,219]    00010002000100045D8D6AB900010001
 val1d  0   0   0   []  []  0001000100045D8D6AB9
 val1d  0   0