RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD

2022-11-14 Thread Yang, Lixiao
On 2022/11/14 20:51, Yi Liu wrote:
> On 2022/11/10 00:57, Jason Gunthorpe wrote:
>> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>>
>>>>> This is on github:
>>>>> https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>>> [...]
>>>>> v2:
>>>>>- Rebase to v6.1-rc3, v4 iommufd series
>>>>>- Fixup comments and commit messages from list remarks
>>>>>- Fix leaking of the iommufd for mdevs
>>>>>- New patch to fix vfio modaliases when vfio container is disabled
>>>>>- Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>>>>  to signal that iommufd is providing this
>>>>
>>>> I've redone my previous sanity tests. Except for those reported bugs,
>>>> things look fine. Once we fix those issues, GVT and other modules
>>>> can run some more stressful tests, I think.
>>>
>>> Our side is also starting tests (GVT, NIC passthrough) on this version;
>>> we need to wait a while for the results.
>>
>> I've updated the branches with the two functional fixes discussed on
>> the list, plus all the doc updates.
>>
>
> I see. Due to timezone differences, the kernel we grabbed is 37c9e6e44d77a;
> it has a slight diff in scripts/kernel-doc compared with the latest commit
> (6bb16a9c67769). I don't think it impacts the test.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>   (37c9e6e44d77a)
>
> On our side, Yu He and Lixiao Yang have done the below tests on an Intel
> platform with the above kernel; the results are:
>
> 1) GVT-g test suite passed; Intel iGFx passthrough passed.
>
> 2) NIC passthrough tests with different guest memory sizes (1G/4G) passed.
>
> 3) Booting two different QEMUs at the same time, one QEMU opening the
> legacy /dev/vfio/vfio and the other opening /dev/iommu. Tests passed.
>
> 4) Tried the below Kconfig combinations; the results are as expected (a
> sample .config fragment for the third combination is sketched after the
> test list).
>
> VFIO_CONTAINER=y, IOMMUFD=y  -- test pass
> VFIO_CONTAINER=y, IOMMUFD=n  -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y, IOMMUFD_VFIO_CONTAINER=y  -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y, IOMMUFD_VFIO_CONTAINER=n  -- no
> /dev/vfio/vfio, so the test fails, as expected
>
> 5) Tested devices from a multi-device group. Assigning such devices to the
> same VM passes; assigning them to different VMs fails; assigning them to a
> VM with Intel virtual VT-d fails. All results are as expected.
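>
> As a rough illustration, the third Kconfig combination above would
> correspond to a .config fragment like the one below. This is only a sketch:
> the symbol names follow the options listed above, and CONFIG_VFIO=y is an
> assumed prerequisite rather than something stated in this thread.
>
>   CONFIG_VFIO=y
>   # CONFIG_VFIO_CONTAINER is not set
>   CONFIG_IOMMUFD=y
>   CONFIG_IOMMUFD_VFIO_CONTAINER=y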
>
> Meanwhile, I also tested the development branch for nesting; the basic
> functionality looks good.
>
> Tested-by: Yi Liu 
>
Tested-by: Lixiao Yang 

--
Regards,
Lixiao Yang


[spark-website] branch asf-site updated: Remove preview for 3.0 in Download page (#368)

2021-11-09 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 13be9bc  Remove preview for 3.0 in Download page (#368)
13be9bc is described below

commit 13be9bcd059cfb60f60320e520c3eb36adf00cc8
Author: wuyi 
AuthorDate: Wed Nov 10 00:57:51 2021 +0800

Remove preview for 3.0 in Download page (#368)
---
 downloads.md| 8 
 site/downloads.html | 8 
 2 files changed, 16 deletions(-)

diff --git a/downloads.md b/downloads.md
index 518ae5b..993bd7a 100644
--- a/downloads.md
+++ b/downloads.md
@@ -30,14 +30,6 @@ window.onload = function () {
 
 Note that, Spark 2.x is pre-built with Scala 2.11 except version 2.4.2, which 
is pre-built with Scala 2.12. Spark 3.0+ is pre-built with Scala 2.12.
 
-### Latest preview release
-Preview releases, as the name suggests, are releases for previewing upcoming 
features.
-Unlike nightly packages, preview releases have been audited by the project's 
management committee
-to satisfy the legal requirements of Apache Software Foundation's release 
policy.
-Preview releases are not meant to be functional, i.e. they can and highly 
likely will contain
-critical bugs or documentation errors.
-The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 2019.
-
 ### Link with Spark
 Spark artifacts are [hosted in Maven 
Central](https://search.maven.org/search?q=g:org.apache.spark). You can add a 
Maven dependency with the following coordinates:
 
diff --git a/site/downloads.html b/site/downloads.html
index 8869e19..2deb4e4 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -174,14 +174,6 @@ window.onload = function () {
 
 Note that, Spark 2.x is pre-built with Scala 2.11 except version 2.4.2, 
which is pre-built with Scala 2.12. Spark 3.0+ is pre-built with Scala 2.12.
 
-Latest preview release
-Preview releases, as the name suggests, are releases for previewing 
upcoming features.
-Unlike nightly packages, preview releases have been audited by the 
projects management committee
-to satisfy the legal requirements of Apache Software Foundations 
release policy.
-Preview releases are not meant to be functional, i.e. they can and highly 
likely will contain
-critical bugs or documentation errors.
-The latest preview release is Spark 3.0.0-preview2, published on Dec 23, 
2019.
-
 Link with Spark
 Spark artifacts are https://search.maven.org/search?q=g:org.apache.spark;>hosted in Maven 
Central. You can add a Maven dependency with the following coordinates:
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Update Spark 3.3 release window (#366)

2021-10-29 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ec02d91  Update Spark 3.3 release window (#366)
ec02d91 is described below

commit ec02d9186df432ada948b4c22e326814c0ec79b2
Author: Hyukjin Kwon 
AuthorDate: Sat Oct 30 05:39:02 2021 +0900

Update Spark 3.3 release window (#366)
---
 site/versioning-policy.html | 8 
 versioning-policy.md| 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index d4d024b..ed75de8 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -272,7 +272,7 @@ available APIs.
 generally be released about 6 months after 2.2.0. Maintenance releases happen 
as needed
 in between feature releases. Major releases do not happen according to a fixed 
schedule.
 
-Spark 3.2 release window
+Spark 3.3 release window
 
 
   
@@ -283,15 +283,15 @@ in between feature releases. Major releases do not happen 
according to a fixed s
   
   
 
-  July 1st 2021
+  March 15th 2022
   Code freeze. Release branch cut.
 
 
-  Mid July 2021
+  Late March 2022
   QA period. Focus on bug fixes, tests, stability and docs. Generally, 
no new features merged.
 
 
-  August 2021
+  April 2022
   Release candidates (RC), voting, etc. until final release passes
 
   
diff --git a/versioning-policy.md b/versioning-policy.md
index 3d3f03f..55a0bd3 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -103,13 +103,13 @@ In general, feature ("minor") releases occur about every 
6 months. Hence, Spark
 generally be released about 6 months after 2.2.0. Maintenance releases happen 
as needed
 in between feature releases. Major releases do not happen according to a fixed 
schedule.
 
-Spark 3.2 release window
+Spark 3.3 release window
 
 | Date  | Event |
 | - | - |
-| July 1st 2021 | Code freeze. Release branch cut.|
-| Mid July 2021 | QA period. Focus on bug fixes, tests, stability and docs. 
Generally, no new features merged.|
-| August 2021 | Release candidates (RC), voting, etc. until final release 
passes|
+| March 15th 2022 | Code freeze. Release branch cut.|
+| Late March 2022 | QA period. Focus on bug fixes, tests, stability and docs. 
Generally, no new features merged.|
+| April 2022 | Release candidates (RC), voting, etc. until final release 
passes|
 
 Maintenance releases and EOL
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ed9e6fc -> dfa3978)

2020-11-25 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ed9e6fc  [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with 
Python 3.8 in GitHub Actions
 add dfa3978  [SPARK-33551][SQL] Do not use custom shuffle reader for 
repartition

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala|   2 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  31 +++---
 .../adaptive/CoalesceShufflePartitions.scala   |  11 +-
 ...costing.scala => CustomShuffleReaderRule.scala} |  15 +--
 .../adaptive/OptimizeLocalShuffleReader.scala  |   9 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala|  14 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 116 -
 7 files changed, 162 insertions(+), 36 deletions(-)
 copy 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/{costing.scala 
=> CustomShuffleReaderRule.scala} (69%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a09747b -> 14aeab3)

2020-10-05 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a09747b  [SPARK-33063][K8S] Improve error message for insufficient K8s 
volume confs
 add 14aeab3  [SPARK-33038][SQL] Combine AQE initial and current plan 
string when two plans are the same

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  50 ++---
 .../sql-tests/results/explain-aqe.sql.out  | 123 +++--
 .../adaptive/AdaptiveQueryExecSuite.scala  |   4 +-
 3 files changed, 47 insertions(+), 130 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Update the artifactId in the Download Page #276

2020-06-23 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 2c5679f  Update the artifactId in the Download Page #276
2c5679f is described below

commit 2c5679f415c3605726e68c0a2b8c204c91131d0c
Author: Xiao Li 
AuthorDate: Tue Jun 23 17:38:28 2020 -0700

Update the artifactId in the Download Page #276

The existing artifactId is not correct. We need to update it from 2.11 to 
2.12
---
 downloads.md| 2 +-
 site/downloads.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/downloads.md b/downloads.md
index 2ed9870..880024a 100644
--- a/downloads.md
+++ b/downloads.md
@@ -40,7 +40,7 @@ The latest preview release is Spark 3.0.0-preview2, published 
on Dec 23, 2019.
 Spark artifacts are [hosted in Maven 
Central](https://search.maven.org/search?q=g:org.apache.spark). You can add a 
Maven dependency with the following coordinates:
 
 groupId: org.apache.spark
-artifactId: spark-core_2.11
+artifactId: spark-core_2.12
 version: 3.0.0
 
 ### Installing with PyPi
diff --git a/site/downloads.html b/site/downloads.html
index e3b060f..d820471 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -240,7 +240,7 @@ The latest preview release is Spark 3.0.0-preview2, 
published on Dec 23, 2019.Spark artifacts are https://search.maven.org/search?q=g:org.apache.spark;>hosted in Maven 
Central. You can add a Maven dependency with the following coordinates:
 
 groupId: org.apache.spark
-artifactId: spark-core_2.11
+artifactId: spark-core_2.12
 version: 3.0.0
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener

2020-05-12 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 512cb2f  [SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener
512cb2f is described below

commit 512cb2f0246a0d020f0ba726b4596555b15797c6
Author: Ali Smesseim 
AuthorDate: Tue May 12 09:14:34 2020 -0700

[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener

### What changes were proposed in this pull request?

The update methods in HiveThriftServer2Listener now check if the parameter 
operation/session ID actually exist in the `sessionList` and `executionList` 
respectively. This prevents NullPointerExceptions if the operation or session 
ID is unknown. Instead, a warning is written to the log.

Also, in HiveSessionImpl.close(), we catch any exception thrown by 
`operationManager.closeOperation`. If for any reason this throws an exception, 
other operations are not prevented from being closed.
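
For illustration, a minimal, self-contained sketch of the guarded-lookup
pattern described above (the names below are hypothetical stand-ins, not the
actual listener code):

    import java.util.concurrent.ConcurrentHashMap

    final case class SessionData(id: String, var finishTimestamp: Long = 0L)

    object ListenerSketch {
      private val sessions = new ConcurrentHashMap[String, SessionData]()

      def onSessionClosed(sessionId: String, finishTime: Long): Unit =
        // Wrap the possibly-null map result in Option so an unknown ID logs a
        // warning instead of raising a NullPointerException.
        Option(sessions.get(sessionId)) match {
          case None =>
            println(s"onSessionClosed called with unknown session id: $sessionId")
          case Some(session) =>
            session.finishTimestamp = finishTime
            sessions.remove(sessionId)
        }
    }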

### Why are the changes needed?

The listener's update methods would throw an exception if the operation or 
session ID is unknown. In Spark 2, where the listener is called directly, this 
hampers with the caller's control flow. In Spark 3, the exception is caught by 
the ListenerBus but results in an uninformative NullPointerException.

In HiveSessionImpl.close(), if an exception is thrown when closing an 
operation, all following operations are not closed.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Unit tests

Closes #28155 from alismess-db/hive-thriftserver-listener-update-safer.

Authored-by: Ali Smesseim 
Signed-off-by: gatorsmile 
(cherry picked from commit 6994c64efd5770a8fd33220cbcaddc1d96fed886)
Signed-off-by: gatorsmile 
---
 .../ui/HiveThriftServer2Listener.scala | 120 -
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 +
 .../ui/HiveThriftServer2ListenerSuite.scala|  16 +++
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 170 insertions(+), 51 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
index 6d0a506..20a8f2c 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
@@ -25,6 +25,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.hive.service.server.HiveServer2
 
 import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState
@@ -38,7 +39,7 @@ private[thriftserver] class HiveThriftServer2Listener(
 kvstore: ElementTrackingStore,
 sparkConf: SparkConf,
 server: Option[HiveServer2],
-live: Boolean = true) extends SparkListener {
+live: Boolean = true) extends SparkListener with Logging {
 
   private val sessionList = new ConcurrentHashMap[String, LiveSessionData]()
   private val executionList = new ConcurrentHashMap[String, 
LiveExecutionData]()
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
 updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
= {
-val session = sessionList.get(e.sessionId)
-session.finishTimestamp = e.finishTime
-updateStoreWithTriggerEnabled(session)
-sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
=
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onSessionClosed called with unknown session 
id: ${e.sessionId}")
+  case Some(sessionData) =>
+val session = sessionData
+session.finishTimestamp = e.finishTime
+updateStoreWithTriggerEnabled(session)
+sessionList.remove(e.sessionId)
+}
 
-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit = {
-val info = getOrCreateExecution(
-  e.id,
-  e.statement,
-  e.sessionId,
-  e.startTime,
-  e.userName)
-
-info.state = ExecutionState.STARTED
-executionList.put(e.id, info)
-sessionList.get(e.sessionId).totalExecution += 1
-executionList.get(e.id).gro

[spark] branch master updated (e248bc7 -> 6994c64)

2020-05-12 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e248bc7  [SPARK-31610][SPARK-31668][ML] Address hashingTF 
saving bug and expose hashFunc property in HashingTF
 add 6994c64  [SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener

No new revisions were added by this update.

Summary of changes:
 .../ui/HiveThriftServer2Listener.scala | 120 -
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 +
 .../ui/HiveThriftServer2ListenerSuite.scala|  16 +++
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 170 insertions(+), 51 deletions(-)
 create mode 100644 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveSessionImplSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan

2020-05-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ba43922  [SPARK-31658][SQL] Fix SQL UI not showing write commands of 
AQE plan
ba43922 is described below

commit ba4392217b461d20bfd10dbc00714dbb7268d71a
Author: manuzhang 
AuthorDate: Fri May 8 10:24:13 2020 -0700

[SPARK-31658][SQL] Fix SQL UI not showing write commands of AQE plan

Show write commands on SQL UI of an AQE plan

Currently the leaf node of an AQE plan is always a `AdaptiveSparkPlan` 
which is not true when it's a child of a write command. Hence, the node of the 
write command as well as its metrics are not shown on the SQL UI.


![image](https://user-images.githubusercontent.com/1191767/81288918-1893f580-9098-11ea-9771-e3d0820ba806.png)


![image](https://user-images.githubusercontent.com/1191767/81289008-3a8d7800-9098-11ea-93ec-516bbaf25d2d.png)

No

Add UT.

Closes #28474 from manuzhang/aqe-ui.

Lead-authored-by: manuzhang 
Co-authored-by: Xiao Li 
Signed-off-by: gatorsmile 
(cherry picked from commit 77c690a7252b22c9dd8f3cb7ac32f79fd6845cad)
Signed-off-by: gatorsmile 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  4 +--
 .../adaptive/AdaptiveQueryExecSuite.scala  | 35 --
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index cd6936b..90d1db9 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -526,8 +526,8 @@ case class AdaptiveSparkPlanExec(
 } else {
   
context.session.sparkContext.listenerBus.post(SparkListenerSQLAdaptiveExecutionUpdate(
 executionId,
-SQLExecution.getQueryExecution(executionId).toString,
-SparkPlanInfo.fromSparkPlan(this)))
+context.qe.toString,
+SparkPlanInfo.fromSparkPlan(context.qe.executedPlan)))
 }
   }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index f30d1e9..29b9755 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -805,9 +805,11 @@ class AdaptiveQueryExecSuite
   test("SPARK-30953: InsertAdaptiveSparkPlan should apply AQE on child plan of 
write commands") {
 withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
   SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY.key -> "true") {
-  val plan = sql("CREATE TABLE t1 AS SELECT 1 
col").queryExecution.executedPlan
-  assert(plan.isInstanceOf[DataWritingCommandExec])
-  
assert(plan.asInstanceOf[DataWritingCommandExec].child.isInstanceOf[AdaptiveSparkPlanExec])
+  withTable("t1") {
+val plan = sql("CREATE TABLE t1 AS SELECT 1 
col").queryExecution.executedPlan
+assert(plan.isInstanceOf[DataWritingCommandExec])
+
assert(plan.asInstanceOf[DataWritingCommandExec].child.isInstanceOf[AdaptiveSparkPlanExec])
+  }
 }
   }
 
@@ -847,4 +849,31 @@ class AdaptiveQueryExecSuite
   }
 }
   }
+
+  test("SPARK-31658: SQL UI should show write commands") {
+withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+  SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY.key -> "true") {
+  withTable("t1") {
+var checkDone = false
+val listener = new SparkListener {
+  override def onOtherEvent(event: SparkListenerEvent): Unit = {
+event match {
+  case SparkListenerSQLAdaptiveExecutionUpdate(_, _, planInfo) =>
+assert(planInfo.nodeName == "Execute 
CreateDataSourceTableAsSelectCommand")
+checkDone = true
+  case _ => // ignore other events
+}
+  }
+}
+spark.sparkContext.addSparkListener(listener)
+try {
+  sql("CREATE TABLE t1 AS SELECT 1 col").collect()
+  spark.sparkContext.listenerBus.waitUntilEmpty()
+  assert(checkDone)
+} finally {
+  spark.sparkContext.removeSparkListener(listener)
+}
+  }
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0fb607e -> 77c690a)

2020-05-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0fb607e  [SPARK-30385][WEBUI] WebUI occasionally throw IOException on 
stop()
 add 77c690a  [SPARK-31658][SQL] Fix SQL UI not showing write commands of 
AQE plan

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  4 +--
 .../adaptive/AdaptiveQueryExecSuite.scala  | 35 --
 2 files changed, 34 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (348fd53 -> 75da050)

2020-05-01 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 348fd53  [SPARK-31307][ML][EXAMPLES] Add examples for ml.fvalue
 add 75da050  [MINOR][SQL][DOCS] Remove two leading spaces from sql tables

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md   |  40 +-
 docs/sql-ref-functions-udf-hive.md|  82 ++--
 docs/sql-ref-null-semantics.md| 512 +++---
 docs/sql-ref-syntax-aux-analyze-table.md  |  88 ++--
 docs/sql-ref-syntax-aux-conf-mgmt-set.md  |  10 +-
 docs/sql-ref-syntax-aux-describe-database.md  |  44 +-
 docs/sql-ref-syntax-aux-describe-function.md  |  84 ++--
 docs/sql-ref-syntax-aux-describe-query.md |  60 +--
 docs/sql-ref-syntax-aux-describe-table.md | 164 +++
 docs/sql-ref-syntax-aux-show-columns.md   |  42 +-
 docs/sql-ref-syntax-aux-show-create-table.md  |  20 +-
 docs/sql-ref-syntax-aux-show-databases.md |  40 +-
 docs/sql-ref-syntax-aux-show-functions.md |  96 ++--
 docs/sql-ref-syntax-aux-show-partitions.md|  60 +--
 docs/sql-ref-syntax-aux-show-table.md | 178 
 docs/sql-ref-syntax-aux-show-tables.md|  64 +--
 docs/sql-ref-syntax-aux-show-tblproperties.md |  48 +-
 docs/sql-ref-syntax-aux-show-views.md |  68 +--
 docs/sql-ref-syntax-ddl-alter-database.md |  16 +-
 docs/sql-ref-syntax-ddl-alter-table.md| 252 +--
 docs/sql-ref-syntax-ddl-alter-view.md | 112 ++---
 docs/sql-ref-syntax-ddl-create-database.md|  16 +-
 docs/sql-ref-syntax-ddl-create-function.md|  46 +-
 docs/sql-ref-syntax-ddl-drop-function.md  |  32 +-
 docs/sql-ref-syntax-ddl-repair-table.md   |  18 +-
 docs/sql-ref-syntax-ddl-truncate-table.md |  32 +-
 docs/sql-ref-syntax-dml-insert-into.md| 164 +++
 docs/sql-ref-syntax-dml-insert-overwrite-table.md | 124 +++---
 docs/sql-ref-syntax-dml-load.md   |  44 +-
 docs/sql-ref-syntax-qry-aggregation.md|  22 -
 docs/sql-ref-syntax-qry-explain.md| 100 ++---
 docs/sql-ref-syntax-qry-sampling.md   |  82 ++--
 docs/sql-ref-syntax-qry-select-clusterby.md   |  40 +-
 docs/sql-ref-syntax-qry-select-cte.md |  60 +--
 docs/sql-ref-syntax-qry-select-distinct.md|  22 -
 docs/sql-ref-syntax-qry-select-distribute-by.md   |  40 +-
 docs/sql-ref-syntax-qry-select-groupby.md | 216 -
 docs/sql-ref-syntax-qry-select-having.md  |  68 +--
 docs/sql-ref-syntax-qry-select-inline-table.md|  36 +-
 docs/sql-ref-syntax-qry-select-join.md| 175 
 docs/sql-ref-syntax-qry-select-limit.md   |  50 +--
 docs/sql-ref-syntax-qry-select-orderby.md |  90 ++--
 docs/sql-ref-syntax-qry-select-setops.md  | 190 
 docs/sql-ref-syntax-qry-select-sortby.md  | 132 +++---
 docs/sql-ref-syntax-qry-select-tvf.md |  68 +--
 docs/sql-ref-syntax-qry-select-where.md   |  82 ++--
 docs/sql-ref-syntax-qry-window.md | 168 +++
 47 files changed, 2076 insertions(+), 2121 deletions(-)
 delete mode 100644 docs/sql-ref-syntax-qry-aggregation.md
 delete mode 100644 docs/sql-ref-syntax-qry-select-distinct.md


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

2020-04-30 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f5e018e  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the 
MD file
f5e018e is described below

commit f5e018edc71fd5ddf5ce4f82d02ac777bc3d7280
Author: Xiao Li 
AuthorDate: Thu Apr 30 09:34:56 2020 -0700

[SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the MD file

### What changes were proposed in this pull request?
This PR is to clean up the markdown file in SHOW COLUMNS page.

- remove the unneeded embedded inline HTML markup by using the basic 
markdown syntax.
- use the ``` sql for highlighting the SQL syntax.

### Why are the changes needed?
Make the doc cleaner and easily editable by MD editors.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
**Before**

![Screen Shot 2020-04-29 at 5 20 11 
PM](https://user-images.githubusercontent.com/11567269/80661963-fa4d4a80-8a44-11ea-9dea-c43cda6de010.png)

**After**

![Screen Shot 2020-04-29 at 6 03 50 
PM](https://user-images.githubusercontent.com/11567269/80661940-f15c7900-8a44-11ea-9943-a83e8d8618fb.png)

Closes #28414 from gatorsmile/cleanupShowColumns.

Lead-authored-by: Xiao Li 
Co-authored-by: gatorsmile 
Signed-off-by: gatorsmile 
(cherry picked from commit b5ecc41c73018bbc742186d2e752101a99cfe852)
Signed-off-by: gatorsmile 
---
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-columns.md 
b/docs/sql-ref-syntax-aux-show-columns.md
index 8f73aac..c8c90a9 100644
--- a/docs/sql-ref-syntax-aux-show-columns.md
+++ b/docs/sql-ref-syntax-aux-show-columns.md
@@ -25,41 +25,34 @@ Return the list of columns in a table. If the table does 
not exist, an exception
 
 ### Syntax
 
-{% highlight sql %}
+```sql
 SHOW COLUMNS table_identifier [ database ]
-{% endhighlight %}
+```
 
 ### Parameters
 
-
-  table_identifier
-  
+* **table_identifier**
+
 Specifies the table name of an existing table. The table may be optionally 
qualified
-with a database name.
-Syntax:
-  
-{ IN | FROM } [ database_name . ] table_name
-  
-Note:
-Keywords IN and FROM are interchangeable.
-  
-  database
-  
+with a database name.
+
+**Syntax:** `{ IN | FROM } [ database_name . ] table_name`
+
+**Note:** Keywords `IN` and `FROM` are interchangeable.
+
+* **database**
+
 Specifies an optional database name. The table is resolved from this 
database when it
-is specified. Please note that when this parameter is specified then table
-name should not be qualified with a different database name. 
-Syntax:
-  
-{ IN | FROM } database_name
-  
-Note:
-Keywords IN and FROM are interchangeable.
-  
-
+is specified. When this parameter is specified then table
+name should not be qualified with a different database name. 
+
+**Syntax:** `{ IN | FROM } database_name`
+
+**Note:** Keywords `IN` and `FROM` are interchangeable.
 
 ### Examples
 
-{% highlight sql %}
+```sql
 -- Create `customer` table in `salesdb` database;
 USE salesdb;
 CREATE TABLE customer(
@@ -96,9 +89,9 @@ SHOW COLUMNS IN customer IN salesdb;
   | name|
   |cust_addr|
   +-+
-{% endhighlight %}
+```
 
 ### Related Statements
 
- * [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
- * [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
+* [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
+* [SHOW TABLE](sql-ref-syntax-aux-show-table.html)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c09cfb9 -> b5ecc41)

2020-04-30 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c09cfb9  [SPARK-31557][SQL] Fix timestamps rebasing in legacy parsers
 add b5ecc41  [SPARK-28806][DOCS][FOLLOW-UP] Remove unneeded HTML from the 
MD file

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-columns.md | 51 ++---
 1 file changed, 22 insertions(+), 29 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL Configuration

2020-04-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1701f78  [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect 
static SQL Configuration
1701f78 is described below

commit 1701f7882aac9e3efaa36c628815edfad09b62fa
Author: gatorsmile 
AuthorDate: Mon Apr 20 13:08:55 2020 -0700

[SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect static SQL 
Configuration

### What changes were proposed in this pull request?
This PR is the follow-up PR of https://github.com/apache/spark/pull/28003

- add a migration guide
- add an end-to-end test case.

### Why are the changes needed?
The original PR made a major behavior change to the user-facing RESET command.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added a new end-to-end test

Closes #28265 from gatorsmile/spark-31234followup.

Authored-by: gatorsmile 
Signed-off-by: gatorsmile 
(cherry picked from commit 6c792a79c10e7b01bd040ef14c848a2a2378e28c)
Signed-off-by: gatorsmile 
---
 docs/core-migration-guide.md |  2 +-
 docs/sql-migration-guide.md  |  4 
 .../org/apache/spark/sql/internal/StaticSQLConf.scala|  3 +++
 .../org/apache/spark/sql/internal/SharedState.scala  |  3 ---
 .../org/apache/spark/sql/SparkSessionBuilderSuite.scala  | 16 
 .../org/apache/spark/sql/internal/SQLConfSuite.scala |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index cde6e07..33406d0 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -25,7 +25,7 @@ license: |
 ## Upgrading from Core 2.4 to 3.0
 
 - The `org.apache.spark.ExecutorPlugin` interface and related configuration 
has been replaced with
-  `org.apache.spark.plugin.SparkPlugin`, which adds new functionality. Plugins 
using the old
+  `org.apache.spark.api.plugin.SparkPlugin`, which adds new functionality. 
Plugins using the old
   interface must be modified to extend the new interfaces. Check the
   [Monitoring](monitoring.html) guide for more details.
 
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index f5c81e9..8945c13 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -210,6 +210,10 @@ license: |
 
 * The decimal string representation can be different between Hive 1.2 and 
Hive 2.3 when using `TRANSFORM` operator in SQL for script transformation, 
which depends on hive's behavior. In Hive 1.2, the string representation omits 
trailing zeroes. But in Hive 2.3, it is always padded to 18 digits with 
trailing zeroes if necessary.
 
+## Upgrading from Spark SQL 2.4.5 to 2.4.6
+
+  - In Spark 2.4.6, the `RESET` command does not reset the static SQL 
configuration values to the default. It only clears the runtime SQL 
configuration values.
+
 ## Upgrading from Spark SQL 2.4.4 to 2.4.5
 
   - Since Spark 2.4.5, `TRUNCATE TABLE` command tries to set back original 
permission and ACLs during re-creating the table/partition paths. To restore 
the behaviour of earlier versions, set 
`spark.sql.truncateTable.ignorePermissionAcl.enabled` to `true`.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
index d202528..9618ff6 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
@@ -47,6 +47,9 @@ object StaticSQLConf {
 .internal()
 .version("2.1.0")
 .stringConf
+// System preserved database should not exists in metastore. However it's 
hard to guarantee it
+// for every session, because case-sensitivity differs. Here we always 
lowercase it to make our
+// life easier.
 .transform(_.toLowerCase(Locale.ROOT))
 .createWithDefault("global_temp")
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
index 14b8ea6..47119ab 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
@@ -153,9 +153,6 @@ private[sql] class SharedState(
* A manager for global temporary views.
*/
   lazy val globalTempViewManager: GlobalTempViewManager = {
-// System preserved database should not exists in metastore. However it's 
hard to guarantee it
-// for every session, because case-sensitivity differs. Here we always 
lowercase it to make our
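
To make the migration note above concrete, here is a minimal sketch of the RESET behavior this patch locks in. It assumes an existing SparkSession named `spark`; the config keys are only examples of one runtime and one static SQL config:

```scala
// RESET clears runtime SQL configs but leaves static SQL configs alone.
spark.conf.set("spark.sql.shuffle.partitions", "10")   // runtime SQL config

spark.sql("RESET")

// The runtime config is back to its default value (200) ...
println(spark.conf.get("spark.sql.shuffle.partitions"))

// ... while a static config, fixed at session start, is untouched by RESET.
println(spark.conf.get("spark.sql.globalTempDatabase"))   // still "global_temp"
```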

[spark] branch master updated (44d370d -> 6c792a7)

2020-04-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 44d370d  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
 add 6c792a7  [SPARK-31234][SQL][FOLLOW-UP] ResetCommand should not affect 
static SQL Configuration

No new revisions were added by this update.

Summary of changes:
 docs/core-migration-guide.md |  2 +-
 docs/sql-migration-guide.md  |  4 
 .../org/apache/spark/sql/internal/StaticSQLConf.scala|  3 +++
 .../org/apache/spark/sql/internal/SharedState.scala  |  3 ---
 .../org/apache/spark/sql/SparkSessionBuilderSuite.scala  | 16 
 .../org/apache/spark/sql/internal/SQLConfSuite.scala |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31475][SQL] Broadcast stage in AQE did not timeout

2020-04-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2e32160  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
2e32160 is described below

commit 2e3216012e8ad85d4cd88671493dd6e4d0e6a668
Author: Maryann Xue 
AuthorDate: Mon Apr 20 11:55:48 2020 -0700

[SPARK-31475][SQL] Broadcast stage in AQE did not timeout

### What changes were proposed in this pull request?

This PR adds a timeout for the Future of a BroadcastQueryStageExec to make 
sure it can have the same timeout behavior as a non-AQE broadcast exchange.

### Why are the changes needed?

This is to make the broadcast timeout behavior in AQE consistent with that 
in non-AQE.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #28250 from maryannxue/aqe-broadcast-timeout.

Authored-by: Maryann Xue 
Signed-off-by: gatorsmile 
(cherry picked from commit 44d370dd4501f0a4abb7194f7cff0d346aac0992)
Signed-off-by: gatorsmile 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/execution/adaptive/QueryStageExec.scala| 35 ++
 .../execution/exchange/BroadcastExchangeExec.scala |  8 ++---
 .../sql/execution/joins/BroadcastJoinSuite.scala   | 23 --
 4 files changed, 56 insertions(+), 12 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 2b46724..0ec8b5f 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -546,7 +546,7 @@ case class AdaptiveSparkPlanExec(
 }
 
 object AdaptiveSparkPlanExec {
-  private val executionContext = ExecutionContext.fromExecutorService(
+  private[adaptive] val executionContext = 
ExecutionContext.fromExecutorService(
 ThreadUtils.newDaemonCachedThreadPool("QueryStageCreator", 16))
 
   /**
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
index beaa972..f414f85 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
@@ -17,9 +17,11 @@
 
 package org.apache.spark.sql.execution.adaptive
 
-import scala.concurrent.Future
+import java.util.concurrent.TimeUnit
 
-import org.apache.spark.{FutureAction, MapOutputStatistics}
+import scala.concurrent.{Future, Promise}
+
+import org.apache.spark.{FutureAction, MapOutputStatistics, SparkException}
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
@@ -28,6 +30,8 @@ import org.apache.spark.sql.catalyst.plans.logical.Statistics
 import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
 
 /**
  * A query stage is an independent subgraph of the query plan. Query stage 
materializes its output
@@ -100,8 +104,8 @@ abstract class QueryStageExec extends LeafExecNode {
   override def executeTail(n: Int): Array[InternalRow] = plan.executeTail(n)
   override def executeToIterator(): Iterator[InternalRow] = 
plan.executeToIterator()
 
-  override def doPrepare(): Unit = plan.prepare()
-  override def doExecute(): RDD[InternalRow] = plan.execute()
+  protected override def doPrepare(): Unit = plan.prepare()
+  protected override def doExecute(): RDD[InternalRow] = plan.execute()
   override def doExecuteBroadcast[T](): Broadcast[T] = plan.executeBroadcast()
   override def doCanonicalize(): SparkPlan = plan.canonicalized
 
@@ -187,8 +191,24 @@ case class BroadcastQueryStageExec(
   throw new IllegalStateException("wrong plan for broadcast stage:\n " + 
plan.treeString)
   }
 
+  @transient private lazy val materializeWithTimeout = {
+val broadcastFuture = broadcast.completionFuture
+val timeout = SQLConf.get.broadcastTimeout
+val promise = Promise[Any]()
+val fail = BroadcastQueryStageExec.scheduledExecutor.schedule(new 
Runnable() {
+  override def run(): Unit = {
+promise.tryFailure(new SparkException(s"Could not execute broadcast in 
$timeout secs. " +
+  s"You can increase the timeout for broadcasts via 
${SQLConf.BROADCAST_TIMEOUT.key} or " +
+ 
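
The pattern at work here is a general one: race the broadcast Future against a failure scheduled on a separate executor, so the stage fails once the timeout elapses. A self-contained sketch of that pattern, with invented names rather than the actual Spark classes:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutFutureSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  /** Completes with `work`, or fails after `timeoutSecs` seconds, whichever happens first. */
  def withTimeout[T](work: Future[T], timeoutSecs: Long): Future[T] = {
    val promise = Promise[T]()
    // Schedule a failure; tryFailure is a no-op if the work already completed the promise.
    val failTask = scheduler.schedule(new Runnable {
      override def run(): Unit =
        promise.tryFailure(new RuntimeException(s"timed out after $timeoutSecs secs"))
    }, timeoutSecs, TimeUnit.SECONDS)
    // If the work finishes first, forward its result and cancel the scheduled failure.
    work.onComplete { result =>
      promise.tryComplete(result)
      failTask.cancel(false)
    }
    promise.future
  }

  def main(args: Array[String]): Unit = {
    val slow = Future { Thread.sleep(5000); 42 }
    // Expected to print a Failure, because the work takes longer than the 1-second timeout.
    println(Await.ready(withTimeout(slow, 1), 10.seconds).value)
    scheduler.shutdown()
  }
}
```

In the patch itself the timeout value comes from `spark.sql.broadcastTimeout`, and the failure message points users at that setting and at `spark.sql.autoBroadcastJoinThreshold`.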

[spark] branch master updated: [SPARK-31475][SQL] Broadcast stage in AQE did not timeout

2020-04-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44d370d  [SPARK-31475][SQL] Broadcast stage in AQE did not timeout
44d370d is described below

commit 44d370dd4501f0a4abb7194f7cff0d346aac0992
Author: Maryann Xue 
AuthorDate: Mon Apr 20 11:55:48 2020 -0700

[SPARK-31475][SQL] Broadcast stage in AQE did not timeout

### What changes were proposed in this pull request?

This PR adds a timeout for the Future of a BroadcastQueryStageExec to make 
sure it can have the same timeout behavior as a non-AQE broadcast exchange.

### Why are the changes needed?

This is to make the broadcast timeout behavior in AQE consistent with that 
in non-AQE.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #28250 from maryannxue/aqe-broadcast-timeout.

Authored-by: Maryann Xue 
Signed-off-by: gatorsmile 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../sql/execution/adaptive/QueryStageExec.scala| 35 ++
 .../execution/exchange/BroadcastExchangeExec.scala |  8 ++---
 .../sql/execution/joins/BroadcastJoinSuite.scala   | 23 --
 4 files changed, 56 insertions(+), 12 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 3ac4ea5..f819937 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -547,7 +547,7 @@ case class AdaptiveSparkPlanExec(
 }
 
 object AdaptiveSparkPlanExec {
-  private val executionContext = ExecutionContext.fromExecutorService(
+  private[adaptive] val executionContext = 
ExecutionContext.fromExecutorService(
 ThreadUtils.newDaemonCachedThreadPool("QueryStageCreator", 16))
 
   /**
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
index beaa972..f414f85 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
@@ -17,9 +17,11 @@
 
 package org.apache.spark.sql.execution.adaptive
 
-import scala.concurrent.Future
+import java.util.concurrent.TimeUnit
 
-import org.apache.spark.{FutureAction, MapOutputStatistics}
+import scala.concurrent.{Future, Promise}
+
+import org.apache.spark.{FutureAction, MapOutputStatistics, SparkException}
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
@@ -28,6 +30,8 @@ import org.apache.spark.sql.catalyst.plans.logical.Statistics
 import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.ThreadUtils
 
 /**
  * A query stage is an independent subgraph of the query plan. Query stage 
materializes its output
@@ -100,8 +104,8 @@ abstract class QueryStageExec extends LeafExecNode {
   override def executeTail(n: Int): Array[InternalRow] = plan.executeTail(n)
   override def executeToIterator(): Iterator[InternalRow] = 
plan.executeToIterator()
 
-  override def doPrepare(): Unit = plan.prepare()
-  override def doExecute(): RDD[InternalRow] = plan.execute()
+  protected override def doPrepare(): Unit = plan.prepare()
+  protected override def doExecute(): RDD[InternalRow] = plan.execute()
   override def doExecuteBroadcast[T](): Broadcast[T] = plan.executeBroadcast()
   override def doCanonicalize(): SparkPlan = plan.canonicalized
 
@@ -187,8 +191,24 @@ case class BroadcastQueryStageExec(
   throw new IllegalStateException("wrong plan for broadcast stage:\n " + 
plan.treeString)
   }
 
+  @transient private lazy val materializeWithTimeout = {
+val broadcastFuture = broadcast.completionFuture
+val timeout = SQLConf.get.broadcastTimeout
+val promise = Promise[Any]()
+val fail = BroadcastQueryStageExec.scheduledExecutor.schedule(new 
Runnable() {
+  override def run(): Unit = {
+promise.tryFailure(new SparkException(s"Could not execute broadcast in 
$timeout secs. " +
+  s"You can increase the timeout for broadcasts via 
${SQLConf.BROADCAST_TIMEOUT.key} or " +
+  s"disable broadcast join by setting 
${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1&

[spark] branch master updated (55dea9b -> 2c39502)

2020-04-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 55dea9b  [SPARK-29153][CORE] Add ability to merge resource profiles 
within a stage with Stage Level Scheduling
 add 2c39502  [SPARK-31253][SQL][FOLLOWUP] Add metrics to AQE shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../adaptive/CustomShuffleReaderExec.scala | 27 ++
 .../execution/adaptive/OptimizeSkewedJoin.scala| 14 ++-
 2 files changed, 20 insertions(+), 21 deletions(-)





[spark] branch master updated (590b9a0 -> 34c7ec8)

2020-03-31 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 590b9a0  [SPARK-31010][SQL][FOLLOW-UP] Add Java UDF suggestion in 
error message of untyped Scala UDF
 add 34c7ec8  [SPARK-31253][SQL] Add metrics to AQE shuffle reader

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ShuffledRowRDD.scala   |  16 ++-
 .../adaptive/CoalesceShufflePartitions.scala   |   6 +-
 .../adaptive/CustomShuffleReaderExec.scala | 114 ++---
 .../adaptive/OptimizeLocalShuffleReader.scala  |  15 ++-
 .../execution/adaptive/OptimizeSkewedJoin.scala|  82 ---
 .../sql/execution/adaptive/QueryStageExec.scala|   5 +
 .../execution/CoalesceShufflePartitionsSuite.scala |  23 +++--
 .../adaptive/AdaptiveQueryExecSuite.scala  |  74 -
 8 files changed, 229 insertions(+), 106 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31087] [SQL] Add Back Multiple Removed APIs

2020-03-28 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f375930  [SPARK-31087] [SQL] Add Back Multiple Removed APIs
f375930 is described below

commit f375930d81337f2facbe5da71bb126d4d935e49d
Author: gatorsmile 
AuthorDate: Sat Mar 28 22:05:16 2020 -0700

[SPARK-31087] [SQL] Add Back Multiple Removed APIs

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to 
Spark's Semantic Versioning 
Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html)
, this PR adds back the following APIs whose maintenance costs are relatively small.

- functions.toDegrees/toRadians
- functions.approxCountDistinct
- functions.monotonicallyIncreasingId
- Column.!==
- Dataset.explode
- Dataset.registerTempTable
- SQLContext.getOrCreate, setActive, clearActive, constructors

Below are the other APIs removed in the original PR but not added back in this PR [https://issues.apache.org/jira/browse/SPARK-25908]:

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of 
accuracy
- Remove unused Python StorageLevel constants
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session

### Why are the changes needed?
Avoid breaking the APIs that are commonly used.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce any user-facing changes, because Spark 3.0 has not been released yet.

### How was this patch tested?
Added a new test suite for these APIs.

Author: gatorsmile 
Author: yi.wu 

Closes #27821 from gatorsmile/addAPIBackV2.

(cherry picked from commit 3884455780a214c620f309e00d5a083039746755)
Signed-off-by: gatorsmile 
---
 project/MimaExcludes.scala |   8 --
 python/pyspark/sql/dataframe.py|  19 
 python/pyspark/sql/functions.py|  11 ++
 .../main/scala/org/apache/spark/sql/Column.scala   |  18 
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  98 ++
 .../scala/org/apache/spark/sql/SQLContext.scala|  50 -
 .../scala/org/apache/spark/sql/functions.scala |  79 ++
 .../org/apache/spark/sql/DataFrameSuite.scala  |  46 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala  | 114 +
 .../org/apache/spark/sql/SQLContextSuite.scala |  30 --
 10 files changed, 458 insertions(+), 15 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 9a5029e..d1ed48a 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -235,14 +235,6 @@ object MimaExcludes {
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleWriteTime"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleRecordsWritten"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.AccumulableInfo.apply"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.approxCountDistinct"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toRadians"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toDegrees"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.monotonicallyIncreasingId"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.clearActive"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.getOrCreate"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.setActive"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.SQLContext.this"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.recall"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluati

[spark] branch master updated: [SPARK-31087] [SQL] Add Back Multiple Removed APIs

2020-03-28 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3884455  [SPARK-31087] [SQL] Add Back Multiple Removed APIs
3884455 is described below

commit 3884455780a214c620f309e00d5a083039746755
Author: gatorsmile 
AuthorDate: Sat Mar 28 22:05:16 2020 -0700

[SPARK-31087] [SQL] Add Back Multiple Removed APIs

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to 
Spark's Semantic Versioning 
Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html)
, this PR adds back the following APIs whose maintenance costs are relatively small.

- functions.toDegrees/toRadians
- functions.approxCountDistinct
- functions.monotonicallyIncreasingId
- Column.!==
- Dataset.explode
- Dataset.registerTempTable
- SQLContext.getOrCreate, setActive, clearActive, constructors

Below are the other APIs removed in the original PR but not added back in this PR [https://issues.apache.org/jira/browse/SPARK-25908]:

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of 
accuracy
- Remove unused Python StorageLevel constants
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session

### Why are the changes needed?
Avoid breaking the APIs that are commonly used.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce any user-facing changes, because Spark 3.0 has not been released yet.

### How was this patch tested?
Added a new test suite for these APIs.

Author: gatorsmile 
Author: yi.wu 

Closes #27821 from gatorsmile/addAPIBackV2.
---
 project/MimaExcludes.scala |   8 --
 python/pyspark/sql/dataframe.py|  19 
 python/pyspark/sql/functions.py|  11 ++
 .../main/scala/org/apache/spark/sql/Column.scala   |  18 
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  98 ++
 .../scala/org/apache/spark/sql/SQLContext.scala|  50 -
 .../scala/org/apache/spark/sql/functions.scala |  79 ++
 .../org/apache/spark/sql/DataFrameSuite.scala  |  46 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala  | 114 +
 .../org/apache/spark/sql/SQLContextSuite.scala |  30 --
 10 files changed, 458 insertions(+), 15 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 3f521e6..f28ae56 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -242,14 +242,6 @@ object MimaExcludes {
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleWriteTime"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.ShuffleWriteMetrics.shuffleRecordsWritten"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.AccumulableInfo.apply"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.approxCountDistinct"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toRadians"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.toDegrees"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.functions.monotonicallyIncreasingId"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.clearActive"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.getOrCreate"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.setActive"),
-
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.SQLContext.this"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.fMeasure"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.recall"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.precision"),
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataf
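
To see what "adding back" means from user code, here is a small sketch exercising a few of the restored APIs. It assumes a local SparkSession and Spark 3.0 with this patch applied; deprecation warnings are expected, since these methods remain deprecated aliases:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions

object ReAddedApisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("readded-apis").getOrCreate()
    import spark.implicits._

    val df = Seq((0.0, 1L), (math.Pi, 2L)).toDF("angle", "id")

    // functions.toDegrees and functions.monotonicallyIncreasingId compile again
    // (deprecated in favor of degrees and monotonically_increasing_id).
    df.select(functions.toDegrees($"angle"), functions.monotonicallyIncreasingId()).show()

    // Dataset.registerTempTable is back as a deprecated alias of createOrReplaceTempView.
    df.registerTempTable("angles")
    spark.sql("SELECT count(*) FROM angles").show()

    spark.stop()
  }
}
```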

[spark] branch master updated (b7e4cc7 -> b9eafcb)

2020-03-27 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b7e4cc7  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods
 add b9eafcb  [SPARK-31088][SQL] Add back HiveContext and 
createExternalTable

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|   4 -
 project/MimaExcludes.scala |   2 -
 python/pyspark/__init__.py |   2 +-
 python/pyspark/sql/__init__.py |   4 +-
 python/pyspark/sql/catalog.py  |  20 
 python/pyspark/sql/context.py  |  67 +-
 .../scala/org/apache/spark/sql/SQLContext.scala|  91 ++
 .../org/apache/spark/sql/catalog/Catalog.scala | 102 +++-
 .../DeprecatedCreateExternalTableSuite.scala   |  85 +
 .../org/apache/spark/sql/hive/HiveContext.scala|  63 +
 .../sql/hive/HiveContextCompatibilitySuite.scala   | 103 +
 11 files changed, 532 insertions(+), 11 deletions(-)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/internal/DeprecatedCreateExternalTableSuite.scala
 create mode 100644 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
 create mode 100644 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala
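
A rough sketch of the two restored entry points named in the title, assuming a Hive-enabled build; the table name and path are made up, and deprecation warnings are expected:

```scala
import org.apache.spark.sql.SparkSession

object ReAddedHiveApisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("readded-hive-apis")
      .enableHiveSupport()   // requires the spark-hive module on the classpath
      .getOrCreate()

    // createExternalTable is back as a deprecated alias of createTable; it points an
    // external table at files that already exist at the given (hypothetical) path.
    spark.catalog.createExternalTable("ext_events", "/tmp/events_parquet")
    spark.sql("SELECT count(*) FROM ext_events").show()

    // HiveContext is back as a deprecated wrapper around a Hive-enabled SparkSession.
    val hiveCtx = new org.apache.spark.sql.hive.HiveContext(spark.sparkContext)
    hiveCtx.sql("SHOW TABLES").show()

    spark.stop()
  }
}
```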





[spark] branch branch-3.0 updated: [SPARK-31088][SQL] Add back HiveContext and createExternalTable

2020-03-27 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2a449df  [SPARK-31088][SQL] Add back HiveContext and 
createExternalTable
2a449df is described below

commit 2a449df305d5f8495959fd71d937e0f5f4fff87d
Author: gatorsmile 
AuthorDate: Thu Mar 26 23:51:15 2020 -0700

[SPARK-31088][SQL] Add back HiveContext and createExternalTable

### What changes were proposed in this pull request?
Based on the discussion in the mailing list [[Proposal] Modification to 
Spark's Semantic Versioning 
Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html)
, this PR adds back the following APIs whose maintenance costs are relatively small.

- HiveContext
- createExternalTable APIs

### Why are the changes needed?

Avoid breaking the APIs that are commonly used.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce any user-facing changes, because Spark 3.0 has not been released yet.

### How was this patch tested?

Added a new test suite for the createExternalTable APIs.

Closes #27815 from gatorsmile/addAPIsBack.

Lead-authored-by: gatorsmile 
Co-authored-by: yi.wu 
Signed-off-by: gatorsmile 
(cherry picked from commit b9eafcb52658b7f5ec60bb4ebcc9da0fde94e105)
Signed-off-by: gatorsmile 
---
 docs/sql-migration-guide.md|   4 -
 project/MimaExcludes.scala |   2 -
 python/pyspark/__init__.py |   2 +-
 python/pyspark/sql/__init__.py |   4 +-
 python/pyspark/sql/catalog.py  |  20 
 python/pyspark/sql/context.py  |  67 +-
 .../scala/org/apache/spark/sql/SQLContext.scala|  91 ++
 .../org/apache/spark/sql/catalog/Catalog.scala | 102 +++-
 .../DeprecatedCreateExternalTableSuite.scala   |  85 +
 .../org/apache/spark/sql/hive/HiveContext.scala|  63 +
 .../sql/hive/HiveContextCompatibilitySuite.scala   | 103 +
 11 files changed, 532 insertions(+), 11 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index d2773d8..ab35e1f 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -309,10 +309,6 @@ license: |
 
 ### Others
 
-  - In Spark 3.0, the deprecated methods `SQLContext.createExternalTable` and 
`SparkSession.createExternalTable` have been removed in favor of its 
replacement, `createTable`.
-
-  - In Spark 3.0, the deprecated `HiveContext` class has been removed. Use 
`SparkSession.builder.enableHiveSupport()` instead.
-
   - In Spark version 2.4, when a spark session is created via 
`cloneSession()`, the newly created spark session inherits its configuration 
from its parent `SparkContext` even though the same configuration may exist 
with a different value in its parent spark session. Since Spark 3.0, the 
configurations of a parent `SparkSession` have a higher precedence over the 
parent `SparkContext`. The old behavior can be restored by setting 
`spark.sql.legacy.sessionInitWithConfigDefaults` to `true`.
 
   - Since Spark 3.0, if `hive.default.fileformat` is not found in `Spark SQL 
configuration` then it will fallback to hive-site.xml present in the `Hadoop 
configuration` of `SparkContext`.
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index f8ad60b..9a5029e 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -48,8 +48,6 @@ object MimaExcludes {
 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ExecutorPlugin"),
 
 // [SPARK-28980][SQL][CORE][MLLIB] Remove more old deprecated items in 
Spark 3
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.createExternalTable"),
-
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.createExternalTable"),
 
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.KMeans.train"),
 
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.mllib.clustering.KMeans.train"),
 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.mllib.classification.LogisticRegressionWithSGD$"),
diff --git a/python/pyspark/__init__.py b/python/pyspark/__init__.py
index 76a5bd0..70c0b27 100644
--- a/python/pyspark/__init__.py
+++ b/python/pyspark/__init__.py
@@ -113,7 +113,7 @@ def keyword_only(func):
 
 
 # for back compatibility
-from pyspark.sql import SQLContext, Row
+from pyspark.sql import

[spark] branch master updated (cb0db21 -> b7e4cc7)

2020-03-27 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cb0db21  
[SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][TEST-HIVE1.2] Nested 
Column Predicate Pushdown for Parquet
 add b7e4cc7  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SQLContext.scala| 283 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala  | 106 
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  14 +
 3 files changed, 403 insertions(+)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/DeprecatedAPISuite.scala





[spark] branch branch-3.0 updated: [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods

2020-03-27 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ebc358c  [SPARK-31086][SQL] Add Back the Deprecated SQLContext methods
ebc358c is described below

commit ebc358c8d2b6d67c7319be006452c9c993b7a098
Author: gatorsmile 
AuthorDate: Thu Mar 26 23:49:24 2020 -0700

[SPARK-31086][SQL] Add Back the Deprecated SQLContext methods

### What changes were proposed in this pull request?

Based on the discussion in the mailing list [[Proposal] Modification to 
Spark's Semantic Versioning 
Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html)
, this PR adds back the following APIs whose maintenance costs are relatively small.

- SQLContext.applySchema
- SQLContext.parquetFile
- SQLContext.jsonFile
- SQLContext.jsonRDD
- SQLContext.load
- SQLContext.jdbc

### Why are the changes needed?
Avoid breaking the APIs that are commonly used.

### Does this PR introduce any user-facing change?
Adding back the APIs that were removed in the 3.0 branch does not introduce any user-facing changes, because Spark 3.0 has not been released yet.

### How was this patch tested?
The existing tests.

Closes #27839 from gatorsmile/addAPIBackV3.

Lead-authored-by: gatorsmile 
Co-authored-by: yi.wu 
Signed-off-by: gatorsmile 
(cherry picked from commit b7e4cc775b7eac68606d1f385911613f5139db1b)
Signed-off-by: gatorsmile 
---
 .../scala/org/apache/spark/sql/SQLContext.scala| 283 +
 .../org/apache/spark/sql/DeprecatedAPISuite.scala  | 106 
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  14 +
 3 files changed, 403 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index 2054874..592c64c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -611,6 +611,289 @@ class SQLContext private[sql](val sparkSession: 
SparkSession)
 sessionState.catalog.listTables(databaseName).map(_.table).toArray
   }
 
+  
+  
+  // Deprecated methods
+  
+  
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rowRDD: RDD[Row], schema: StructType): DataFrame = {
+createDataFrame(rowRDD, schema)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rowRDD: JavaRDD[Row], schema: StructType): DataFrame = {
+createDataFrame(rowRDD, schema)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rdd: RDD[_], beanClass: Class[_]): DataFrame = {
+createDataFrame(rdd, beanClass)
+  }
+
+  /**
+   * @deprecated As of 1.3.0, replaced by `createDataFrame()`.
+   */
+  @deprecated("Use createDataFrame instead.", "1.3.0")
+  def applySchema(rdd: JavaRDD[_], beanClass: Class[_]): DataFrame = {
+createDataFrame(rdd, beanClass)
+  }
+
+  /**
+   * Loads a Parquet file, returning the result as a `DataFrame`. This 
function returns an empty
+   * `DataFrame` if no paths are passed in.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by `read().parquet()`.
+   */
+  @deprecated("Use read.parquet() instead.", "1.4.0")
+  @scala.annotation.varargs
+  def parquetFile(paths: String*): DataFrame = {
+if (paths.isEmpty) {
+  emptyDataFrame
+} else {
+  read.parquet(paths : _*)
+}
+  }
+
+  /**
+   * Loads a JSON file (one object per line), returning the result as a 
`DataFrame`.
+   * It goes through the entire dataset once to determine the schema.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by `read().json()`.
+   */
+  @deprecated("Use read.json() instead.", "1.4.0")
+  def jsonFile(path: String): DataFrame = {
+read.json(path)
+  }
+
+  /**
+   * Loads a JSON file (one object per line) and applies the given schema,
+   * returning the result as a `DataFrame`.
+   *
+   * @group specificdata
+   * @deprecated As of 1.4.0, replaced by `rea
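
The restored readers are thin, deprecated wrappers over the `read` API, as the `@deprecated` notes above indicate. A quick sketch of the old and new spellings side by side, with made-up file paths (the files would have to exist for the job to actually run):

```scala
import org.apache.spark.sql.SparkSession

object DeprecatedSqlContextReadersSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sqlcontext-compat").getOrCreate()
    val sqlContext = spark.sqlContext

    // Re-added deprecated entry points (these now just delegate to spark.read) ...
    val oldJson    = sqlContext.jsonFile("/tmp/events.json")
    val oldParquet = sqlContext.parquetFile("/tmp/events.parquet")

    // ... and the replacements their deprecation messages point to.
    val newJson    = spark.read.json("/tmp/events.json")
    val newParquet = spark.read.parquet("/tmp/events.parquet")

    println(oldJson.schema == newJson.schema)   // expected: true
    spark.stop()
  }
}
```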

[spark] branch master updated (1369a97 -> 30d9535)

2020-03-17 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1369a97  [SPARK-31164][SQL] Inconsistent rdd and output partitioning 
for bucket table when output doesn't contain all bucket columns
 add 30d9535  [SPARK-31134][SQL] optimize skew join after shuffle 
partitions are coalesced

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   9 +-
 .../adaptive/CoalesceShufflePartitions.scala   |   2 -
 .../execution/adaptive/OptimizeSkewedJoin.scala| 272 ++---
 .../execution/adaptive/ShufflePartitionsUtil.scala |  18 +-
 .../sql/execution/ShufflePartitionsUtilSuite.scala |   2 -
 5 files changed, 146 insertions(+), 157 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced

2020-03-17 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0512b3f  [SPARK-31134][SQL] optimize skew join after shuffle 
partitions are coalesced
0512b3f is described below

commit 0512b3f427274c8bda249fba02cd16f5694a4ea5
Author: Wenchen Fan 
AuthorDate: Tue Mar 17 00:23:16 2020 -0700

[SPARK-31134][SQL] optimize skew join after shuffle partitions are coalesced

### What changes were proposed in this pull request?

Run the `OptimizeSkewedJoin` rule after the `CoalesceShufflePartitions` 
rule.

### Why are the changes needed?

Remove duplicated coalescing code in `OptimizeSkewedJoin`.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

existing tests

Closes #27893 from cloud-fan/aqe.

Authored-by: Wenchen Fan 
Signed-off-by: gatorsmile 
(cherry picked from commit 30d95356f1881c32eb39e51525d2bcb331fcf867)
Signed-off-by: gatorsmile 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   9 +-
 .../adaptive/CoalesceShufflePartitions.scala   |   2 -
 .../execution/adaptive/OptimizeSkewedJoin.scala| 272 ++---
 .../execution/adaptive/ShufflePartitionsUtil.scala |  18 +-
 .../sql/execution/ShufflePartitionsUtilSuite.scala |   2 -
 5 files changed, 146 insertions(+), 157 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 68da06d..b54a32f 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -96,13 +96,10 @@ case class AdaptiveSparkPlanExec(
   // optimizations should be stage-independent.
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
 ReuseAdaptiveSubquery(conf, context.subqueryCache),
-// Here the 'OptimizeSkewedJoin' rule should be executed
-// before 'CoalesceShufflePartitions', as the skewed partition handled
-// in 'OptimizeSkewedJoin' rule, should be omitted in 
'CoalesceShufflePartitions'.
-OptimizeSkewedJoin(conf),
 CoalesceShufflePartitions(context.session),
-// The rule of 'OptimizeLocalShuffleReader' need to make use of the 
'partitionStartIndices'
-// in 'CoalesceShufflePartitions' rule. So it must be after 
'CoalesceShufflePartitions' rule.
+// The following two rules need to make use of 
'CustomShuffleReaderExec.partitionSpecs'
+// added by `CoalesceShufflePartitions`. So they must be executed after it.
+OptimizeSkewedJoin(conf),
 OptimizeLocalShuffleReader(conf),
 ApplyColumnarRulesAndInsertTransitions(conf, 
context.session.sessionState.columnarRules),
 CollapseCodegenStages(conf)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
index d2a7f6a..226d692 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
@@ -74,8 +74,6 @@ case class CoalesceShufflePartitions(session: SparkSession) 
extends Rule[SparkPl
   .getOrElse(session.sparkContext.defaultParallelism)
 val partitionSpecs = ShufflePartitionsUtil.coalescePartitions(
   validMetrics.toArray,
-  firstPartitionIndex = 0,
-  lastPartitionIndex = distinctNumPreShufflePartitions.head,
   advisoryTargetSize = 
conf.getConf(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES),
   minNumPartitions = minPartitionNum)
 // This transformation adds new nodes, so we must use `transformUp` 
here.
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index db65af6..e02b9af 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -21,7 +21,7 @@ import scala.collection.mutable
 
 import org.apache.commons.io.FileUtils
 
-import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, 
SparkContext, SparkEnv}
+import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv}
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
@@ -83,14 +83,14 @@ case
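
Both rules discussed above run only under adaptive query execution, so from the outside the feature is driven by a handful of configs. A minimal, illustrative setup using the Spark 3.0 AQE settings (the skewed data itself is contrived):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SkewJoinAqeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aqe-skew-join")
      .config("spark.sql.adaptive.enabled", "true")                    // turn AQE on
      .config("spark.sql.adaptive.coalescePartitions.enabled", "true") // CoalesceShufflePartitions
      .config("spark.sql.adaptive.skewJoin.enabled", "true")           // OptimizeSkewedJoin
      .getOrCreate()
    import spark.implicits._

    // A deliberately skewed join: most rows on the left share the key 0.
    val left = spark.range(0, 1000000)
      .select(when($"id" % 10 === 0, $"id").otherwise(0L).as("k"), $"id".as("v"))
    val right = spark.range(0, 1000).select($"id".as("k"), $"id".as("w"))

    // With the ordering fixed by this patch, skew handling runs on the already
    // coalesced shuffle partitions at runtime.
    println(left.join(right, "k").count())
    spark.stop()
  }
}
```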

[spark-website] branch asf-site updated: Add "Amend Spark's Semantic Versioning Policy" #263

2020-03-14 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6f1e0de  Add "Amend Spark's Semantic Versioning Policy" #263
6f1e0de is described below

commit 6f1e0deb6632f75ad0492ffba372f1ebb828ddfb
Author: Xiao Li 
AuthorDate: Sat Mar 14 17:40:30 2020 -0700

Add "Amend Spark's Semantic Versioning Policy" #263

The vote on "Amend Spark's Semantic Versioning Policy" passed on the dev mailing list:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Amend-Spark-s-Semantic-Versioning-Policy-td28988.html

This PR adds it to the versioning-policy page.


![image](https://user-images.githubusercontent.com/11567269/76592244-063e7680-64b0-11ea-9875-c0e8573d7321.png)
---
 site/versioning-policy.html | 77 +
 versioning-policy.md| 47 +++
 2 files changed, 124 insertions(+)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index 34547e8..679e9b2 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -245,6 +245,83 @@ maximum compatibility. Code should not be merged into the 
project as expe
 a plan to change the API later, because users expect the maximum compatibility 
from all 
 available APIs.
 
+Considerations When Breaking APIs
+
+The Spark project strives to avoid breaking APIs or silently changing 
behavior, even at major versions. While this is not always possible, the 
balance of the following factors should be considered before choosing to break 
an API.
+
+Cost of Breaking an API
+
+Breaking an API almost always has a non-trivial cost to the users of Spark. 
A broken API means that Spark programs need to be rewritten before they can be 
upgraded. However, there are a few considerations when thinking about what the 
cost will be:
+
+
+  Usage - an API that is actively used in many different 
places is always very costly to break. While it is hard to know usage for 
sure, there are a bunch of ways that we can estimate:
+
+  
+How long has the API been in Spark?
+  
+  
+Is the API common even for basic programs?
+  
+  
+How often do we see recent questions in JIRA or mailing lists?
+  
+  
+How often does it appear in StackOverflow or blogs?
+  
+
+  
+  
+Behavior after the break - How will a program that 
works today work after the break? The following are listed roughly in order of 
increasing severity:
+
+
+  
+Will there be a compiler or linker error?
+  
+  
+Will there be a runtime exception?
+  
+  
+Will that exception happen after significant processing has been 
done?
+  
+  
+Will we silently return different answers? (very hard to debug, 
might not even notice!)
+  
+
+  
+
+
+Cost of Maintaining an API
+
+Of course, the above does not mean that we will never 
break any APIs. We must also consider the cost both to the 
project and to our users of keeping the API in question.
+
+
+  
+Project Costs - Every API we have needs to be tested 
and needs to keep working as other parts of the project change. These costs 
are significantly exacerbated when external dependencies change (the JVM, 
Scala, etc). In some cases, while not completely technically infeasible, the 
cost of maintaining a particular API can become too high.
+  
+  
+User Costs - APIs also have a cognitive cost to users 
learning Spark or trying to understand Spark programs. This cost becomes even 
higher when the API in question has confusing or undefined semantics.
+  
+
+
+Alternatives to Breaking an API
+
+In cases where there is a Bad API, but where the cost of 
removal is also high, there are alternatives that should be considered that do 
not hurt existing users but do address some of the maintenance costs.
+
+
+  
+Avoid Bad APIs - While this is a bit obvious, it is an 
important point. Anytime we are adding a new interface to Spark we should 
consider that we might be stuck with this API forever. Think deeply about how 
new APIs relate to existing ones, as well as how you expect them to evolve over 
time.
+  
+  
+Deprecation Warnings - All deprecation warnings should 
point to a clear alternative and should never just say that an API is 
deprecated.
+  
+  
+Updated Docs - Documentation should point to the 
best recommended way of performing a given task. In the cases 
where we maintain legacy documentation, we should clearly point to newer APIs 
and suggest to users the right way.
+  
+  
+Community Work - Many people learn Spark by reading 
blogs and other sites such as StackOverflow. However, many of these resources 
are out of date. Update them, to reduce t

[spark] branch branch-3.0 updated: [SPARK-31070][SQL] make skew join split skewed partitions more evenly

2020-03-10 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3f23529  [SPARK-31070][SQL] make skew join split skewed partitions 
more evenly
3f23529 is described below

commit 3f23529cac3a306afe0ed175b8034d4f24b08acb
Author: Wenchen Fan 
AuthorDate: Tue Mar 10 21:50:44 2020 -0700

[SPARK-31070][SQL] make skew join split skewed partitions more evenly



### What changes were proposed in this pull request?

There are two problems when splitting skewed partitions:
1. It's possible that we can't split the skewed partition; in that case we 
shouldn't create a skew join.
2. When splitting, it's possible that we create a partition for a very small 
amount of data.

This PR fixes them
1. don't create `PartialReducerPartitionSpec` if we can't split.
2. merge small partitions to the previous partition.
### Why are the changes needed?

make skew join split skewed partitions more evenly

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

updated test

Closes #27833 from cloud-fan/aqe.

Authored-by: Wenchen Fan 
Signed-off-by: gatorsmile 
(cherry picked from commit d5f5720efa7232f1339976462d462a7360978ab5)
Signed-off-by: gatorsmile 
---
 .../adaptive/CoalesceShufflePartitions.scala   |  2 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala| 44 +++
 ...Coalescer.scala => ShufflePartitionsUtil.scala} | 50 +-
 ...uite.scala => ShufflePartitionsUtilSuite.scala} | 32 --
 .../adaptive/AdaptiveQueryExecSuite.scala  | 14 +++---
 5 files changed, 102 insertions(+), 40 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
index a8e2d8e..d779a20 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
@@ -66,7 +66,7 @@ case class CoalesceShufflePartitions(conf: SQLConf) extends 
Rule[SparkPlan] {
   val distinctNumPreShufflePartitions =
 validMetrics.map(stats => stats.bytesByPartitionId.length).distinct
   if (validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 
1) {
-val partitionSpecs = ShufflePartitionsCoalescer.coalescePartitions(
+val partitionSpecs = ShufflePartitionsUtil.coalescePartitions(
   validMetrics.toArray,
   firstPartitionIndex = 0,
   lastPartitionIndex = distinctNumPreShufflePartitions.head,
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index 4387409..7f52393 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.execution.adaptive
 
 import scala.collection.mutable
-import scala.collection.mutable.ArrayBuffer
 
 import org.apache.commons.io.FileUtils
 
@@ -111,22 +110,7 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends 
Rule[SparkPlan] {
   targetSize: Long): Seq[Int] = {
 val shuffleId = stage.shuffle.shuffleDependency.shuffleHandle.shuffleId
 val mapPartitionSizes = getMapSizesForReduceId(shuffleId, partitionId)
-val partitionStartIndices = ArrayBuffer[Int]()
-partitionStartIndices += 0
-var i = 0
-var postMapPartitionSize = 0L
-while (i < mapPartitionSizes.length) {
-  val nextMapPartitionSize = mapPartitionSizes(i)
-  if (i > 0 && postMapPartitionSize + nextMapPartitionSize > targetSize) {
-partitionStartIndices += i
-postMapPartitionSize = nextMapPartitionSize
-  } else {
-postMapPartitionSize += nextMapPartitionSize
-  }
-  i += 1
-}
-
-partitionStartIndices
+ShufflePartitionsUtil.splitSizeListByTargetSize(mapPartitionSizes, 
targetSize)
   }
 
   private def getStatistics(stage: ShuffleQueryStageExec): MapOutputStatistics 
= {
@@ -211,21 +195,25 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends 
Rule[SparkPlan] {
   }
 
   val leftParts = if (isLeftSkew) {
-leftSkewDesc.addPartitionSize(leftSize)
-createSkewPartitions(
-  partitionIndex,
-  getMapStartIndices(left, partitionIndex, leftTargetSize),
-  getNumMappers(left))
+val mapStartIndices = getMa
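To make the splitting behaviour described in the commit message concrete, here is a small self-contained sketch; it is not the actual ShufflePartitionsUtil.splitSizeListByTargetSize, and the 20% small-slice threshold and all names are assumptions for illustration. Slices are cut at roughly targetSize, and a slice that ends up much smaller than the target is merged back into the previous one instead of becoming its own tiny partition.

import scala.collection.mutable.ArrayBuffer

// Sketch only: split a list of map-output sizes into slices of roughly targetSize,
// merging a slice that is much smaller than the target into the slice before it.
object SplitSizeListSketch {
  private val smallSliceFactor = 0.2 // assumed threshold, for illustration only

  // Returns the start index of every slice.
  def splitSizeListByTargetSize(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
    val startIndices = ArrayBuffer[Int](0)
    var currentSize = 0L
    var i = 0
    while (i < sizes.length) {
      if (i > 0 && currentSize + sizes(i) > targetSize) {
        // If the slice just closed is much smaller than the target, drop its start
        // index so it merges into the previous slice rather than staying on its own.
        if (startIndices.length > 1 && currentSize < targetSize * smallSliceFactor) {
          startIndices.remove(startIndices.length - 1)
        }
        startIndices += i
        currentSize = sizes(i)
      } else {
        currentSize += sizes(i)
      }
      i += 1
    }
    // Apply the same treatment to the trailing slice.
    if (startIndices.length > 1 && currentSize < targetSize * smallSliceFactor) {
      startIndices.remove(startIndices.length - 1)
    }
    startIndices.toSeq
  }

  def main(args: Array[String]): Unit = {
    // The tiny 20-byte slice is merged into the first slice: start indices 0 and 2.
    println(splitSizeListByTargetSize(Seq(140L, 20L, 140L), targetSize = 150L))
  }
}

Merging undersized slices trades slightly larger partitions for fewer, more even tasks, which is the goal stated in the commit message.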

[spark] branch master updated (93def95 -> d5f5720)

2020-03-10 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 93def95  [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final
 add d5f5720  [SPARK-31070][SQL] make skew join split skewed partitions 
more evenly

No new revisions were added by this update.

Summary of changes:
 .../adaptive/CoalesceShufflePartitions.scala   |  2 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala| 44 +++
 ...Coalescer.scala => ShufflePartitionsUtil.scala} | 50 +-
 ...uite.scala => ShufflePartitionsUtilSuite.scala} | 32 --
 .../adaptive/AdaptiveQueryExecSuite.scala  | 14 +++---
 5 files changed, 102 insertions(+), 40 deletions(-)
 rename 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/{ShufflePartitionsCoalescer.scala
 => ShufflePartitionsUtil.scala} (73%)
 rename 
sql/core/src/test/scala/org/apache/spark/sql/execution/{ShufflePartitionsCoalescerSuite.scala
 => ShufflePartitionsUtilSuite.scala} (88%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-03 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2732980  [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed 
before call doMaterialize
2732980 is described below

commit 27329806c36d0b403153fe1ad0077acb72d92606
Author: yi.wu 
AuthorDate: Tue Mar 3 13:40:51 2020 -0800

[SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call 
doMaterialize

### What changes were proposed in this pull request?

This PR proposes to not cancel a `QueryStageExec` which failed before 
calling `doMaterialize`.

Besides, this PR also includes 2 minor improvements:

* fail fast when stage failed before calling `doMaterialize`

* format Exception with Cause

### Why are the changes needed?

For a stage which failed before materializing the lazy value (e.g. 
`inputRDD`), calling `cancel` on it could re-trigger the same failure, 
e.g. by executing the child node again (see `AdaptiveQueryExecSuite`.`SPARK-30291: AQE 
should catch the exceptions when doing materialize` for example). As a result, 
the same failure is counted twice: once as a materialization error and 
once as a cancellation error.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Updated test.

Closes #27752 from Ngone51/avoid_cancel_finished_stage.

Authored-by: yi.wu 
Signed-off-by: gatorsmile 
(cherry picked from commit 380e8876316d6ef5a74358be2a04ab20e8b6e7ca)
Signed-off-by: gatorsmile 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  3 ++-
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 4036424..c018ca4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -165,7 +165,7 @@ case class AdaptiveSparkPlanExec(
   stagesToReplace = result.newStages ++ stagesToReplace
   executionId.foreach(onUpdatePlan)
 
-  // Start materialization of all new stages.
+  // Start materialization of all new stages and fail fast if any 
stages failed eagerly
   result.newStages.foreach { stage =>
 try {
   stage.materialize().onComplete { res =>
@@ -176,7 +176,10 @@ case class AdaptiveSparkPlanExec(
 }
   }(AdaptiveSparkPlanExec.executionContext)
 } catch {
-  case e: Throwable => events.offer(StageFailure(stage, e))
+  case e: Throwable =>
+val ex = new SparkException(
+  s"Early failed query stage found: ${stage.treeString}", e)
+cleanUpAndThrowException(Seq(ex), Some(stage.id))
 }
   }
 }
@@ -192,13 +195,12 @@ case class AdaptiveSparkPlanExec(
 stage.resultOption = Some(res)
   case StageFailure(stage, ex) =>
 errors.append(
-  new SparkException(s"Failed to materialize query stage: 
${stage.treeString}." +
-s" and the cause is ${ex.getMessage}", ex))
+  new SparkException(s"Failed to materialize query stage: 
${stage.treeString}.", ex))
 }
 
 // In case of errors, we cancel all running stages and throw exception.
 if (errors.nonEmpty) {
-  cleanUpAndThrowException(errors)
+  cleanUpAndThrowException(errors, None)
 }
 
 // Try re-optimizing and re-planning. Adopt the new plan if its cost 
is equal to or less
@@ -522,9 +524,13 @@ case class AdaptiveSparkPlanExec(
* Cancel all running stages with best effort and throw an Exception 
containing all stage
* materialization errors and stage cancellation errors.
*/
-  private def cleanUpAndThrowException(errors: Seq[SparkException]): Unit = {
+  private def cleanUpAndThrowException(
+   errors: Seq[SparkException],
+   earlyFailedStage: Option[Int]): Unit = {
 val runningStages = currentPhysicalPlan.collect {
-  case s: QueryStageExec => s
+  // earlyFailedStage is the stage which failed before calling 
doMaterialize,
+  // so we should avoid calling cancel on it to re-trigger the failure 
again.
+  case s: QueryStageExec if !earlyFailedStage.contains(s.id) => s
 }
 val cancelErrors = new mutable.ArrayBuffer[SparkException]()
 try {
@@ -539,8 +545,7 @@ case class A
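A tiny standalone sketch of the cleanup rule described above, i.e. skip cancelling the stage whose failure happened before materialization started so that the same failure is not re-triggered; the Stage case class and function name here are illustrative, not the AdaptiveSparkPlanExec internals.

// Sketch only: given the currently running stages and the id of a stage that failed
// before it started materializing, pick the stages that are safe to cancel.
final case class Stage(id: Int)

object CancelSketch {
  def stagesToCancel(runningStages: Seq[Stage], earlyFailedStageId: Option[Int]): Seq[Stage] =
    // Cancelling the early-failed stage would execute its child again and re-trigger
    // the same failure, so it is excluded from the cancellation list.
    runningStages.filterNot(s => earlyFailedStageId.contains(s.id))

  def main(args: Array[String]): Unit = {
    // Stage 2 failed before materializing, so only stages 1 and 3 are cancelled.
    println(stagesToCancel(Seq(Stage(1), Stage(2), Stage(3)), Some(2)))
  }
}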

[spark] branch master updated (4a1d273 -> 380e887)

2020-03-03 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4a1d273  [SPARK-30997][SQL] Fix an analysis failure in generators with 
aggregate functions
 add 380e887  [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed 
before call doMaterialize

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  3 ++-
 2 files changed, 16 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30991] Refactor AQE readers and RDDs

2020-03-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 597  [SPARK-30991] Refactor AQE readers and RDDs
597 is described below

commit 597b5507448980e4fadbad85ffb104808081
Author: maryannxue 
AuthorDate: Mon Mar 2 16:04:00 2020 -0800

[SPARK-30991] Refactor AQE readers and RDDs

### What changes were proposed in this pull request?
This PR combines `CustomShuffledRowRDD` and `LocalShuffledRowRDD` into 
`ShuffledRowRDD`, and creates `CustomShuffleReaderExec` to unify and replace 
all existing AQE readers: `CoalescedShuffleReaderExec`, 
`LocalShuffleReaderExec` and `SkewJoinShuffleReaderExec`.

### Why are the changes needed?
To reduce code redundancy.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Passed existing UTs.

Closes #27742 from maryannxue/aqe-readers.

Authored-by: maryannxue 
Signed-off-by: gatorsmile 
(cherry picked from commit 473a28c1d032993c7fa515b39f2cb1e3105d65d3)
Signed-off-by: gatorsmile 
---
 .../spark/sql/execution/ShuffledRowRDD.scala   | 142 -
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   2 +-
 .../adaptive/CustomShuffleReaderExec.scala |  81 
 .../execution/adaptive/CustomShuffledRowRDD.scala  | 113 
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 112 
 .../adaptive/OptimizeLocalShuffleReader.scala  |  88 +++--
 .../execution/adaptive/OptimizeSkewedJoin.scala|  72 ++-
 .../adaptive/ReduceNumShufflePartitions.scala  |  49 ++-
 .../adaptive/ShufflePartitionsCoalescer.scala  |  23 ++--
 .../execution/exchange/ShuffleExchangeExec.scala   |  12 +-
 .../ReduceNumShufflePartitionsSuite.scala  |  28 ++--
 .../ShufflePartitionsCoalescerSuite.scala  | 101 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  23 ++--
 13 files changed, 317 insertions(+), 529 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
index 4c19f95..eb02259 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
@@ -26,17 +26,28 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.metric.{SQLMetric, 
SQLShuffleReadMetricsReporter}
 import org.apache.spark.sql.internal.SQLConf
 
+sealed trait ShufflePartitionSpec
+
+// A partition that reads data of one or more reducers, from 
`startReducerIndex` (inclusive) to
+// `endReducerIndex` (exclusive).
+case class CoalescedPartitionSpec(
+  startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
+
+// A partition that reads partial data of one reducer, from `startMapIndex` 
(inclusive) to
+// `endMapIndex` (exclusive).
+case class PartialReducerPartitionSpec(
+  reducerIndex: Int, startMapIndex: Int, endMapIndex: Int) extends 
ShufflePartitionSpec
+
+// A partition that reads partial data of one mapper, from `startReducerIndex` 
(inclusive) to
+// `endReducerIndex` (exclusive).
+case class PartialMapperPartitionSpec(
+  mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int) extends 
ShufflePartitionSpec
+
 /**
- * The [[Partition]] used by [[ShuffledRowRDD]]. A post-shuffle partition
- * (identified by `postShufflePartitionIndex`) contains a range of pre-shuffle 
partitions
- * (`startPreShufflePartitionIndex` to `endPreShufflePartitionIndex - 1`, 
inclusive).
+ * The [[Partition]] used by [[ShuffledRowRDD]].
  */
-private final class ShuffledRowRDDPartition(
-val postShufflePartitionIndex: Int,
-val startPreShufflePartitionIndex: Int,
-val endPreShufflePartitionIndex: Int) extends Partition {
-  override val index: Int = postShufflePartitionIndex
-}
+private final case class ShuffledRowRDDPartition(
+  index: Int, spec: ShufflePartitionSpec) extends Partition
 
 /**
  * A dummy partitioner for use with records whose partition ids have been 
pre-computed (i.e. for
@@ -94,8 +105,7 @@ class CoalescedPartitioner(val parent: Partitioner, val 
partitionStartIndices: A
  * interfaces / internals.
  *
  * This RDD takes a [[ShuffleDependency]] (`dependency`),
- * and an optional array of partition start indices as input arguments
- * (`specifiedPartitionStartIndices`).
+ * and an array of [[ShufflePartitionSpec]] as input arguments.
  *
  * The `dependency` has the parent RDD of this RDD, which represents the 
dataset before shuffle
  * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs.
@@ -103,79 +113,97 @@ class CoalescedPartitioner(val parent: Partitioner
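For readers skimming the diff, a standalone sketch that mirrors the three ShufflePartitionSpec shapes introduced above and spells out what each one reads; the object and method names are illustrative only, not the Spark classes themselves.

object PartitionSpecSketch {
  sealed trait ShufflePartitionSpec
  // Reads the outputs of all mappers for reducers [startReducerIndex, endReducerIndex).
  case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
    extends ShufflePartitionSpec
  // Reads part of one reducer's data, mappers [startMapIndex, endMapIndex).
  case class PartialReducerPartitionSpec(reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
    extends ShufflePartitionSpec
  // Reads part of one mapper's data, reducers [startReducerIndex, endReducerIndex).
  case class PartialMapperPartitionSpec(mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int)
    extends ShufflePartitionSpec

  def describe(spec: ShufflePartitionSpec): String = spec match {
    case CoalescedPartitionSpec(start, end) =>
      s"all map outputs for reducers [$start, $end)"          // coalesced shuffle read
    case PartialReducerPartitionSpec(reducer, startMap, endMap) =>
      s"map outputs [$startMap, $endMap) of reducer $reducer" // skew-join split
    case PartialMapperPartitionSpec(map, startReducer, endReducer) =>
      s"reducers [$startReducer, $endReducer) of map $map"    // local shuffle read
  }

  def main(args: Array[String]): Unit = {
    println(describe(CoalescedPartitionSpec(0, 4)))
    println(describe(PartialReducerPartitionSpec(reducerIndex = 2, startMapIndex = 0, endMapIndex = 5)))
  }
}

Having one spec type per read pattern is what allows a single CustomShuffleReaderExec to replace the three previous reader nodes.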

[spark] branch master updated (f0010c8 -> 473a28c)

2020-03-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f0010c8  [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests
 add 473a28c  [SPARK-30991] Refactor AQE readers and RDDs

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ShuffledRowRDD.scala   | 142 -
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   2 +-
 .../adaptive/CustomShuffleReaderExec.scala |  81 
 .../execution/adaptive/CustomShuffledRowRDD.scala  | 113 
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 112 
 .../adaptive/OptimizeLocalShuffleReader.scala  |  88 +++--
 .../execution/adaptive/OptimizeSkewedJoin.scala|  72 ++-
 .../adaptive/ReduceNumShufflePartitions.scala  |  49 ++-
 .../adaptive/ShufflePartitionsCoalescer.scala  |  23 ++--
 .../execution/exchange/ShuffleExchangeExec.scala   |  12 +-
 .../ReduceNumShufflePartitionsSuite.scala  |  28 ++--
 .../ShufflePartitionsCoalescerSuite.scala  | 101 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  23 ++--
 13 files changed, 317 insertions(+), 529 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
 delete mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffledRowRDD.scala
 delete mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30918][SQL] improve the splitting of skewed partitions

2020-02-25 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b968cd3  [SPARK-30918][SQL] improve the splitting of skewed partitions
b968cd3 is described below

commit b968cd37796a5730fe5c2318d23a38416f550957
Author: Wenchen Fan 
AuthorDate: Tue Feb 25 14:10:29 2020 -0800

[SPARK-30918][SQL] improve the splitting of skewed partitions

### What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when 
splitting skewed partitions, instead of 
ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

### Why are the changes needed?

The goal of skew join optimization is to make the data distribution more 
even. So it makes more sense to use the average size of the non-skewed 
partitions as the target size.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27669 from cloud-fan/aqe.

Authored-by: Wenchen Fan 
Signed-off-by: Xiao Li 
(cherry picked from commit 8f247e5d3682ad765bdbb9ea5a4315862c5a383c)
Signed-off-by: Xiao Li 
---
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 674c6df..e6f7cfd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -432,19 +432,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
-  val ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD =
-
buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionSizeThreshold")
-  .doc("Configures the minimum size in bytes for a partition that is 
considered as a skewed " +
-"partition in adaptive skewed join.")
-  .bytesConf(ByteUnit.BYTE)
-  .createWithDefaultString("64MB")
-
   val ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR =
 
buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionFactor")
   .doc("A partition is considered as a skewed partition if its size is 
larger than" +
 " this factor multiple the median partition size and also larger than 
" +
-s" ${ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.key}")
+s" ${SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key}")
   .intConf
+  .checkValue(_ > 0, "The skew factor must be positive.")
   .createWithDefault(10)
 
   val NON_EMPTY_PARTITION_RATIO_FOR_BROADCAST_JOIN =
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index 578d2d7..d3cb864 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -34,6 +34,30 @@ import 
org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExcha
 import org.apache.spark.sql.execution.joins.SortMergeJoinExec
 import org.apache.spark.sql.internal.SQLConf
 
+/**
+ * A rule to optimize skewed joins to avoid straggler tasks whose share of 
data are significantly
+ * larger than those of the rest of the tasks.
+ *
+ * The general idea is to divide each skew partition into smaller partitions 
and replicate its
+ * matching partition on the other side of the join so that they can run in 
parallel tasks.
+ * Note that when matching partitions from the left side and the right side 
both have skew,
+ * it will become a cartesian product of splits from left and right joining 
together.
+ *
+ * For example, assume the Sort-Merge join has 4 partitions:
+ * left:  [L1, L2, L3, L4]
+ * right: [R1, R2, R3, R4]
+ *
+ * Let's say L2, L4 and R3, R4 are skewed, and each of them get split into 2 
sub-partitions. This
+ * is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L3, R3), 
(L4, R4).
+ * This rule expands it to 9 tasks to increase parallelism:
+ * (L1, R1),
+ * (L2-1, R2), (L2-2, R2),
+ * (L3, R3-1), (L3, R3-2),
+ * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)
+ *
+ * Note that, when this rule is enabled, it also coalesces non-skewed 
partitions like
+ * `ReduceNumShufflePartitions` does.
+ */
 case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[Spa
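To make the sizing rules described above concrete, a small self-contained sketch follows; the function names and the simplified median/average math are assumptions for illustration, not the actual OptimizeSkewedJoin code. A partition counts as skewed when it exceeds skewFactor times the median size and the post-shuffle target size, and the split target is the average size of the non-skewed partitions.

// Sketch only: skew detection and split-target sizing as described in the commit message.
object SkewSizingSketch {
  def isSkewed(size: Long, medianSize: Long, skewFactor: Int, minSize: Long): Boolean =
    size > medianSize * skewFactor && size > minSize

  def splitTargetSize(sizes: Seq[Long], medianSize: Long, skewFactor: Int, minSize: Long): Long = {
    val nonSkewed = sizes.filterNot(isSkewed(_, medianSize, skewFactor, minSize))
    // Fall back to the minimum size if every partition happens to be skewed.
    if (nonSkewed.isEmpty) minSize else nonSkewed.sum / nonSkewed.length
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq(10L, 12L, 11L, 500L)
    val median = sizes.sorted.apply(sizes.length / 2)
    println(isSkewed(500L, median, skewFactor = 10, minSize = 64L))         // true
    println(splitTargetSize(sizes, median, skewFactor = 10, minSize = 64L)) // 11
  }
}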

[spark] branch master updated (e086a78 -> 8f247e5)

2020-02-25 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e086a78  [MINOR][ML] ML cleanup
 add 8f247e5  [SPARK-30918][SQL] improve the splitting of skewed partitions

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30779][SS] Fix some API issues found when reviewing Structured Streaming API docs

2020-02-10 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 45d834c  [SPARK-30779][SS] Fix some API issues found when reviewing 
Structured Streaming API docs
45d834c is described below

commit 45d834cb8cc2c30f902d0dec1cdf561b993521d0
Author: Shixiong Zhu 
AuthorDate: Mon Feb 10 14:26:14 2020 -0800

[SPARK-30779][SS] Fix some API issues found when reviewing Structured 
Streaming API docs

### What changes were proposed in this pull request?

- Fix the scope of `Logging.initializeForcefully` so that it doesn't appear 
in subclasses' public methods. Right now, `sc.initializeForcefully(false, 
false)` is allowed to be called.
- Don't show classes under `org.apache.spark.internal` package in API docs.
- Add missing `since` annotation.
- Fix the scope of `ArrowUtils` to remove it from the API docs.

### Why are the changes needed?

Avoid leaking APIs unintentionally in Spark 3.0.0.

### Does this PR introduce any user-facing change?

No. All these changes are to avoid leaking APIs unintentionally in Spark 
3.0.0.

### How was this patch tested?

Manually generated the API docs and verified the above issues have been 
fixed.

Closes #27528 from zsxwing/audit-ss-apis.

Authored-by: Shixiong Zhu 
Signed-off-by: Xiao Li 
---
 core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +-
 project/SparkBuild.scala| 1 +
 .../spark/sql/connector/read/streaming/ContinuousPartitionReader.java   | 2 ++
 .../sql/connector/read/streaming/ContinuousPartitionReaderFactory.java  | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/ContinuousStream.java | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/MicroBatchStream.java | 2 ++
 .../main/java/org/apache/spark/sql/connector/read/streaming/Offset.java | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/PartitionOffset.java  | 2 ++
 .../java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java   | 1 +
 .../org/apache/spark/sql/connector/read/streaming/SparkDataStream.java  | 2 ++
 .../spark/sql/connector/write/streaming/StreamingDataWriterFactory.java | 2 ++
 .../org/apache/spark/sql/connector/write/streaming/StreamingWrite.java  | 2 ++
 sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala  | 2 +-
 13 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/Logging.scala 
b/core/src/main/scala/org/apache/spark/internal/Logging.scala
index 2e4846b..0c1d963 100644
--- a/core/src/main/scala/org/apache/spark/internal/Logging.scala
+++ b/core/src/main/scala/org/apache/spark/internal/Logging.scala
@@ -117,7 +117,7 @@ trait Logging {
   }
 
   // For testing
-  def initializeForcefully(isInterpreter: Boolean, silent: Boolean): Unit = {
+  private[spark] def initializeForcefully(isInterpreter: Boolean, silent: 
Boolean): Unit = {
 initializeLogging(isInterpreter, silent)
   }
 
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 707c31d..9d0af3a 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -819,6 +819,7 @@ object Unidoc {
   .map(_.filterNot(_.getName.contains("$")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/deploy")))
   
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/examples")))
+  
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/internal")))
   .map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/memory")))
   
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/network")))
   .map(_.filterNot(f =>
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java
index 8bd5273..c2ad9ec 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReader.java
@@ -22,6 +22,8 @@ import org.apache.spark.sql.connector.read.PartitionReader;
 
 /**
  * A variation on {@link PartitionReader} for use with continuous streaming 
processing.
+ *
+ * @since 3.0.0
  */
 @Evolving
 public interface ContinuousPartitionReader extends PartitionReader {
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReaderFactory.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ContinuousPartitionReaderFactory.java
index 962864d..385c6f6 100644
--- 
a

[spark] branch master updated (a6b91d2 -> e2ebca7)

2020-02-10 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a6b91d2  [SPARK-30556][SQL][FOLLOWUP] Reset the status changed in 
SQLExecution.withThreadLocalCaptured
 add e2ebca7  [SPARK-30779][SS] Fix some API issues found when reviewing 
Structured Streaming API docs

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/Logging.scala | 2 +-
 project/SparkBuild.scala| 1 +
 .../spark/sql/connector/read/streaming/ContinuousPartitionReader.java   | 2 ++
 .../sql/connector/read/streaming/ContinuousPartitionReaderFactory.java  | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/ContinuousStream.java | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/MicroBatchStream.java | 2 ++
 .../main/java/org/apache/spark/sql/connector/read/streaming/Offset.java | 2 ++
 .../org/apache/spark/sql/connector/read/streaming/PartitionOffset.java  | 2 ++
 .../java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java   | 1 +
 .../org/apache/spark/sql/connector/read/streaming/SparkDataStream.java  | 2 ++
 .../spark/sql/connector/write/streaming/StreamingDataWriterFactory.java | 2 ++
 .../org/apache/spark/sql/connector/write/streaming/StreamingWrite.java  | 2 ++
 sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala  | 2 +-
 13 files changed, 22 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (acfdb46 -> 4439b29)

2020-02-10 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from acfdb46  [SPARK-27946][SQL][FOLLOW-UP] Change doc and error message 
for SHOW CREATE TABLE
 add 4439b29  Revert "[SPARK-30245][SQL] Add cache for Like and RLike when 
pattern is not static"

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/regexpExpressions.scala| 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b877aac -> 9f8172e)

2020-02-09 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b877aac  [SPARK-30684 ][WEBUI][FollowUp] A new approach for SPARK-30684
 add 9f8172e  Revert "[SPARK-29721][SQL] Prune unnecessary nested fields 
from Generate without Project

No new revisions were added by this update.

Summary of changes:
 .../catalyst/optimizer/NestedColumnAliasing.scala  | 47 --
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 43 +++-
 .../execution/datasources/SchemaPruningSuite.scala | 32 ---
 3 files changed, 25 insertions(+), 97 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and add a config to force apply

2020-02-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b29cb1a  [SPARK-30719][SQL] do not log warning if AQE is intentionally 
skipped and add a config to force apply
b29cb1a is described below

commit b29cb1a82b1a1facf1dd040025db93d998dad4cd
Author: Wenchen Fan 
AuthorDate: Thu Feb 6 09:16:14 2020 -0800

[SPARK-30719][SQL] do not log warning if AQE is intentionally skipped and 
add a config to force apply

### What changes were proposed in this pull request?

Update `InsertAdaptiveSparkPlan` to not log a warning if AQE is skipped 
intentionally.

This PR also adds a config to not skip AQE.

### Why are the changes needed?

It's not a warning at all if we intentionally skip AQE.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

run `AdaptiveQueryExecSuite` locally and verify that there are no warning 
logs.

Closes #27452 from cloud-fan/aqe.

Authored-by: Wenchen Fan 
Signed-off-by: Xiao Li 
(cherry picked from commit 8ce58627ebe4f0372fba9a30d8cd4213611acd9b)
Signed-off-by: Xiao Li 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|  9 +++
 .../adaptive/InsertAdaptiveSparkPlan.scala | 83 --
 .../adaptive/AdaptiveQueryExecSuite.scala  |  9 +++
 3 files changed, 65 insertions(+), 36 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index acc0922..bed8410 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -358,6 +358,15 @@ object SQLConf {
 .booleanConf
 .createWithDefault(false)
 
+  val ADAPTIVE_EXECUTION_FORCE_APPLY = 
buildConf("spark.sql.adaptive.forceApply")
+.internal()
+.doc("Adaptive query execution is skipped when the query does not have 
exchanges or " +
+  "sub-queries. By setting this config to true (together with " +
+  s"'${ADAPTIVE_EXECUTION_ENABLED.key}' enabled), Spark will force apply 
adaptive query " +
+  "execution for all supported queries.")
+.booleanConf
+.createWithDefault(false)
+
   val REDUCE_POST_SHUFFLE_PARTITIONS_ENABLED =
 buildConf("spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled")
   .doc(s"When true and '${ADAPTIVE_EXECUTION_ENABLED.key}' is enabled, 
this enables reducing " +
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
index 9252827..621c063 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
@@ -40,49 +40,60 @@ case class InsertAdaptiveSparkPlan(
 
   private val conf = adaptiveExecutionContext.session.sessionState.conf
 
-  def containShuffle(plan: SparkPlan): Boolean = {
-plan.find {
-  case _: Exchange => true
-  case s: SparkPlan => !s.requiredChildDistribution.forall(_ == 
UnspecifiedDistribution)
-}.isDefined
-  }
-
-  def containSubQuery(plan: SparkPlan): Boolean = {
-plan.find(_.expressions.exists(_.find {
-  case _: SubqueryExpression => true
-  case _ => false
-}.isDefined)).isDefined
-  }
-
   override def apply(plan: SparkPlan): SparkPlan = applyInternal(plan, false)
 
   private def applyInternal(plan: SparkPlan, isSubquery: Boolean): SparkPlan = 
plan match {
+case _ if !conf.adaptiveExecutionEnabled => plan
 case _: ExecutedCommandExec => plan
-case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan)
-  && (isSubquery || containShuffle(plan) || containSubQuery(plan)) =>
-  try {
-// Plan sub-queries recursively and pass in the shared stage cache for 
exchange reuse. Fall
-// back to non-adaptive mode if adaptive execution is supported in any 
of the sub-queries.
-val subqueryMap = buildSubqueryMap(plan)
-val planSubqueriesRule = PlanAdaptiveSubqueries(subqueryMap)
-val preprocessingRules = Seq(
-  planSubqueriesRule)
-// Run pre-processing rules.
-val newPlan = AdaptiveSparkPlanExec.applyPhysicalRules(plan, 
preprocessingRules)
-logDebug(s"Adaptive execution enabled for plan: $plan")
-AdaptiveSparkPlanExec(newPlan, adaptiveExecutionContext, 
preprocessingRules, isSubquery)
-  } catch {
-case SubqueryAd
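For reference, a minimal spark-shell usage sketch of the new config; the config keys are taken from the diff above, the toy query is illustrative, and since the config is marked internal this is intended only for testing or debugging.

// Force AQE to apply even for queries that have no exchanges or sub-queries,
// which would otherwise be skipped (previously with a warning).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceApply", "true") // internal config added by this commit
spark.range(10).filter("id > 5").explain()
// Expected: the physical plan is wrapped in AdaptiveSparkPlan even though the
// query contains no shuffle exchange.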

[spark] branch master updated (d861357 -> 8ce5862)

2020-02-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d861357  [SPARK-26700][CORE][FOLLOWUP] Add config 
`spark.network.maxRemoteBlockSizeFetchToMem`
 add 8ce5862  [SPARK-30719][SQL] do not log warning if AQE is intentionally 
skipped and add a config to force apply

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala|  9 +++
 .../adaptive/InsertAdaptiveSparkPlan.scala | 83 --
 .../adaptive/AdaptiveQueryExecSuite.scala  |  9 +++
 3 files changed, 65 insertions(+), 36 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 created (now da32d1e)

2020-02-01 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at da32d1e  [SPARK-30700][ML] NaiveBayesModel predict optimization

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d0c3e9f -> 8eecc20)

2020-01-31 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d0c3e9f  [SPARK-30660][ML][PYSPARK] LinearRegression blockify input 
vectors
 add 8eecc20  [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING 
"show create table"

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|   2 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|   2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |   2 +-
 .../sql/catalyst/plans/logical/statements.scala|   4 +-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   6 +-
 .../spark/sql/execution/command/tables.scala   | 285 --
 .../org/apache/spark/sql/internal/HiveSerDe.scala  |  16 +
 .../sql-tests/inputs/show-create-table.sql |  11 +-
 .../sql-tests/results/show-create-table.sql.out|  34 ++-
 .../apache/spark/sql/ShowCreateTableSuite.scala|  16 +-
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  | 327 -
 11 files changed, 581 insertions(+), 124 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2d4b5ea -> 82b4f75)

2020-01-31 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2d4b5ea  [SPARK-30676][CORE][TESTS] Eliminate warnings from deprecated 
constructors of java.lang.Integer and java.lang.Double
 add 82b4f75  [SPARK-30508][SQL] Add SparkSession.executeCommand API for 
external datasource

No new revisions were added by this update.

Summary of changes:
 ...upportsRead.java => ExternalCommandRunner.java} | 30 +++--
 .../scala/org/apache/spark/sql/SparkSession.scala  | 31 +-
 .../spark/sql/execution/command/commands.scala | 30 ++---
 .../sql/sources/ExternalCommandRunnerSuite.scala   | 50 ++
 4 files changed, 120 insertions(+), 21 deletions(-)
 copy 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/{catalog/SupportsRead.java
 => ExternalCommandRunner.java} (51%)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/sources/ExternalCommandRunnerSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4847f73 -> 3f76bd4)

2020-01-23 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4847f73  [SPARK-30298][SQL] Respect aliases in output partitioning of 
projects and aggregates
 add 3f76bd4  [SPARK-27083][SQL][FOLLOW-UP] Rename spark.sql.subquery.reuse 
to spark.sql.execution.subquery.reuse.enabled

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d2bca8f -> db528e4)

2020-01-22 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d2bca8f  [SPARK-30609] Allow default merge command resolution to be 
bypassed by DSv2 tables
 add db528e4  [SPARK-30535][SQL] Revert "[] Migrate ALTER TABLE commands to 
the new framework

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  25 +--
 .../sql/catalyst/analysis/CheckAnalysis.scala  |  41 ++---
 .../sql/catalyst/analysis/ResolveCatalogs.scala|  67 +++-
 .../spark/sql/catalyst/analysis/unresolved.scala   |  23 +++
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |  14 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  50 +++---
 .../sql/catalyst/plans/logical/statements.scala|  56 +++
 .../sql/catalyst/plans/logical/v2Commands.scala| 138 +++-
 .../sql/connector/catalog/CatalogV2Util.scala  |  14 +-
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  90 +--
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 178 +++--
 .../spark/sql/execution/command/tables.scala   |   8 +
 .../datasources/v2/DataSourceV2Strategy.scala  |  14 +-
 .../sql-tests/results/change-column.sql.out|   4 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |   2 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala  |   8 +-
 .../spark/sql/execution/command/DDLSuite.scala |   5 +-
 .../execution/command/PlanResolutionSuite.scala|  47 +++---
 .../sql/hive/execution/HiveCommandSuite.scala  |   2 +-
 19 files changed, 462 insertions(+), 324 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8e280ce -> 6dfaa07)

2020-01-22 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8e280ce  [SPARK-30592][SQL] Interval support for csv and json funtions
 add 6dfaa07  [SPARK-30549][SQL] Fix the subquery shown issue in UI When 
enable AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 51 +-
 .../sql/execution/ui/SQLAppStatusListener.scala|  9 
 .../spark/sql/execution/ui/SQLListener.scala   |  6 +++
 3 files changed, 55 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8a926e4 -> 883ae33)

2020-01-15 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8a926e4  [SPARK-26736][SQL] Partition pruning through nondeterministic 
expressions in Hive tables
 add 883ae33  [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  2 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala | 45 -
 .../sql/catalyst/analysis/CheckAnalysis.scala  | 10 +--
 .../sql/catalyst/analysis/ResolveCatalogs.scala|  8 ---
 .../spark/sql/catalyst/analysis/namespace.scala| 33 --
 .../apache/spark/sql/catalyst/analysis/table.scala | 33 --
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  | 76 ++
 .../spark/sql/catalyst/parser/AstBuilder.scala |  8 +--
 .../sql/catalyst/plans/logical/statements.scala|  8 ---
 .../sql/catalyst/plans/logical/v2Commands.scala| 12 ++--
 .../sql/connector/catalog/CatalogV2Implicits.scala |  8 +++
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 10 +--
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 25 ++-
 .../datasources/v2/DataSourceV2Strategy.scala  |  9 ++-
 .../datasources/v2/V2SessionCatalog.scala  |  2 +-
 .../resources/sql-tests/results/describe.sql.out   |  3 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  2 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  5 ++
 .../spark/sql/execution/SparkSqlParserSuite.scala  |  4 +-
 .../execution/command/PlanResolutionSuite.scala| 62 --
 .../sql/hive/execution/HiveComparisonTest.scala|  2 +-
 21 files changed, 186 insertions(+), 181 deletions(-)
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/namespace.scala
 delete mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/table.scala
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c49abf8 -> af2d3d0)

2020-01-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c49abf8  [SPARK-30417][CORE] Task speculation numTaskThreshold should 
be greater than 0 even EXECUTOR_CORES is not set under Standalone mode
 add af2d3d0  [SPARK-30315][SQL] Add adaptive execution context

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/QueryExecution.scala   |  4 +--
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 42 +++---
 .../adaptive/InsertAdaptiveSparkPlan.scala | 20 ---
 3 files changed, 38 insertions(+), 28 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e645125 -> be4faaf)

2020-01-03 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e645125  [SPARK-30267][SQL] Avro arrays can be of any List
 add be4faaf  Revert "[SPARK-23264][SQL] Make INTERVAL keyword optional 
when ANSI enabled"

No new revisions were added by this update.

Summary of changes:
 docs/sql-keywords.md   |  14 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  52 +---
 .../catalyst/parser/ExpressionParserSuite.scala|  36 +
 .../parser/TableIdentifierParserSuite.scala|  28 +---
 .../resources/sql-tests/inputs/ansi/interval.sql   |  18 +--
 .../sql-tests/results/ansi/interval.sql.out| 148 +
 .../resources/sql-tests/results/interval.sql.out   |   8 +-
 7 files changed, 16 insertions(+), 288 deletions(-)
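
With this revert, the INTERVAL keyword stays required for interval literals even when ANSI mode is enabled. A minimal sketch of the affected syntax, not taken from the commit itself, assuming the `spark` session provided by spark-shell on a build that contains this change:

    // The keyword must be spelled out; a bare "1 day" literal is not accepted.
    spark.sql("SET spark.sql.ansi.enabled=true")
    spark.sql("SELECT INTERVAL 1 day 2 hours AS i").show(false)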


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Update the code freeze date of SPARK 3.0 (#247)

2020-01-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 03347c3  Update the code freeze date of SPARK 3.0 (#247)
03347c3 is described below

commit 03347c31d283d86c7f3c7fe046678f7f0c0603da
Author: Xiao Li 
AuthorDate: Thu Jan 2 16:03:55 2020 -0800

Update the code freeze date of SPARK 3.0 (#247)

This PR is to update the code freeze date of SPARK 3.0 based on the 
[discussion](http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-3-0-branch-cut-and-code-freeze-on-Jan-31-td28575.html)
 in the mailing list
---
 site/versioning-policy.html | 6 +++---
 versioning-policy.md| 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/site/versioning-policy.html b/site/versioning-policy.html
index 9336430..d9d98bd 100644
--- a/site/versioning-policy.html
+++ b/site/versioning-policy.html
@@ -266,15 +266,15 @@ in between feature releases. Major releases do not happen 
according to a fixed s
   Preview release
 
 
-  Early Dec 2019
+  01/31/2020
   Code freeze. Release branch cut.
 
 
-  Late Dec 2019
+  Early Feb 2020
   QA period. Focus on bug fixes, tests, stability and docs. Generally, 
no new features merged.
 
 
-  Jan 2020
+  Mid Feb 2020
   Release candidates (RC), voting, etc. until final release passes
 
   
diff --git a/versioning-policy.md b/versioning-policy.md
index c8ae5ce..8037a59 100644
--- a/versioning-policy.md
+++ b/versioning-policy.md
@@ -61,9 +61,9 @@ in between feature releases. Major releases do not happen 
according to a fixed s
 | Date  | Event |
 | - | - |
 | Late Oct 2019 | Preview release |
-| Early Dec 2019 | Code freeze. Release branch cut.|
-| Late Dec 2019 | QA period. Focus on bug fixes, tests, stability and docs. 
Generally, no new features merged.|
-| Jan 2020 | Release candidates (RC), voting, etc. until final release passes|
+| 01/31/2020 | Code freeze. Release branch cut.|
+| Early Feb 2020 | QA period. Focus on bug fixes, tests, stability and docs. 
Generally, no new features merged.|
+| Mid Feb 2020 | Release candidates (RC), voting, etc. until final release 
passes|
 
 Maintenance Releases and EOL
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (724dcf0 -> 919d551)

2019-12-29 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 724dcf0  [SPARK-30342][SQL][DOC] Update LIST FILE/JAR command 
Documentation
 add 919d551  Revert "[SPARK-29390][SQL] Add the justify_days(), 
justify_hours() and justif_interval() functions"

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   3 -
 .../catalyst/expressions/intervalExpressions.scala |  68 ---
 .../spark/sql/catalyst/util/IntervalUtils.scala|  36 --
 .../sql/catalyst/util/IntervalUtilsSuite.scala |  25 -
 .../test/resources/sql-tests/inputs/interval.sql   |  14 -
 .../sql-tests/inputs/postgreSQL/interval.sql   |   8 +-
 .../sql-tests/results/ansi/interval.sql.out| 570 +
 .../resources/sql-tests/results/interval.sql.out   | 498 --
 .../sql-tests/results/postgreSQL/interval.sql.out  | 186 +++
 9 files changed, 524 insertions(+), 884 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Update the release note of Spark 3.0 preview-2 (#246)

2019-12-24 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e0c5ca5  Update the release note of Spark 3.0 preview-2  (#246)
e0c5ca5 is described below

commit e0c5ca50df47227d890106d8a3ab33af005b0a87
Author: Xiao Li 
AuthorDate: Tue Dec 24 15:21:27 2019 -0800

Update the release note of Spark 3.0 preview-2  (#246)

This PR is to address the comments in 
https://github.com/apache/spark-website/pull/245 and update the news of Spark 
3.0 preview-2 release.
---
 news/_posts/2019-12-23-spark-3.0.0-preview2.md | 4 ++--
 site/news/spark-3.0.0-preview2.html| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/news/_posts/2019-12-23-spark-3.0.0-preview2.md 
b/news/_posts/2019-12-23-spark-3.0.0-preview2.md
index d6ae930..63af801 100644
--- a/news/_posts/2019-12-23-spark-3.0.0-preview2.md
+++ b/news/_posts/2019-12-23-spark-3.0.0-preview2.md
@@ -11,6 +11,6 @@ meta:
   _edit_last: '4'
   _wpas_done_all: '1'
 ---
-To enable wide-scale community testing of the upcoming Spark 3.0 release, the 
Apache Spark community has posted a https://archive.apache.org/dist/spark/spark-3.0.0-preview2/;>Spark 3.0.0 
preview2 released. This preview is not a stable release in terms of 
either API or functionality, but it is meant to give the community early 
access to try the code that will become Spark 3.0. If you would like to test 
the release, please download it, and send feedback using either the [...]
+To enable wide-scale community testing of the upcoming Spark 3.0 release, the 
Apache Spark community has posted a https://archive.apache.org/dist/spark/spark-3.0.0-preview2/;>Spark 3.0.0 
preview2 release. This preview is not a stable release in terms of 
either API or functionality, but it is meant to give the community early 
access to try the code that will become Spark 3.0. If you would like to test 
the release, please download it, and send feedback using either the  [...]
 
-The Spark issue tracker already contains a list of https://issues.apache.org/jira/browse/SPARK-26078?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012339177%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC;>features
 in 3.0.
\ No newline at end of file
+The Spark issue tracker already contains a list of https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=statusCategory+%3D+done+AND+project+%3D+12315420+AND+fixVersion+%3D+12339177+ORDER+BY+priority+DESC%2C+key+ASC=1000;>features
 in 3.0.
\ No newline at end of file
diff --git a/site/news/spark-3.0.0-preview2.html 
b/site/news/spark-3.0.0-preview2.html
index 7cca75e..a6ebb52 100644
--- a/site/news/spark-3.0.0-preview2.html
+++ b/site/news/spark-3.0.0-preview2.html
@@ -203,9 +203,9 @@
 Preview release of Spark 3.0
 
 
-To enable wide-scale community testing of the upcoming Spark 3.0 release, 
the Apache Spark community has posted a https://archive.apache.org/dist/spark/spark-3.0.0-preview2/;>Spark 3.0.0 
preview2 released. This preview is not a stable release in terms of 
either API or functionality, but it is meant to give the community early 
access to try the code that will become Spark 3.0. If you would like to test 
the release, please download it, and send feedback using either  [...]
+To enable wide-scale community testing of the upcoming Spark 3.0 release, 
the Apache Spark community has posted a https://archive.apache.org/dist/spark/spark-3.0.0-preview2/;>Spark 3.0.0 
preview2 release. This preview is not a stable release in terms of 
either API or functionality, but it is meant to give the community early 
access to try the code that will become Spark 3.0. If you would like to test 
the release, please download it, and send feedback using either t [...]
 
-The Spark issue tracker already contains a list of https://issues.apache.org/jira/browse/SPARK-26078?jql=statusCategory%20%3D%20done%20AND%20project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012339177%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC;>features
 in 3.0.
+The Spark issue tracker already contains a list of https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-printable/temp/SearchRequest.html?jqlQuery=statusCategory+%3D+done+AND+project+%3D+12315420+AND+fixVersion+%3D+12339177+ORDER+BY+priority+DESC%2C+key+ASCtempMax=1000;>features
 in 3.0.
 
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (18e8d1d -> a296d15)

2019-12-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 18e8d1d  [SPARK-30307][SQL] remove ReusedQueryStageExec
 add a296d15  [SPARK-30291] catch the exception when doing materialize in 
AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 18 ++--
 .../adaptive/AdaptiveQueryExecSuite.scala  | 25 ++
 2 files changed, 36 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (726f6d3 -> 18e8d1d)

2019-12-19 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 726f6d3  [SPARK-30184][SQL] Implement a helper method for aliasing 
functions
 add 18e8d1d  [SPARK-30307][SQL] remove ReusedQueryStageExec

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  17 +--
 .../adaptive/DemoteBroadcastHashJoin.scala |   4 +-
 .../adaptive/LogicalQueryStageStrategy.scala   |   4 +-
 .../adaptive/OptimizeLocalShuffleReader.scala  |  56 ++
 .../sql/execution/adaptive/QueryStageExec.scala| 116 -
 .../adaptive/ReduceNumShufflePartitions.scala  |  22 ++--
 .../spark/sql/execution/exchange/Exchange.scala|   2 +-
 .../ReduceNumShufflePartitionsSuite.scala  |   9 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |   9 +-
 9 files changed, 108 insertions(+), 131 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9cd174a -> 9459833)

2019-11-27 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9cd174a  Revert "[SPARK-28461][SQL] Pad Decimal numbers with trailing 
zeros to the scale of the column"
 add 9459833  [SPARK-29989][INFRA] Add `hadoop-2.7/hive-2.3` pre-built 
distribution

No new revisions were added by this update.

Summary of changes:
 dev/create-release/release-build.sh | 1 +
 1 file changed, 1 insertion(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2dd6807 -> 6e581cf)

2019-11-22 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2dd6807  [SPARK-28023][SQL] Add trim logic in UTF8String's 
toInt/toLong to make it consistent with other string-numeric casting
 add 6e581cf  [SPARK-29893][SQL][FOLLOWUP] code cleanup for local shuffle 
reader

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala|  4 +-
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 32 ++-
 .../adaptive/OptimizeLocalShuffleReader.scala  | 98 +++---
 .../execution/exchange/ShuffleExchangeExec.scala   |  4 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 13 ++-
 5 files changed, 87 insertions(+), 64 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6fb8b86 -> 3d2a6f4)

2019-11-19 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6fb8b86  [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean
 add 3d2a6f4  [SPARK-29906][SQL] AQE should not introduce extra shuffle for 
outermost limit

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  | 21 
 2 files changed, 36 insertions(+), 8 deletions(-)
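
The change affects how adaptive query execution plans a query whose outermost operator is a LIMIT. A rough way to observe the behaviour, assuming spark-shell with its predefined `spark` session; the data and column names are synthetic placeholders:

    // Adaptive query execution is off by default at this point in time.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    // An outermost LIMIT like this should not force an extra shuffle under AQE.
    val df = spark.range(0, 1000000L).selectExpr("id % 100 AS k", "id AS v")
    df.groupBy("k").count().limit(10).explain()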


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7cfd589 -> 1e2d76e)

2019-11-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7cfd589  [SPARK-28893][SQL] Support MERGE INTO in the parser and add 
the corresponding logical plan
 add 1e2d76e  [HOT-FIX] Fix the SQLBase.g4

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 4 
 1 file changed, 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (782992c -> 1f3863c)

2019-11-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 782992c  [SPARK-29642][SS] Change the element type of underlying array 
to UnsafeRow for ContinuousRecordEndpoint
 add 1f3863c  [SPARK-29759][SQL] LocalShuffleReaderExec.outputPartitioning 
should use the corrected attributes

No new revisions were added by this update.

Summary of changes:
 .../adaptive/OptimizeLocalShuffleReader.scala  | 36 +-
 .../sql/execution/adaptive/QueryStageExec.scala|  8 +++--
 .../adaptive/ReduceNumShufflePartitions.scala  | 11 +--
 3 files changed, 28 insertions(+), 27 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4615769 -> 4110153)

2019-11-06 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4615769  [SPARK-29603][YARN] Support application priority for YARN 
priority scheduling
 add 4110153  [SPARK-29752][SQL][TEST] make AdaptiveQueryExecSuite more 
robust

No new revisions were added by this update.

Summary of changes:
 .../adaptive/AdaptiveQueryExecSuite.scala  | 69 +++---
 1 file changed, 34 insertions(+), 35 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (948a6e8 -> ef1e849)

2019-10-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 948a6e8  [SPARK-28892][SQL][FOLLOWUP] add resolved logical plan for 
UPDATE TABLE
 add ef1e849  [SPARK-29366][SQL] Subqueries created for DPP are not printed 
in EXPLAIN FORMATTED

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/ExplainUtils.scala  |  4 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 49 ++
 2 files changed, 51 insertions(+), 2 deletions(-)
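
The fix concerns the subquery list printed by EXPLAIN FORMATTED once dynamic partition pruning injects subqueries into a plan. A minimal sketch of inspecting that output, assuming spark-shell; the temp view here is a placeholder and will not itself trigger DPP:

    spark.range(100).selectExpr("id", "id % 10 AS part").createOrReplaceTempView("t")
    // EXPLAIN FORMATTED prints the plan followed by a numbered subquery section;
    // this commit makes DPP-generated subqueries appear in that section.
    spark.sql("EXPLAIN FORMATTED SELECT * FROM t WHERE part = 1").show(false)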


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ffddfc8 -> 948a6e8)

2019-10-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ffddfc8  [SPARK-29269][PYTHON][ML] Pyspark ALSModel support 
getters/setters
 add 948a6e8  [SPARK-28892][SQL][FOLLOWUP] add resolved logical plan for 
UPDATE TABLE

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  4 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala  |  8 +--
 .../sql/catalyst/analysis/ResolveCatalogs.scala|  8 ++-
 .../spark/sql/catalyst/expressions/literals.scala  | 10 +++
 .../plans/logical/basicLogicalOperators.scala  | 19 -
 .../plans/logical/sql/UpdateTableStatement.scala   |  2 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  2 +-
 .../spark/sql/execution/SparkStrategies.scala  |  2 +
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 76 +++-
 .../execution/command/PlanResolutionSuite.scala| 82 +++---
 10 files changed, 154 insertions(+), 59 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (aedf090a -> 8fabbab)

2019-10-03 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aedf090a [SPARK-25468][WEBUI][FOLLOWUP] Current page index keep style 
with dataTable in the spark UI
 add 8fabbab  [SPARK-29350] Fix BroadcastExchange reuse in Dynamic 
Partition Pruning

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/exchange/Exchange.scala| 18 +---
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 25 +-
 2 files changed, 35 insertions(+), 8 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (67d5b9b -> 3170011)

2019-09-29 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 67d5b9b  [SPARK-29172][SQL] Fix some exception issue of explain 
commands
 add 3170011  [SPARK-28476][SQL] Support ALTER DATABASE SET LOCATION

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  2 ++
 .../spark/sql/execution/SparkSqlParser.scala   | 16 ++
 .../apache/spark/sql/execution/command/ddl.scala   | 21 ++
 .../sql/execution/command/DDLParserSuite.scala |  9 
 .../spark/sql/execution/command/DDLSuite.scala | 25 +-
 .../spark/sql/hive/client/HiveClientImpl.scala |  7 ++
 .../spark/sql/hive/client/VersionsSuite.scala  | 17 +++
 7 files changed, 96 insertions(+), 1 deletion(-)
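
The new statement changes the default location of an existing database. A sketch of the syntax, assuming spark-shell; the database name and path are placeholders, and catalog-specific restrictions may still apply:

    spark.sql("CREATE DATABASE IF NOT EXISTS sketch_db")
    // Point the database's default location at a new directory (placeholder path).
    spark.sql("ALTER DATABASE sketch_db SET LOCATION '/tmp/sketch_db_new_location'")
    spark.sql("DESCRIBE DATABASE EXTENDED sketch_db").show(false)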


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (eee2e02 -> d3eb4c9)

2019-09-19 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from eee2e02  [SPARK-29165][SQL][TEST] Set log level of log generated code 
as ERROR in case of compile error on generated code in UT
 add d3eb4c9  [SPARK-28822][DOC][SQL] Document USE DATABASE in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml|  2 ++
 docs/sql-ref-syntax-qry-select-usedb.md | 60 +
 2 files changed, 62 insertions(+)
 create mode 100644 docs/sql-ref-syntax-qry-select-usedb.md
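
The documented command switches the current database used to resolve unqualified table names. A minimal example, assuming spark-shell and a placeholder database name:

    spark.sql("CREATE DATABASE IF NOT EXISTS sketch_db")
    spark.sql("USE sketch_db")
    spark.sql("SELECT current_database()").show()
    // Programmatic equivalent: spark.catalog.setCurrentDatabase("sketch_db")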


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a6a663c -> b917a65)

2019-09-18 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a6a663c  [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL 
benchmarks
 add b917a65  [SPARK-28989][SQL] Add a SQLConf `spark.sql.ansi.enabled`

No new revisions were added by this update.

Summary of changes:
 docs/sql-keywords.md   |  8 ++---
 .../sql/catalyst/CatalystTypeConverters.scala  |  2 +-
 .../spark/sql/catalyst/SerializerBuildHelper.scala |  2 +-
 .../sql/catalyst/analysis/DecimalPrecision.scala   |  2 +-
 .../spark/sql/catalyst/encoders/RowEncoder.scala   |  2 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |  8 ++---
 .../sql/catalyst/expressions/aggregate/Sum.scala   |  2 +-
 .../sql/catalyst/expressions/arithmetic.scala  |  4 +--
 .../catalyst/expressions/decimalExpressions.scala  |  2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  2 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala|  4 +--
 .../org/apache/spark/sql/internal/SQLConf.scala| 41 ++
 .../catalyst/encoders/ExpressionEncoderSuite.scala |  8 ++---
 .../sql/catalyst/encoders/RowEncoderSuite.scala|  4 +--
 .../expressions/ArithmeticExpressionSuite.scala| 24 ++---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 12 +++
 .../expressions/DecimalExpressionSuite.scala   |  4 +--
 .../sql/catalyst/expressions/ScalaUDFSuite.scala   |  4 +--
 .../catalyst/parser/ExpressionParserSuite.scala| 10 +++---
 .../parser/TableIdentifierParserSuite.scala|  2 +-
 .../resources/sql-tests/inputs/ansi/interval.sql   |  4 +--
 .../inputs/decimalArithmeticOperations.sql |  2 +-
 .../test/resources/sql-tests/inputs/pgSQL/text.sql |  6 ++--
 .../sql-tests/results/ansi/interval.sql.out|  8 ++---
 .../results/decimalArithmeticOperations.sql.out|  4 +--
 .../resources/sql-tests/results/pgSQL/text.sql.out |  8 ++---
 .../org/apache/spark/sql/DataFrameSuite.scala  |  6 ++--
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  8 ++---
 .../thriftserver/ThriftServerQueryTestSuite.scala  |  2 +-
 29 files changed, 86 insertions(+), 109 deletions(-)
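
The new flag folds the earlier ANSI-related options into a single switch. A small sketch of toggling it, assuming spark-shell on a build that contains this commit:

    // Off by default; enabling it applies ANSI reserved keywords and stricter
    // cast/arithmetic behaviour.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    spark.sql("SET spark.sql.ansi.enabled").show(false)
    spark.conf.set("spark.sql.ansi.enabled", "false")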


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3fc52b5 -> c6ca661)

2019-09-17 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3fc52b5  [SPARK-28950][SQL] Refine the code of DELETE
 add c6ca661  [SPARK-28814][SQL][DOC] Document SET/RESET in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-conf-mgmt-reset.md | 18 ++-
 docs/sql-ref-syntax-aux-conf-mgmt-set.md   | 49 +-
 2 files changed, 65 insertions(+), 2 deletions(-)
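
SET and RESET manage runtime SQL configuration from SQL itself. A brief sketch, assuming spark-shell; the property used is only an example:

    spark.sql("SET spark.sql.shuffle.partitions=8")        // set a runtime property
    spark.sql("SET spark.sql.shuffle.partitions").show(false)  // read it back
    spark.sql("RESET")                                      // drop runtime-set values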


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (13b77e5 -> d334fee)

2019-09-14 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 13b77e5  Revert "[SPARK-29046][SQL] Fix NPE in SQLConf.get when active 
SparkContext is stopping"
 add d334fee  [SPARK-28373][DOCS][WEBUI] JDBC/ODBC Server Tab

No new revisions were added by this update.

Summary of changes:
 docs/img/JDBCServer1.png | Bin 0 -> 14763 bytes
 docs/img/JDBCServer2.png | Bin 0 -> 45084 bytes
 docs/img/JDBCServer3.png | Bin 0 -> 108360 bytes
 docs/web-ui.md   |  41 +
 4 files changed, 41 insertions(+)
 create mode 100644 docs/img/JDBCServer1.png
 create mode 100644 docs/img/JDBCServer2.png
 create mode 100644 docs/img/JDBCServer3.png


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b83304f -> d599807)

2019-09-13 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b83304f  [SPARK-28796][DOC] Document DROP DATABASE statement in SQL 
Reference
 add d599807  [SPARK-28795][DOC][SQL] Document CREATE VIEW statement in SQL 
Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-create-view.md | 62 +-
 1 file changed, 61 insertions(+), 1 deletion(-)
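
The documented statement defines a named view over a query. A minimal sketch, assuming spark-shell with a default warehouse; the table and view names are placeholders:

    spark.range(10).selectExpr("id", "id * id AS sq").write.mode("overwrite").saveAsTable("sketch_src")
    spark.sql("CREATE OR REPLACE VIEW sketch_view AS SELECT id, sq FROM sketch_src WHERE id > 3")
    spark.sql("SELECT * FROM sketch_view").show()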


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ee63031 -> b83304f)

2019-09-13 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ee63031  [SPARK-28828][DOC] Document REFRESH TABLE command
 add b83304f  [SPARK-28796][DOC] Document DROP DATABASE statement in SQL 
Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-drop-database.md | 60 +++-
 1 file changed, 59 insertions(+), 1 deletion(-)
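
The documented statement removes a database, optionally together with the tables it contains. A sketch, assuming spark-shell and a placeholder database name:

    spark.sql("CREATE DATABASE IF NOT EXISTS sketch_db")
    // IF EXISTS avoids an error when the database is absent; CASCADE also drops its tables.
    spark.sql("DROP DATABASE IF EXISTS sketch_db CASCADE")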


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5631a96 -> ee63031)

2019-09-13 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5631a96  [SPARK-29048] Improve performance on Column.isInCollection() 
with a large size collection
 add ee63031  [SPARK-28828][DOC] Document REFRESH TABLE command

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml |  2 ++
 docs/sql-ref-syntax-aux-refresh-table.md | 58 
 2 files changed, 60 insertions(+)
 create mode 100644 docs/sql-ref-syntax-aux-refresh-table.md
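
REFRESH TABLE invalidates cached metadata and file listings for a table whose underlying files changed outside Spark. A minimal sketch, assuming spark-shell and a placeholder table name that already exists in the catalog:

    spark.sql("REFRESH TABLE sketch_src")
    // Programmatic equivalent:
    spark.catalog.refreshTable("sketch_src")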


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c56a012 -> 5631a96)

2019-09-12 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c56a012  [SPARK-29060][SQL] Add tree traversal helper for adaptive 
spark plans
 add 5631a96  [SPARK-29048] Improve performance on Column.isInCollection() 
with a large size collection

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/Column.scala   | 10 -
 .../apache/spark/sql/ColumnExpressionSuite.scala   | 45 ++
 2 files changed, 37 insertions(+), 18 deletions(-)
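
The change targets Column.isInCollection when the supplied collection is large. A usage sketch, assuming spark-shell; the data is synthetic:

    import org.apache.spark.sql.functions.col

    val ids = (0L until 10000L by 7L).toSeq            // a largish membership set
    val df = spark.range(0, 1000000L).toDF("id")
    // isInCollection builds an IN-style predicate from a Scala collection;
    // this commit improves how large collections are handled.
    df.filter(col("id").isInCollection(ids)).count()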


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



  1   2   3   4   5   6   7   8   9   10   >