[spark] branch master updated: [SPARK-37139][PYTHON][FOLLOWUP] Fix class variable type hints in taskcontext.py

2021-11-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fd6079d  [SPARK-37139][PYTHON][FOLLOWUP] Fix class variable type hints 
in taskcontext.py
fd6079d is described below

commit fd6079d72e6579d85066a430d30d082259165c0e
Author: Takuya UESHIN 
AuthorDate: Sat Nov 13 15:33:45 2021 +0900

[SPARK-37139][PYTHON][FOLLOWUP] Fix class variable type hints in 
taskcontext.py

### What changes were proposed in this pull request?

Fixes class variable type hints in taskcontext.py.

### Why are the changes needed?

In `taskcontext.py`, the class variable `_taskContext` should be annotated as a `ClassVar`.
Likewise, `_port` and `_secret` of `BarrierTaskContext` should be annotated.
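
For illustration, a minimal sketch of what the annotations convey to a type checker (attribute names mirror the real ones; the class bodies are stripped down):

```python
from typing import ClassVar, Optional, Union


class TaskContext:
    # ClassVar marks the attribute as belonging to the class itself, so type
    # checkers will flag code that assigns it through an instance.
    _taskContext: ClassVar[Optional["TaskContext"]] = None


class BarrierTaskContext(TaskContext):
    # Connection info for the barrier coordinator; unset until the context
    # is created on an executor.
    _port: ClassVar[Optional[Union[str, int]]] = None
    _secret: ClassVar[Optional[str]] = None
```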

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`lint-python` should pass.

Closes #34574 from ueshin/issues/SPARK-37139/taskcontext.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/taskcontext.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/taskcontext.py b/python/pyspark/taskcontext.py
index ebfcfcd..7333a68 100644
--- a/python/pyspark/taskcontext.py
+++ b/python/pyspark/taskcontext.py
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from typing import Type, Dict, List, Optional, Union, cast
+from typing import ClassVar, Type, Dict, List, Optional, Union, cast
 
 from pyspark.java_gateway import local_connect_and_auth
 from pyspark.resource import ResourceInformation
@@ -29,7 +29,7 @@ class TaskContext(object):
 :meth:`TaskContext.get`.
 """
 
-_taskContext: Optional["TaskContext"] = None
+_taskContext: ClassVar[Optional["TaskContext"]] = None
 
 _attemptNumber: Optional[int] = None
 _partitionId: Optional[int] = None
@@ -171,8 +171,8 @@ class BarrierTaskContext(TaskContext):
 This API is experimental
 """
 
-_port = None
-_secret = None
+_port: ClassVar[Optional[Union[str, int]]] = None
+_secret: ClassVar[Optional[str]] = None
 
 @classmethod
 def _getOrCreate(cls: Type["BarrierTaskContext"]) -> "BarrierTaskContext":




[spark] branch master updated: [SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` as ExtendedLevelDBTest

2021-11-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b45a08  [SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` 
as ExtendedLevelDBTest
8b45a08 is described below

commit 8b45a08b763e9ee6c75b039893af3de5e5167643
Author: Dongjoon Hyun 
AuthorDate: Sat Nov 13 15:21:59 2021 +0900

[SPARK-37282][TESTS][FOLLOWUP] Mark `YarnShuffleServiceSuite` as 
ExtendedLevelDBTest

### What changes were proposed in this pull request?

This PR is a follow-up of #34548. This suite was missed there because it only builds with the `-Pyarn` profile.

### Why are the changes needed?

This is required to pass the `yarn` module tests on Apple Silicon.

**BEFORE**
```
$ build/sbt "yarn/test"
...
[info] YarnShuffleServiceSuite:
[info] org.apache.spark.network.yarn.YarnShuffleServiceSuite *** ABORTED 
*** (20 milliseconds)
[info]   java.lang.UnsatisfiedLinkError: Could not load library. Reasons: 
[no leveldbjni64-1.8
...
```

**AFTER**
```
$ build/sbt "yarn/test" -Pyarn 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
...
[info] Run completed in 4 minutes, 57 seconds.
[info] Total number of tests run: 135
[info] Suites: completed 18, aborted 0
[info] Tests: succeeded 135, failed 0, canceled 1, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 319 s (05:19), completed Nov 12, 2021, 4:53:14 PM
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A manual test on Apple Silicon.
```
$ build/sbt "yarn/test" -Pyarn 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
```

Closes #34576 from dongjoon-hyun/SPARK-37282-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 .../scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala   | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
 
b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
index b2025aa..38d2247 100644
--- 
a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
+++ 
b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
@@ -46,8 +46,10 @@ import org.apache.spark.internal.config._
 import org.apache.spark.network.shuffle.{NoOpMergedShuffleFileManager, 
RemoteBlockPushResolver, ShuffleTestAccessor}
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo
 import org.apache.spark.network.util.TransportConf
+import org.apache.spark.tags.ExtendedLevelDBTest
 import org.apache.spark.util.Utils
 
+@ExtendedLevelDBTest
 class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with 
BeforeAndAfterEach {
   private[yarn] var yarnConfig: YarnConfiguration = null
   private[yarn] val SORT_MANAGER = 
"org.apache.spark.shuffle.sort.SortShuffleManager"




[spark] branch master updated: [SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and `.rat-excludes`

2021-11-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d0eb621  [SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and 
`.rat-excludes`
d0eb621 is described below

commit d0eb62179822c82596c4feaa412f2fdf5b83c02a
Author: Dongjoon Hyun 
AuthorDate: Sat Nov 13 14:43:52 2021 +0900

[SPARK-37312][TESTS] Add `.java-version` to `.gitignore` and `.rat-excludes`

### What changes were proposed in this pull request?

To support Java 8/11/17 testing more easily, this PR adds `.java-version` to 
`.gitignore` and `.rat-excludes`.

### Why are the changes needed?

When we use `jenv`, the `.java-version` file it generates makes `dev/check-license` and `dev/run-tests` fail.

```

Running Apache RAT checks

Could not find Apache license headers in the following files:
 !? /Users/dongjoon/APACHE/spark-merge/.java-version
[error] running /Users/dongjoon/APACHE/spark-merge/dev/check-license ; 
received return code 1
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ jenv local 17
$ dev/check-license
```

Closes #34577 from dongjoon-hyun/SPARK-37312.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kousuke Saruta 
---
 .gitignore| 1 +
 dev/.rat-excludes | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 1a7881a..560265e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,7 @@
 *.pyo
 *.swp
 *~
+.java-version
 .DS_Store
 .bsp/
 .cache
diff --git a/dev/.rat-excludes b/dev/.rat-excludes
index a35d4ce..7932c5d 100644
--- a/dev/.rat-excludes
+++ b/dev/.rat-excludes
@@ -10,6 +10,7 @@ cache
 .generated-mima-member-excludes
 .rat-excludes
 .*md
+.java-version
 derby.log
 licenses/*
 licenses-binary/*




[spark] branch master updated (aaf0e5e -> 8e05c78)

2021-11-12 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aaf0e5e  [SPARK-37292][SQL][FOLLOWUP] Simplify the condition when 
removing outer join if it only has DISTINCT on streamed side
 add 8e05c78  [SPARK-37298][SQL] Use unique exprIds in RewriteAsOfJoin

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/RewriteAsOfJoin.scala  | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)
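
For context, `RewriteAsOfJoin` is the optimizer rule behind the as-of join produced by the pandas-on-Spark `merge_asof` API; a minimal usage sketch (the data here is made up):

```python
import pyspark.pandas as ps

quotes = ps.DataFrame({"time": [1, 3, 5], "price": [10.0, 10.5, 11.0]})
trades = ps.DataFrame({"time": [2, 4], "qty": [100, 200]})

# Each trade row is matched with the most recent quote at or before its time.
# The planner expresses this as an AsOfJoin node, which RewriteAsOfJoin then
# rewrites into standard join/aggregate operators.
merged = ps.merge_asof(trades, quotes, on="time")
```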




[spark] branch master updated: [SPARK-37292][SQL][FOLLOWUP] Simplify the condition when removing outer join if it only has DISTINCT on streamed side

2021-11-12 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aaf0e5e  [SPARK-37292][SQL][FOLLOWUP] Simplify the condition when 
removing outer join if it only has DISTINCT on streamed side
aaf0e5e is described below

commit aaf0e5e71509a2324e110e45366b753c7926c64b
Author: Yuming Wang 
AuthorDate: Fri Nov 12 14:30:47 2021 -0800

[SPARK-37292][SQL][FOLLOWUP] Simplify the condition when removing outer 
join if it only has DISTINCT on streamed side

### What changes were proposed in this pull request?

Simplify the condition used when removing an outer join that only has DISTINCT on 
the streamed side with aliases. See: 
https://github.com/apache/spark/pull/34557#discussion_r748005299.
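
As a rough illustration of the query shape this rule targets (table and column names are made up), a left outer join under a group-only aggregate that only references the streamed (left) side can be rewritten to drop the join entirely:

```python
# A minimal sketch, assuming an active SparkSession `spark`.
spark.range(10).selectExpr("id", "id * 2 AS a").createOrReplaceTempView("t1")
spark.range(10).selectExpr("id", "id * 3 AS b").createOrReplaceTempView("t2")

# The DISTINCT only touches columns from t1 (an aliased projection of the
# left side), so EliminateOuterJoin can remove the LEFT OUTER JOIN.
df = spark.sql("""
    SELECT DISTINCT t1.a AS a2
    FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
""")
df.explain(True)  # the optimized plan should contain no Join node
```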

### Why are the changes needed?

Simplify the code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test.

Closes #34573 from wangyum/SPARK-37292-2.

Authored-by: Yuming Wang 
Signed-off-by: Liang-Chi Hsieh 
---
 .../main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala  | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
index f9f2e83..e03360d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
@@ -179,12 +179,10 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with 
PredicateHelper {
 if a.groupOnly && a.references.subsetOf(right.outputSet) =>
   a.copy(child = right)
 case a @ Aggregate(_, _, p @ Project(_, Join(left, _, LeftOuter, _, _)))
-if a.groupOnly && p.outputSet.subsetOf(a.references) &&
-  
AttributeSet(p.projectList.flatMap(_.references)).subsetOf(left.outputSet) =>
+if a.groupOnly && p.references.subsetOf(left.outputSet) =>
   a.copy(child = p.copy(child = left))
 case a @ Aggregate(_, _, p @ Project(_, Join(_, right, RightOuter, _, _)))
-if a.groupOnly && p.outputSet.subsetOf(a.references) &&
-  
AttributeSet(p.projectList.flatMap(_.references)).subsetOf(right.outputSet) =>
+if a.groupOnly && p.references.subsetOf(right.outputSet) =>
   a.copy(child = p.copy(child = right))
   }
 }




[spark] branch branch-3.2 updated: [SPARK-37302][BUILD] Explicitly downloads guava and jetty-io in test-dependencies.sh

2021-11-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 0c99f13  [SPARK-37302][BUILD] Explicitly downloads guava and jetty-io 
in test-dependencies.sh
0c99f13 is described below

commit 0c99f137dac34de98d647d2919f4ca14f683b890
Author: Kousuke Saruta 
AuthorDate: Fri Nov 12 11:14:29 2021 -0800

[SPARK-37302][BUILD] Explicitly downloads guava and jetty-io in 
test-dependencies.sh

### What changes were proposed in this pull request?

This PR changes `dev/test-dependencies.sh` to download `guava` and 
`jetty-io` explicitly.

`dev/run-tests.py` fails if Scala 2.13 is used and `guava` or `jetty-io` is 
not in both the Maven and Coursier local repositories.

```
$ rm -rf ~/.m2/repository/*
$ # For Linux
$ rm -rf ~/.cache/coursier/v1/*
$ # For macOS
$ rm -rf ~/Library/Caches/Coursier/v1/*
$ dev/change-scala-version.sh 2.13
$ dev/test-dependencies.sh
$ build/sbt -Pscala-2.13 clean compile
...
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1:
  error: package com.google.common.primitives does not exist
[error] import com.google.common.primitives.Ints;
[error]^
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1:
  error: package com.google.common.annotations does not exist
[error] import com.google.common.annotations.VisibleForTesting;
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1:
  error: package com.google.common.base does not exist
[error] import com.google.common.base.Preconditions;
...
```
```
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25:
 Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub.
[error] val connector = new ServerConnector(
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21:
 multiple constructors for ServerConnector with alternatives:
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: 
org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.util.ssl.SslContextFactory,x$3: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
[error]  cannot be invoked with (org.eclipse.jetty.server.Server, Null, 
org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, 
org.eclipse.jetty.server.HttpConnectionFactory)
[error] val connector = new ServerConnector(
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13:
 Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with 
a stub.
[error] new HttpClient(new 
HttpClientTransportOverHTTP(numSelectors), null)
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25:
 multiple constructors for ServerConnector with alternatives:
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: 
org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.util.ssl.SslContextFactory,x$3: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
[error]  cannot be invoked with 

[spark] branch master updated: [SPARK-37302][BUILD] Explicitly downloads guava and jetty-io in test-dependencies.sh

2021-11-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 01ab0be  [SPARK-37302][BUILD] Explicitly downloads guava and jetty-io 
in test-dependencies.sh
01ab0be is described below

commit 01ab0bedb2ad423963ebff78272e0d045b0b9107
Author: Kousuke Saruta 
AuthorDate: Fri Nov 12 11:14:29 2021 -0800

[SPARK-37302][BUILD] Explicitly downloads guava and jetty-io in 
test-dependencies.sh

### What changes were proposed in this pull request?

This PR changes `dev/test-dependencies.sh` to download `guava` and 
`jetty-io` explicitly.

`dev/run-tests.py` fails if Scala 2.13 is used and `guava` or `jetty-io` is 
not in both the Maven and Coursier local repositories.

```
$ rm -rf ~/.m2/repository/*
$ # For Linux
$ rm -rf ~/.cache/coursier/v1/*
$ # For macOS
$ rm -rf ~/Library/Caches/Coursier/v1/*
$ dev/change-scala-version.sh 2.13
$ dev/test-dependencies.sh
$ build/sbt -Pscala-2.13 clean compile
...
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:24:1:
  error: package com.google.common.primitives does not exist
[error] import com.google.common.primitives.Ints;
[error]^
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:30:1:
  error: package com.google.common.annotations does not exist
[error] import com.google.common.annotations.VisibleForTesting;
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:31:1:
  error: package com.google.common.base does not exist
[error] import com.google.common.base.Preconditions;
...
```
```
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:25:
 Class org.eclipse.jetty.io.ByteBufferPool not found - continuing with a stub.
[error] val connector = new ServerConnector(
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala:87:21:
 multiple constructors for ServerConnector with alternatives:
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: 
org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.util.ssl.SslContextFactory,x$3: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
[error]  cannot be invoked with (org.eclipse.jetty.server.Server, Null, 
org.eclipse.jetty.util.thread.ScheduledExecutorScheduler, Null, Int, Int, 
org.eclipse.jetty.server.HttpConnectionFactory)
[error] val connector = new ServerConnector(
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:207:13:
 Class org.eclipse.jetty.io.ClientConnectionFactory not found - continuing with 
a stub.
[error] new HttpClient(new 
HttpClientTransportOverHTTP(numSelectors), null)
[error] ^
[error] 
/home/kou/work/oss/spark-scala-2.13/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala:287:25:
 multiple constructors for ServerConnector with alternatives:
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
java.util.concurrent.Executor,x$3: org.eclipse.jetty.util.thread.Scheduler,x$4: 
org.eclipse.jetty.io.ByteBufferPool,x$5: Int,x$6: Int,x$7: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.util.ssl.SslContextFactory,x$3: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
 
[error]   (x$1: org.eclipse.jetty.server.Server,x$2: Int,x$3: Int,x$4: 
org.eclipse.jetty.server.ConnectionFactory*)org.eclipse.jetty.server.ServerConnector
[error]  cannot be invoked with 

[spark] branch branch-3.2 updated: [SPARK-37702][SQL] Use AnalysisContext to track referred temp functions

2021-11-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new b7ec10a  [SPARK-37702][SQL] Use AnalysisContext to track referred temp 
functions
b7ec10a is described below

commit b7ec10afec4bd8a53e0ec06f90be1216b44e9538
Author: Linhong Liu 
AuthorDate: Fri Nov 12 22:20:02 2021 +0800

[SPARK-37702][SQL] Use AnalysisContext to track referred temp functions

### What changes were proposed in this pull request?
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp 
function resolution issue when a function is registered with a `FunctionBuilder` and 
referred to by a temp view.

During temporary view creation, we need to collect all the temp functions and save them 
to the view metadata, so that the functions can be resolved correctly the next time the 
view SQL text is analyzed. But if a temp function is registered with a `FunctionBuilder`, 
it is not a `UserDefinedExpression`, so it cannot be collected as a temp function. As a 
result, the next time the analyzer resolves the temp view, the registered function 
cannot be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, 
b)")
sql("select * from tv").collect()
```

### Why are the changes needed?
bug-fix

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
newly added test cases.

Closes #34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 68a0ab5960e847e0fa1a59da0316d0c111574af4)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 21 --
 .../sql/catalyst/catalog/SessionCatalog.scala  |  5 ++
 .../spark/sql/catalyst/plans/logical/Command.scala |  5 +-
 .../sql/catalyst/plans/logical/v2Commands.scala|  8 +--
 .../apache/spark/sql/execution/command/views.scala | 82 --
 .../spark/sql/execution/datasources/ddl.scala  |  6 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 10 +++
 .../spark/sql/execution/SQLViewTestSuite.scala | 26 ++-
 .../execution/command/PlanResolutionSuite.scala|  6 +-
 9 files changed, 111 insertions(+), 58 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index fa6b247..89c7b5f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -125,7 +125,11 @@ case class AnalysisContext(
 maxNestedViewDepth: Int = -1,
 relationCache: mutable.Map[Seq[String], LogicalPlan] = mutable.Map.empty,
 referredTempViewNames: Seq[Seq[String]] = Seq.empty,
-referredTempFunctionNames: Seq[String] = Seq.empty,
+// 1. If we are resolving a view, this field will be restored from the 
view metadata,
+//by calling `AnalysisContext.withAnalysisContext(viewDesc)`.
+// 2. If we are not resolving a view, this field will be updated everytime 
the analyzer
+//lookup a temporary function. And export to the view metadata.
+referredTempFunctionNames: mutable.Set[String] = mutable.Set.empty,
 outerPlan: Option[LogicalPlan] = None)
 
 object AnalysisContext {
@@ -152,11 +156,17 @@ object AnalysisContext {
   maxNestedViewDepth,
   originContext.relationCache,
   viewDesc.viewReferredTempViewNames,
-  viewDesc.viewReferredTempFunctionNames)
+  mutable.Set(viewDesc.viewReferredTempFunctionNames: _*))
 set(context)
 try f finally { set(originContext) }
   }
 
+  def withNewAnalysisContext[A](f: => A): A = {
+val originContext = value.get()
+reset()
+try f finally { set(originContext) }
+  }
+
   def withOuterPlan[A](outerPlan: LogicalPlan)(f: => A): A = {
 val originContext = value.get()
 val context = originContext.copy(outerPlan = Some(outerPlan))
@@ -204,11 +214,8 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   override def execute(plan: LogicalPlan): LogicalPlan = {
-AnalysisContext.reset()
-try {
+AnalysisContext.withNewAnalysisContext {
   executeSameContext(plan)
-} finally {
-  AnalysisContext.reset()
 }
   }
 
@@ -3741,7 +3748,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   _.containsPattern(COMMAND)) {
   case c: AnalysisOnlyCommand if c.resolved =>
 checkAnalysis(c)
-c.markAsAnalyzed()
+c.markAsAnalyzed(AnalysisContext.get)
 }
   }
 }
diff --git 

[spark] branch master updated: [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE COLUMNS`

2021-11-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 71f4ee3  [SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. 
REPLACE COLUMNS`
71f4ee3 is described below

commit 71f4ee38c71734128c5653b8f18a7d0bf1014b6b
Author: Max Gekk 
AuthorDate: Fri Nov 12 17:23:36 2021 +0300

[SPARK-37304][SQL] Allow ANSI intervals in v2 `ALTER TABLE .. REPLACE 
COLUMNS`

### What changes were proposed in this pull request?
In this PR, I propose to allow ANSI intervals (year-month and day-time intervals) in the 
`ALTER TABLE .. REPLACE COLUMNS` command for tables in v2 catalogs (v1 catalogs don't 
support the command). Also added a unified test suite so that related tests can be 
migrated to it in the future.

### Why are the changes needed?
To improve user experience with Spark SQL. After the changes, users can 
replace columns with ANSI intervals instead of removing and adding such columns.
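
For example, a sketch of the now-allowed usage (the catalog, namespace, and table names below are made up, and assume a configured v2 catalog; the exact DDL in the test suite may differ):

```python
# Assumes an active SparkSession `spark` with a v2 catalog named `testcat`.
spark.sql("CREATE TABLE testcat.ns.tbl (ym INTERVAL MONTH, data STRING)")

# Replacing columns with ANSI interval types no longer fails after this change.
spark.sql("""
    ALTER TABLE testcat.ns.tbl REPLACE COLUMNS (
        new_ym INTERVAL YEAR TO MONTH,
        new_dt INTERVAL DAY TO SECOND,
        data STRING)
""")
```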

### Does this PR introduce _any_ user-facing change?
In some sense, yes. After the changes, the command no longer reports an error for 
ANSI interval columns.

### How was this patch tested?
By running new test suite:
```
$ build/sbt "test:testOnly *AlterTableReplaceColumnsSuite"
```

Closes #34571 from MaxGekk/add-replace-ansi-interval-col.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../plans/logical/v2AlterTableCommands.scala   |  2 +-
 .../AlterTableReplaceColumnsSuiteBase.scala| 54 ++
 .../command/v2/AlterTableReplaceColumnsSuite.scala | 28 +++
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
index 302a810..2eb828e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
@@ -134,7 +134,7 @@ case class ReplaceColumns(
 table: LogicalPlan,
 columnsToAdd: Seq[QualifiedColType]) extends AlterTableCommand {
   columnsToAdd.foreach { c =>
-TypeUtils.failWithIntervalType(c.dataType)
+TypeUtils.failWithIntervalType(c.dataType, forbidAnsiIntervals = false)
   }
 
   override lazy val resolved: Boolean = table.resolved && 
columnsToAdd.forall(_.resolved)
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala
new file mode 100644
index 000..fed4076
--- /dev/null
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableReplaceColumnsSuiteBase.scala
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import java.time.{Duration, Period}
+
+import org.apache.spark.sql.{QueryTest, Row}
+
+/**
+ * This base suite contains unified tests for the `ALTER TABLE .. REPLACE 
COLUMNS` command that
+ * check the V2 table catalog. The tests that cannot run for all supported 
catalogs are
+ * located in more specific test suites:
+ *
+ *   - V2 table catalog tests:
+ * 
`org.apache.spark.sql.execution.command.v2.AlterTableReplaceColumnsSuite`
+ */
+trait AlterTableReplaceColumnsSuiteBase extends QueryTest with 
DDLCommandTestUtils {
+  override val command = "ALTER TABLE .. REPLACE COLUMNS"
+
+  test("SPARK-37304: Replace columns by ANSI intervals") {
+withNamespaceAndTable("ns", "tbl") { t =>
+  sql(s"CREATE TABLE $t (ym INTERVAL MONTH, dt INTERVAL HOUR, data STRING) 
$defaultUsing")
+  // TODO(SPARK-37303): Uncomment the command below after REPLACE COLUMNS 
is fixed
+  // sql(s"INSERT INTO $t SELECT INTERVAL '1' MONTH, INTERVAL '2' HOUR, 
'abc'")
+  sql(
+s"""
+   |ALTER TABLE $t REPLACE COLUMNS (
+   | new_ym INTERVAL 

[spark] branch master updated: [SPARK-37702][SQL] Use AnalysisContext to track referred temp functions

2021-11-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 68a0ab5  [SPARK-37702][SQL] Use AnalysisContext to track referred temp 
functions
68a0ab5 is described below

commit 68a0ab5960e847e0fa1a59da0316d0c111574af4
Author: Linhong Liu 
AuthorDate: Fri Nov 12 22:20:02 2021 +0800

[SPARK-37702][SQL] Use AnalysisContext to track referred temp functions

### What changes were proposed in this pull request?
This PR uses `AnalysisContext` to track the referred temp functions in order to fix a temp 
function resolution issue when a function is registered with a `FunctionBuilder` and 
referred to by a temp view.

During temporary view creation, we need to collect all the temp functions and save them 
to the view metadata, so that the functions can be resolved correctly the next time the 
view SQL text is analyzed. But if a temp function is registered with a `FunctionBuilder`, 
it is not a `UserDefinedExpression`, so it cannot be collected as a temp function. As a 
result, the next time the analyzer resolves the temp view, the registered function 
cannot be found.

Example:
```scala
val func = CatalogFunction(FunctionIdentifier("tempFunc", None), ...)
val builder = (e: Seq[Expression]) => e.head
spark.sessionState.catalog.registerFunction(func, Some(builder))
sql("create temp view tv as select tempFunc(a, b) from values (1, 2) t(a, 
b)")
sql("select * from tv").collect()
```

### Why are the changes needed?
bug-fix

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
newly added test cases.

Closes #34546 from linhongliu-db/SPARK-37702-ver3.

Authored-by: Linhong Liu 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 21 --
 .../sql/catalyst/catalog/SessionCatalog.scala  |  5 ++
 .../spark/sql/catalyst/plans/logical/Command.scala |  5 +-
 .../sql/catalyst/plans/logical/v2Commands.scala|  8 +--
 .../apache/spark/sql/execution/command/views.scala | 82 --
 .../spark/sql/execution/datasources/ddl.scala  |  6 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 10 +++
 .../spark/sql/execution/SQLViewTestSuite.scala | 26 ++-
 .../execution/command/PlanResolutionSuite.scala|  6 +-
 9 files changed, 111 insertions(+), 58 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 068886e..26d206f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -125,7 +125,11 @@ case class AnalysisContext(
 maxNestedViewDepth: Int = -1,
 relationCache: mutable.Map[Seq[String], LogicalPlan] = mutable.Map.empty,
 referredTempViewNames: Seq[Seq[String]] = Seq.empty,
-referredTempFunctionNames: Seq[String] = Seq.empty,
+// 1. If we are resolving a view, this field will be restored from the 
view metadata,
+//by calling `AnalysisContext.withAnalysisContext(viewDesc)`.
+// 2. If we are not resolving a view, this field will be updated everytime 
the analyzer
+//lookup a temporary function. And export to the view metadata.
+referredTempFunctionNames: mutable.Set[String] = mutable.Set.empty,
 outerPlan: Option[LogicalPlan] = None)
 
 object AnalysisContext {
@@ -152,11 +156,17 @@ object AnalysisContext {
   maxNestedViewDepth,
   originContext.relationCache,
   viewDesc.viewReferredTempViewNames,
-  viewDesc.viewReferredTempFunctionNames)
+  mutable.Set(viewDesc.viewReferredTempFunctionNames: _*))
 set(context)
 try f finally { set(originContext) }
   }
 
+  def withNewAnalysisContext[A](f: => A): A = {
+val originContext = value.get()
+reset()
+try f finally { set(originContext) }
+  }
+
   def withOuterPlan[A](outerPlan: LogicalPlan)(f: => A): A = {
 val originContext = value.get()
 val context = originContext.copy(outerPlan = Some(outerPlan))
@@ -204,11 +214,8 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   override def execute(plan: LogicalPlan): LogicalPlan = {
-AnalysisContext.reset()
-try {
+AnalysisContext.withNewAnalysisContext {
   executeSameContext(plan)
-} finally {
-  AnalysisContext.reset()
 }
   }
 
@@ -3651,7 +3658,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   _.containsPattern(COMMAND)) {
   case c: AnalysisOnlyCommand if c.resolved =>
 checkAnalysis(c)
-c.markAsAnalyzed()
+c.markAsAnalyzed(AnalysisContext.get)
 }
   

[spark] branch master updated (9191632 -> a4b8a8d)

2021-11-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9191632  [SPARK-36825][FOLLOWUP] Move the test code from 
`ParquetIOSuite` to `ParquetFileFormatSuite`
 add a4b8a8d  [SPARK-37294][SQL][TESTS] Check inserting of ANSI intervals 
into a table partitioned by the interval columns

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 35 +-
 .../org/apache/spark/sql/sources/InsertSuite.scala | 34 +
 2 files changed, 68 insertions(+), 1 deletion(-)
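
For reference, a sketch of the scenario the new tests cover (catalog and table names are made up, and assume a configured v2 catalog; the exact DDL in the test suites may differ):

```python
# Assumes an active SparkSession `spark` with a v2 catalog named `testcat`.
spark.sql("""
    CREATE TABLE testcat.ns.tbl (id INT, ym INTERVAL YEAR TO MONTH)
    PARTITIONED BY (ym)
""")

# Insert a row whose partition value is an ANSI year-month interval.
spark.sql("INSERT INTO testcat.ns.tbl SELECT 1, INTERVAL '1-2' YEAR TO MONTH")
```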
