[spark] branch branch-3.1 updated: [SPARK-37049][K8S] executorIdleTimeout should check `creationTimestamp` instead of `startTime`

2021-10-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 74fe4fa  [SPARK-37049][K8S] executorIdleTimeout should check 
`creationTimestamp` instead of `startTime`
74fe4fa is described below

commit 74fe4fadfa76b15560a78ce53f53319f015819e7
Author: Weiwei Yang 
AuthorDate: Tue Oct 19 22:42:06 2021 -0700

[SPARK-37049][K8S] executorIdleTimeout should check `creationTimestamp` 
instead of `startTime`

SPARK-33099 added support for respecting 
`spark.dynamicAllocation.executorIdleTimeout` in `ExecutorPodsAllocator`. 
However, when it checks whether a pending executor pod has timed out, it checks 
against the pod's 
[startTime](https://github.com/kubernetes/api/blob/2a5dae08c42b1e8fdc1379432d8898efece65363/core/v1/types.go#L3664-L3667),
 see code 
[here](https://github.com/apache/spark/blob/c2ba498ff678ddda034cedf45cc17fbeefe922fd/resource-managers/kubernetes/core/src/main/scala/org/apache/spark
 [...]

This can be reproduced locally by running the following job:

```
${SPARK_HOME}/bin/spark-submit \
  --master k8s://http://localhost:8001 --deploy-mode cluster \
  --name spark-group-example \
  --class org.apache.spark.examples.GroupByTest \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.namespace=spark-test \
  --conf spark.kubernetes.executor.request.cores=1 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.kubernetes.container.image=local/spark:3.3.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar \
  1000 1000 100 1000
```

The local cluster doesn't have enough resources to run more than 4 
executors, so the rest of the executor pods stay pending. The job builds up a 
task backlog and keeps requesting more executors from K8s:

```
21/10/19 22:51:45 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
21/10/19 22:51:51 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes for ResourceProfile Id: 0, target: 2 running: 1.
21/10/19 22:51:52 INFO ExecutorPodsAllocator: Going to request 2 executors 
from Kubernetes for ResourceProfile Id: 0, target: 4 running: 2.
21/10/19 22:51:53 INFO ExecutorPodsAllocator: Going to request 4 executors 
from Kubernetes for ResourceProfile Id: 0, target: 8 running: 4.
...
21/10/19 22:52:14 INFO ExecutorPodsAllocator: Deleting 39 excess pod 
requests 
(23,59,32,41,50,68,35,44,17,8,53,62,26,71,11,56,29,38,47,20,65,5,14,46,64,73,55,49,40,67,58,13,22,31,7,16,52,70,43).
21/10/19 22:52:18 INFO ExecutorPodsAllocator: Deleting 28 excess pod 
requests 
(25,34,61,37,10,19,28,60,69,63,45,54,72,36,18,9,27,21,57,12,48,30,39,66,15,42,24,33).
```

At `22:51:45` it starts to request executors, and at `22:52:14` it starts 
to delete excess executor pods. That is only 29s, even though 
`spark.dynamicAllocation.executorIdleTimeout` is set to 60s, so the config was 
not honored.

### What changes were proposed in this pull request?
Change the check from using the pod's `startTime` to its `creationTimestamp`. 
[creationTimestamp](https://github.com/kubernetes/apimachinery/blob/e6c90c4366be1504309a6aafe0d816856450f36a/pkg/apis/meta/v1/types.go#L193-L201)
 is the timestamp when a pod gets created on K8s:

```
// CreationTimestamp is a timestamp representing the server time when this object was
// created. It is not guaranteed to be set in happens-before order across separate operations.
// Clients may not set this value. It is represented in RFC3339 form and is in UTC.
```


[startTime](https://github.com/kubernetes/api/blob/2a5dae08c42b1e8fdc1379432d8898efece65363/core/v1/types.go#L3664-L3667)
 is the timestamp when the pod gets started:

```
// RFC 3339 date and time at which the object was acknowledged by the Kubelet.
// This is before the Kubelet pulled the container image(s) for the pod.
// +optional
```

A pending pod's startTime is empty. Here is an example of a pending pod:

```
NAMESPACE   NAME                  READY   STATUS    RESTARTS   AGE
default     pending-pod-example   0/1     Pending   0          2s

kubectl get pod pending-pod-example -o yaml | grep creationTimestamp
--->  creationTimestamp: "2021-10-19T16:17:52Z"

// pending pod has no startTime
kubectl get pod pending-pod-example -o yaml | grep 
```
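For illustration, here is a minimal Scala sketch of the new check, assuming the fabric8 `Pod` model used by Spark's K8s backend. `isPendingTimedOut` is a hypothetical helper for this sketch, not the actual `ExecutorPodsAllocator` code, which is more involved:

```
import java.time.Instant
import io.fabric8.kubernetes.api.model.Pod

object PendingPodTimeoutSketch {
  // creationTimestamp is set as soon as the API server accepts the pod, so even a
  // pending pod can be aged against it; startTime stays null until the kubelet
  // acknowledges the pod, which never happens while the pod is still Pending.
  def isPendingTimedOut(pod: Pod, nowMillis: Long, timeoutMillis: Long): Boolean = {
    val createdMillis = Instant.parse(pod.getMetadata.getCreationTimestamp).toEpochMilli
    nowMillis - createdMillis > timeoutMillis
  }
}
```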

[spark] branch branch-3.2 updated: [SPARK-37049][K8S] executorIdleTimeout should check `creationTimestamp` instead of `startTime`

2021-10-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 10a958b3 [SPARK-37049][K8S] executorIdleTimeout should check 
`creationTimestamp` instead of `startTime`
10a958b3 is described below

commit 10a958b3480dd536c869247fe3e83e823a51
Author: Weiwei Yang 
AuthorDate: Tue Oct 19 22:42:06 2021 -0700

[SPARK-37049][K8S] executorIdleTimeout should check `creationTimestamp` 
instead of `startTime`

SPARK-33099 added support for respecting 
`spark.dynamicAllocation.executorIdleTimeout` in `ExecutorPodsAllocator`. 
However, when it checks whether a pending executor pod has timed out, it checks 
against the pod's 
[startTime](https://github.com/kubernetes/api/blob/2a5dae08c42b1e8fdc1379432d8898efece65363/core/v1/types.go#L3664-L3667),
 see code 
[here](https://github.com/apache/spark/blob/c2ba498ff678ddda034cedf45cc17fbeefe922fd/resource-managers/kubernetes/core/src/main/scala/org/apache/spark
 [...]

This can be reproduced locally by running the following job:

```
${SPARK_HOME}/bin/spark-submit \
  --master k8s://http://localhost:8001 --deploy-mode cluster \
  --name spark-group-example \
  --class org.apache.spark.examples.GroupByTest \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.namespace=spark-test \
  --conf spark.kubernetes.executor.request.cores=1 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.kubernetes.container.image=local/spark:3.3.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar \
  1000 1000 100 1000
```

The local cluster doesn't have enough resources to run more than 4 
executors, so the rest of the executor pods stay pending. The job builds up a 
task backlog and keeps requesting more executors from K8s:

```
21/10/19 22:51:45 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
21/10/19 22:51:51 INFO ExecutorPodsAllocator: Going to request 1 executors 
from Kubernetes for ResourceProfile Id: 0, target: 2 running: 1.
21/10/19 22:51:52 INFO ExecutorPodsAllocator: Going to request 2 executors 
from Kubernetes for ResourceProfile Id: 0, target: 4 running: 2.
21/10/19 22:51:53 INFO ExecutorPodsAllocator: Going to request 4 executors 
from Kubernetes for ResourceProfile Id: 0, target: 8 running: 4.
...
21/10/19 22:52:14 INFO ExecutorPodsAllocator: Deleting 39 excess pod 
requests 
(23,59,32,41,50,68,35,44,17,8,53,62,26,71,11,56,29,38,47,20,65,5,14,46,64,73,55,49,40,67,58,13,22,31,7,16,52,70,43).
21/10/19 22:52:18 INFO ExecutorPodsAllocator: Deleting 28 excess pod 
requests 
(25,34,61,37,10,19,28,60,69,63,45,54,72,36,18,9,27,21,57,12,48,30,39,66,15,42,24,33).
```

At `22:51:45` it starts to request executors, and at `22:52:14` it starts 
to delete excess executor pods. That is only 29s, even though 
`spark.dynamicAllocation.executorIdleTimeout` is set to 60s, so the config was 
not honored.

### What changes were proposed in this pull request?
Change the check from using the pod's `startTime` to its `creationTimestamp`. 
[creationTimestamp](https://github.com/kubernetes/apimachinery/blob/e6c90c4366be1504309a6aafe0d816856450f36a/pkg/apis/meta/v1/types.go#L193-L201)
 is the timestamp when a pod gets created on K8s:

```
// CreationTimestamp is a timestamp representing the server time when this object was
// created. It is not guaranteed to be set in happens-before order across separate operations.
// Clients may not set this value. It is represented in RFC3339 form and is in UTC.
```


[startTime](https://github.com/kubernetes/api/blob/2a5dae08c42b1e8fdc1379432d8898efece65363/core/v1/types.go#L3664-L3667)
 is the timestamp when the pod gets started:

```
// RFC 3339 date and time at which the object was acknowledged by the Kubelet.
// This is before the Kubelet pulled the container image(s) for the pod.
// +optional
```

A pending pod's startTime is empty. Here is an example of a pending pod:

```
NAMESPACE   NAME                  READY   STATUS    RESTARTS   AGE
default     pending-pod-example   0/1     Pending   0          2s

kubectl get pod pending-pod-example -o yaml | grep creationTimestamp
--->  creationTimestamp: "2021-10-19T16:17:52Z"

// pending pod has no startTime
kubectl get pod pending-pod-example -o yaml | 
```

[spark] branch master updated (b07dd1a -> 041cd5d)

2021-10-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b07dd1a  [SPARK-36348][TEST] Complete test_astype for index
 add 041cd5d  [SPARK-37049][K8S] executorIdleTimeout should check 
`creationTimestamp` instead of `startTime`

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala   | 8 
 .../spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala  | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)




[GitHub] [spark-website] gengliangwang opened a new pull request #365: Update Spark version in "Link with Spark" section of download page

2021-10-19 Thread GitBox


gengliangwang opened a new pull request #365:
URL: https://github.com/apache/spark-website/pull/365


   
   
   Update the Spark version from 3.1.2 to 3.2.0 in the "Link with Spark" section of 
the download page
   Before:
   
![image](https://user-images.githubusercontent.com/1097932/138031706-cb0dc5a3-6978-48bf-8c7d-f71198652677.png)
   
   After:
   
![image](https://user-images.githubusercontent.com/1097932/138031744-32ac46c4-e171-4ad7-92cc-3c4fe77b7fd2.png)
   
   I also ran `grep 3.1.2 *.md` to check whether there is any other place we need to update.





[spark] branch master updated: [SPARK-36348][TEST] Complete test_astype for index

2021-10-19 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b07dd1a  [SPARK-36348][TEST] Complete test_astype for index
b07dd1a is described below

commit b07dd1aacab9cf2df4ba3e88842a408e3c5c26a8
Author: Yikun Jiang 
AuthorDate: Wed Oct 20 12:14:06 2021 +0900

[SPARK-36348][TEST] Complete test_astype for index

### What changes were proposed in this pull request?
Before 3.2, there was a bug:
```
pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
psidx = ps.Index(pidx)
self.assert_eq(psidx.astype(str), pidx.astype(str))

[left pandas on spark]: Index(['10.0', '20.0', '15.0', '30.0', '45.0', 
'nan'], dtype='object', name='x')
[right pandas]: Index(['10', '20', '15', '30', '45', 'None'], 
dtype='object', name='x')
```
So we didn't add any test for this case in 
[test_base.py int_with_nan](https://github.com/apache/spark/blob/bcc595c112a23d8e3024ace50f0dbc7eab7144b2/python/pyspark/pandas/tests/indexes/test_base.py#L2249).

Now that the bug has been resolved, we complete the test case here.

### Why are the changes needed?
Regression test for SPARK-36348 and a more complete test case.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test only

Closes #34335 from Yikun/SPARK-36348.

Authored-by: Yikun Jiang 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/tests/indexes/test_base.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/tests/indexes/test_base.py 
b/python/pyspark/pandas/tests/indexes/test_base.py
index 4003998..a7f19a7 100644
--- a/python/pyspark/pandas/tests/indexes/test_base.py
+++ b/python/pyspark/pandas/tests/indexes/test_base.py
@@ -2243,12 +2243,14 @@ class IndexesTest(PandasOnSparkTestCase, TestUtils):
 
 pidx = pd.Index([10, 20, 15, 30, 45, None], name="x")
 psidx = ps.Index(pidx)
+self.assert_eq(psidx.astype(bool), pidx.astype(bool))
+self.assert_eq(psidx.astype(str), pidx.astype(str))
 
 pidx = pd.Index(["hi", "hi ", " ", " \t", "", None], name="x")
 psidx = ps.Index(pidx)
 
 self.assert_eq(psidx.astype(bool), pidx.astype(bool))
-self.assert_eq(psidx.astype(str).to_numpy(), ["hi", "hi ", " ", " \t", 
"", "None"])
+self.assert_eq(psidx.astype(str), pidx.astype(str))
 
 pidx = pd.Index([True, False, None], name="x")
 psidx = ps.Index(pidx)




[spark] branch master updated (b1aaefb -> 48b3510)

2021-10-19 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b1aaefb  [SPARK-37002][PYTHON] Introduce the 'compute.eager_check' 
option
 add 48b3510  [SPARK-37044][PYTHON] Add Row to __all__ in pyspark.sql.types

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/types.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)




[spark] branch master updated (81aa514 -> b1aaefb)

2021-10-19 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 81aa514  [SPARK-37059][PYTHON][TESTS] Ensure the sort order of the 
output in the PySpark doctests
 add b1aaefb  [SPARK-37002][PYTHON] Introduce the 'compute.eager_check' 
option

No new revisions were added by this update.

Summary of changes:
 python/docs/source/user_guide/pandas_on_spark/options.rst |  7 +++
 python/pyspark/pandas/config.py   | 11 +++
 2 files changed, 18 insertions(+)




[spark] branch master updated (40f1494 -> 81aa514)

2021-10-19 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 40f1494  [SPARK-37041][SQL] Backport HIVE-15025: Secure-Socket-Layer 
(SSL) support for HMS
 add 81aa514  [SPARK-37059][PYTHON][TESTS] Ensure the sort order of the 
output in the PySpark doctests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/fpm.py| 20 ++--
 python/pyspark/sql/functions.py |  4 ++--
 python/run-tests.py |  2 +-
 3 files changed, 13 insertions(+), 13 deletions(-)




[spark] branch master updated: [SPARK-37041][SQL] Backport HIVE-15025: Secure-Socket-Layer (SSL) support for HMS

2021-10-19 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 40f1494  [SPARK-37041][SQL] Backport HIVE-15025: Secure-Socket-Layer 
(SSL) support for HMS
40f1494 is described below

commit 40f14942a97d4572178974bcbeea207abb518571
Author: Yuming Wang 
AuthorDate: Wed Oct 20 08:28:27 2021 +0800

[SPARK-37041][SQL] Backport HIVE-15025: Secure-Socket-Layer (SSL) support 
for HMS

### What changes were proposed in this pull request?

This PR backports HIVE-15025: Secure-Socket-Layer (SSL) support for HMS.

### Why are the changes needed?

To make it easier to upgrade Thrift:
```
[error] 
/home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java:254:1:
  error: incompatible types: String cannot be converted to TConfiguration
[error] return new TSocket(host, port, loginTimeout);
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test.

Closes #34312 from wangyum/SPARK-37041.

Authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
---
 .../apache/hive/service/auth/HiveAuthFactory.java  | 77 --
 1 file changed, 77 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
index fbb5230..8d77b23 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
@@ -19,17 +19,10 @@ package org.apache.hive.service.auth;
 import java.io.IOException;
 import java.lang.reflect.Field;
 import java.lang.reflect.Method;
-import java.net.InetSocketAddress;
-import java.net.UnknownHostException;
-import java.util.ArrayList;
-import java.util.Arrays;
 import java.util.HashMap;
-import java.util.List;
-import java.util.Locale;
 import java.util.Map;
 import java.util.Objects;
 
-import javax.net.ssl.SSLServerSocket;
 import javax.security.auth.login.LoginException;
 import javax.security.sasl.Sasl;
 
@@ -50,10 +43,6 @@ import org.apache.hadoop.security.authorize.ProxyUsers;
 import org.apache.hive.service.cli.HiveSQLException;
 import org.apache.hive.service.cli.thrift.ThriftCLIService;
 import org.apache.thrift.TProcessorFactory;
-import org.apache.thrift.transport.TSSLTransportFactory;
-import org.apache.thrift.transport.TServerSocket;
-import org.apache.thrift.transport.TSocket;
-import org.apache.thrift.transport.TTransport;
 import org.apache.thrift.transport.TTransportException;
 import org.apache.thrift.transport.TTransportFactory;
 import org.slf4j.Logger;
@@ -250,72 +239,6 @@ public class HiveAuthFactory {
 }
   }
 
-  public static TTransport getSocketTransport(String host, int port, int 
loginTimeout) {
-return new TSocket(host, port, loginTimeout);
-  }
-
-  public static TTransport getSSLSocket(String host, int port, int 
loginTimeout)
-throws TTransportException {
-return TSSLTransportFactory.getClientSocket(host, port, loginTimeout);
-  }
-
-  public static TTransport getSSLSocket(String host, int port, int 
loginTimeout,
-String trustStorePath, String trustStorePassWord) throws 
TTransportException {
-TSSLTransportFactory.TSSLTransportParameters params =
-  new TSSLTransportFactory.TSSLTransportParameters();
-params.setTrustStore(trustStorePath, trustStorePassWord);
-params.requireClientAuth(true);
-return TSSLTransportFactory.getClientSocket(host, port, loginTimeout, 
params);
-  }
-
-  public static TServerSocket getServerSocket(String hiveHost, int portNum)
-throws TTransportException {
-InetSocketAddress serverAddress;
-if (hiveHost == null || hiveHost.isEmpty()) {
-  // Wildcard bind
-  serverAddress = new InetSocketAddress(portNum);
-} else {
-  serverAddress = new InetSocketAddress(hiveHost, portNum);
-}
-return new TServerSocket(serverAddress);
-  }
-
-  public static TServerSocket getServerSSLSocket(String hiveHost, int portNum, 
String keyStorePath,
-  String keyStorePassWord, List sslVersionBlacklist) throws 
TTransportException,
-  UnknownHostException {
-TSSLTransportFactory.TSSLTransportParameters params =
-new TSSLTransportFactory.TSSLTransportParameters();
-params.setKeyStore(keyStorePath, keyStorePassWord);
-InetSocketAddress serverAddress;
-if (hiveHost == null || hiveHost.isEmpty()) {
-  // Wildcard bind
-  serverAddress = new InetSocketAddress(portNum);
-} else {
-  serverAddress = new InetSocketAddress(hiveHost, portNum);
-}
-TServerSocket thriftServerSocket =
-   

[GitHub] [spark-website] srowen commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


srowen commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946915197


   Let's not make the release differ from the profiles, I think. We can rename 
the profile and leave hadoop-3.2 as a no-op profile. By 3.3.0, who knows, maybe 
we're on 3.4 or something





[GitHub] [spark-website] sunchao edited a comment on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


sunchao edited a comment on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946914494


   I'll pick up https://github.com/apache/spark/pull/30891 again soon and make 
sure we'll get this done before Spark 3.3.0
   
   > Should we keep it as it is? Or figure out a way to fix it? 
   
   I think we can fix this in the same PR above, perhaps by just changing 
`make-distribution.sh`.





[GitHub] [spark-website] sunchao commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


sunchao commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946914494


   I'll pick up #30891 again soon and make sure we'll get this done before 
Spark 3.3.0
   
   > Should we keep it as it is? Or figure out a way to fix it? 
   
   I think we can fix this in the same PR above, perhaps by just changing 
`make-distribution.sh`.





[GitHub] [spark-website] srowen opened a new pull request #364: Add Scala 2.13 build download link

2021-10-19 Thread GitBox


srowen opened a new pull request #364:
URL: https://github.com/apache/spark-website/pull/364


   We have a Scala 2.13 build for Hadoop 3.3 - add it as an option to download





[GitHub] [spark-website] limansky commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


limansky commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946857781


   I think it would be nice to have both the Hadoop 3 build and the build without 
Hadoop for 2.13.





[GitHub] [spark-website] srowen commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


srowen commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946847970


   Change at Spark 3.3 IMHO. It's just a naming thing. hadoop-3.2 really means 
"3.2 or later" so it's not crazy





[GitHub] [spark-website] srowen commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


srowen commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946847108


   Fair point - I think the problem is the explosion of combinations of 
artifacts if there are sets for each scala version, but we did publish a binary 
release for 2.13 and should be in the UI. Unless someone's on that already I 
can hack in an option maybe. Probably anyone on Scala 2.13 is generally on 
newer versions of things, so not as much point in building for old Hadoop 2 and 
2.13. (People can create whatever build they like from the source release 
though)





[GitHub] [spark-website] gengliangwang commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


gengliangwang commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946847052


   There are some discussions about this on dev list: 
https://www.mail-archive.com/dev@spark.apache.org/msg28025.html
   The conclusion was to make it at Spark 3.3.0.  
   We could try renaming the tarball when finalizing the release, but I was using 
the release script...
   Should we keep it as it is? Or figure out a way to fix it? cc @dongjoon-hyun 
@sunchao





[GitHub] [spark-website] limansky commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


limansky commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946842912


   BTW, why is it the only build available for Scala 2.13?





[GitHub] [spark-website] yaooqinn commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


yaooqinn commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946828748


   > We should really rename the maven profile name...
   
   





[GitHub] [spark-website] limansky commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


limansky commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946826339


   Hi, I've just found that there is no link for the Hadoop 3.3 + Scala 2.13 build 
on the download page.





[GitHub] [spark-website] cloud-fan commented on pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


cloud-fan commented on pull request #363:
URL: https://github.com/apache/spark-website/pull/363#issuecomment-946820514


   We should really rename the maven profile name...





[GitHub] [spark-website] gengliangwang merged pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


gengliangwang merged pull request #363:
URL: https://github.com/apache/spark-website/pull/363


   





[spark-website] branch asf-site updated: Fix Spark 3.2.0 download URL (#363)

2021-10-19 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 920bafa  Fix Spark 3.2.0 download URL (#363)
920bafa is described below

commit 920bafabcbe4f2494085e2e223e3444d2f34e5f8
Author: Gengliang Wang 
AuthorDate: Tue Oct 19 23:07:00 2021 +0800

Fix Spark 3.2.0 download URL (#363)
---
 js/downloads.js  | 2 +-
 site/js/downloads.js | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index 6ab9447..57003bb 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
 var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
-var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: 
"hadoop3.3"};
+var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
 // 3.0.0+
diff --git a/site/js/downloads.js b/site/js/downloads.js
index 6ab9447..57003bb 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
 var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
-var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: 
"hadoop3.3"};
+var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
 // 3.0.0+




[GitHub] [spark-website] gengliangwang opened a new pull request #363: Fix Spark 3.2.0 download URL

2021-10-19 Thread GitBox


gengliangwang opened a new pull request #363:
URL: https://github.com/apache/spark-website/pull/363


   
   
   I made a mistake in the download URL of Spark 3.2.0. This PR fixes it.





[GitHub] [spark-website] gengliangwang merged pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


gengliangwang merged pull request #361:
URL: https://github.com/apache/spark-website/pull/361


   





[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


gengliangwang commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946762007


   I am merging this one. Thanks for the great suggestions, everyone!





[spark] branch master updated (db89320 -> 3849340)

2021-10-19 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from db89320  [SPARK-37057][INFRA] Fix wrong DocSearch facet filter in 
release-tag.sh
 add 3849340  [SPARK-36796][BUILD][CORE][SQL] Pass all `sql/core` and 
dependent modules UTs with JDK 17 except one case in `postgreSQL/text.sql`

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala | 19 +
 .../apache/spark/launcher/JavaModuleOptions.java   | 47 ++
 .../spark/launcher/SparkSubmitCommandBuilder.java  |  2 +
 pom.xml| 20 -
 project/SparkBuild.scala   | 14 ++-
 sql/catalyst/pom.xml   |  2 +-
 sql/core/pom.xml   |  2 +-
 7 files changed, 101 insertions(+), 5 deletions(-)
 create mode 100644 
launcher/src/main/java/org/apache/spark/launcher/JavaModuleOptions.java




[spark] branch branch-3.2 updated: [SPARK-37057][INFRA] Fix wrong DocSearch facet filter in release-tag.sh

2021-10-19 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 9c029fd  [SPARK-37057][INFRA] Fix wrong DocSearch facet filter in 
release-tag.sh
9c029fd is described below

commit 9c029fd5a4cd3dcf7159a2a4272ad6ed629a8596
Author: Gengliang Wang 
AuthorDate: Tue Oct 19 16:52:26 2021 +0800

[SPARK-37057][INFRA] Fix wrong DocSearch facet filter in release-tag.sh

### What changes were proposed in this pull request?

In release-tag.sh, the DocSearch facet filter should be updated to the 
release version before creating the git tag.
Otherwise, the facet filter is wrong in the docs of the new release: 
https://github.com/apache/spark/blame/v3.2.0/docs/_config.yml#L42

### Why are the changes needed?

Fix a bug in release-tag.sh

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual test

Closes #34328 from gengliangwang/fixFacetFilters.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit db893207ba444a303b1915afeb90b82ef3808cf8)
Signed-off-by: Gengliang Wang 
---
 dev/create-release/release-tag.sh | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/dev/create-release/release-tag.sh 
b/dev/create-release/release-tag.sh
index d7e9bf2..55aa2e5 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -84,7 +84,8 @@ fi
 # Set the release version in docs
 sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' 
docs/_config.yml
 sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$RELEASE_VERSION"'/g' docs/_config.yml
-sed -i".tmp3" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' 
python/pyspark/version.py
+sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$RELEASE_VERSION\"]/g" docs/_config.yml
+sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' 
python/pyspark/version.py
 
 git commit -a -m "Preparing Spark release $RELEASE_TAG"
 echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH"
@@ -94,18 +95,18 @@ git tag $RELEASE_TAG
 $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence 
logs
 # Remove -SNAPSHOT before setting the R version as R expects version strings 
to only have numbers
 R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'`
-sed -i".tmp4" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION
+sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION
 # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of 
SNAPSHOT to be closer
 # to PEP440.
-sed -i".tmp5" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' 
python/pyspark/version.py
+sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' 
python/pyspark/version.py
 
 
 # Update docs with next version
-sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
+sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
 # Use R version for short version
-sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
+sed -i".tmp8" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
 # Update the version index of DocSearch as the short version
-sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
+sed -i".tmp9" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
 
 git commit -a -m "Preparing development version $NEXT_VERSION"
 




[spark] branch master updated: [SPARK-37057][INFRA] Fix wrong DocSearch facet filter in release-tag.sh

2021-10-19 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new db89320  [SPARK-37057][INFRA] Fix wrong DocSearch facet filter in 
release-tag.sh
db89320 is described below

commit db893207ba444a303b1915afeb90b82ef3808cf8
Author: Gengliang Wang 
AuthorDate: Tue Oct 19 16:52:26 2021 +0800

[SPARK-37057][INFRA] Fix wrong DocSearch facet filter in release-tag.sh

### What changes were proposed in this pull request?

In release-tag.sh, the DocSearch facet filter should be updated to the 
release version before creating the git tag.
Otherwise, the facet filter is wrong in the docs of the new release: 
https://github.com/apache/spark/blame/v3.2.0/docs/_config.yml#L42

### Why are the changes needed?

Fix a bug in release-tag.sh

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual test

Closes #34328 from gengliangwang/fixFacetFilters.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 dev/create-release/release-tag.sh | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/dev/create-release/release-tag.sh 
b/dev/create-release/release-tag.sh
index d7e9bf2..55aa2e5 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -84,7 +84,8 @@ fi
 # Set the release version in docs
 sed -i".tmp1" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$RELEASE_VERSION"'/g' 
docs/_config.yml
 sed -i".tmp2" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$RELEASE_VERSION"'/g' docs/_config.yml
-sed -i".tmp3" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' 
python/pyspark/version.py
+sed -i".tmp3" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$RELEASE_VERSION\"]/g" docs/_config.yml
+sed -i".tmp4" 's/__version__ = .*$/__version__ = "'"$RELEASE_VERSION"'"/' 
python/pyspark/version.py
 
 git commit -a -m "Preparing Spark release $RELEASE_TAG"
 echo "Creating tag $RELEASE_TAG at the head of $GIT_BRANCH"
@@ -94,18 +95,18 @@ git tag $RELEASE_TAG
 $MVN versions:set -DnewVersion=$NEXT_VERSION | grep -v "no value" # silence 
logs
 # Remove -SNAPSHOT before setting the R version as R expects version strings 
to only have numbers
 R_NEXT_VERSION=`echo $NEXT_VERSION | sed 's/-SNAPSHOT//g'`
-sed -i".tmp4" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION
+sed -i".tmp5" 's/Version.*$/Version: '"$R_NEXT_VERSION"'/g' R/pkg/DESCRIPTION
 # Write out the R_NEXT_VERSION to PySpark version info we use dev0 instead of 
SNAPSHOT to be closer
 # to PEP440.
-sed -i".tmp5" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' 
python/pyspark/version.py
+sed -i".tmp6" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' 
python/pyspark/version.py
 
 
 # Update docs with next version
-sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
+sed -i".tmp7" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
 # Use R version for short version
-sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
+sed -i".tmp8" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
 # Update the version index of DocSearch as the short version
-sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
+sed -i".tmp9" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
 
 git commit -a -m "Preparing development version $NEXT_VERSION"
 




[spark] branch master updated: [SPARK-37017][SQL] Reduce the scope of synchronized to prevent potential deadlock

2021-10-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 875963a  [SPARK-37017][SQL] Reduce the scope of synchronized to 
prevent potential deadlock
875963a is described below

commit 875963a28a75532871010fcdcdb916bf093dab34
Author: chenzhx 
AuthorDate: Tue Oct 19 14:48:32 2021 +0800

[SPARK-37017][SQL] Reduce the scope of synchronized to prevent potential 
deadlock

### What changes were proposed in this pull request?

There is a `synchronized` block in the `CatalogManager.currentNamespace` function.
This PR pulls `SessionCatalog.getCurrentDatabase` out of this `synchronized` block 
to prevent a potential deadlock.

### Why are the changes needed?

In our case, we have implemented an external catalog, and there is a thread 
that directly calls `SessionCatalog.getTempViewOrPermanentTableMetadata` and 
holds the lock on `SessionCatalog`. It eventually goes into our external catalog 
where, unfortunately, we then call some functions of `SparkSession`, e.g. `sql`. 
When that calls `CatalogManager.currentNamespace`, it tries to take the lock on 
`CatalogManager`.

In the meantime, query threads that execute SQL via the DataFrame interface take 
the same two locks in the opposite order: `currentNamespace` holds the 
`CatalogManager` lock while it calls `getCurrentDatabase`, which needs the 
`SessionCatalog` lock.

This is how the deadlock occurs; see the sketch below.
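The lock-order inversion can be illustrated with a minimal, self-contained Scala sketch. The two plain objects below merely stand in for the monitors of `SessionCatalog` and `CatalogManager`; this is not the actual Spark code:

```
object DeadlockSketch {
  private val sessionCatalogLock = new Object  // stands in for SessionCatalog's monitor
  private val catalogManagerLock = new Object  // stands in for CatalogManager's monitor

  def main(args: Array[String]): Unit = {
    // Thread A: external-catalog path. It holds the SessionCatalog monitor and
    // then needs the CatalogManager monitor (e.g. via currentNamespace).
    val a = new Thread(() => sessionCatalogLock.synchronized {
      Thread.sleep(100)
      catalogManagerLock.synchronized { println("A done") }
    })
    // Thread B: query thread. Before the fix, currentNamespace held the
    // CatalogManager monitor while calling getCurrentDatabase, which needs the
    // SessionCatalog monitor, i.e. the opposite acquisition order.
    val b = new Thread(() => catalogManagerLock.synchronized {
      Thread.sleep(100)
      sessionCatalogLock.synchronized { println("B done") }
    })
    a.start(); b.start()
    a.join(); b.join()  // with this timing the two threads deadlock and never finish
  }
}
```

After the fix, the query thread computes the default namespace (which needs the SessionCatalog lock) before entering the CatalogManager monitor, so it never holds both locks at once and the cycle cannot form.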

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Uses existing tests.

Closes #34292 from chenzhx/bug-fix.

Authored-by: chenzhx 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/connector/catalog/CatalogManager.scala | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala
index 7d8bc4f..0380621 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala
@@ -89,12 +89,16 @@ class CatalogManager(
 
   private var _currentNamespace: Option[Array[String]] = None
 
-  def currentNamespace: Array[String] = synchronized {
-_currentNamespace.getOrElse {
-  if (currentCatalog.name() == SESSION_CATALOG_NAME) {
-Array(v1SessionCatalog.getCurrentDatabase)
-  } else {
-currentCatalog.defaultNamespace()
+  def currentNamespace: Array[String] = {
+val defaultNamespace = if (currentCatalog.name() == SESSION_CATALOG_NAME) {
+  Array(v1SessionCatalog.getCurrentDatabase)
+} else {
+  currentCatalog.defaultNamespace()
+}
+
+this.synchronized {
+  _currentNamespace.getOrElse {
+defaultNamespace
   }
 }
   }




[spark] branch branch-3.2 updated: [SPARK-37052][CORE] Spark should only pass --verbose argument to main class when is sql shell

2021-10-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 34086b0  [SPARK-37052][CORE] Spark should only pass --verbose argument 
to main class when is sql shell
34086b0 is described below

commit 34086b0c4dff77deaa3e381e63bf4597f100166e
Author: Angerszh 
AuthorDate: Tue Oct 19 14:40:52 2021 +0800

[SPARK-37052][CORE] Spark should only pass --verbose argument to main class 
when is sql shell

### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/32163, Spark passes `--verbose` to the 
main class so that the spark-sql shell can use the verbose argument too.
But for other shell main classes, such as spark-shell, the interpreter doesn't 
support `--verbose`, so we should only pass `--verbose` for the SQL shell.
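For reference, a hypothetical, self-contained Scala sketch of the guard described above. The SQL CLI class name and the `isSqlShell`/`childArgsFor` helpers are assumptions made for illustration, not the actual `SparkSubmit` internals:

```
object VerboseForwardingSketch {
  // Assumed main class of the spark-sql shell.
  private val SqlCliMainClass = "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"

  private def isSqlShell(mainClass: String): Boolean = mainClass == SqlCliMainClass

  // Forward --verbose only when the child main class is the SQL CLI, since other
  // shells (e.g. spark-shell) do not accept that argument.
  def childArgsFor(verbose: Boolean, childMainClass: String): Seq[String] =
    if (verbose && isSqlShell(childMainClass)) Seq("--verbose") else Seq.empty

  def main(args: Array[String]): Unit = {
    println(childArgsFor(verbose = true, SqlCliMainClass))               // List(--verbose)
    println(childArgsFor(verbose = true, "org.apache.spark.repl.Main"))  // List()
  }
}
```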

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #34322 from AngersZh/SPARK-37052.

Authored-by: Angerszh 
Signed-off-by: Wenchen Fan 
(cherry picked from commit a6d3a2c84e5cdc642ed57602612f0303585c4b6e)
Signed-off-by: Wenchen Fan 
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 8124650..67a601b 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -853,7 +853,7 @@ private[spark] class SparkSubmit extends Logging {
 }
 sparkConf.set(SUBMIT_PYTHON_FILES, formattedPyFiles.split(",").toSeq)
 
-if (args.verbose) {
+if (args.verbose && isSqlShell(childMainClass)) {
   childArgs ++= Seq("--verbose")
 }
 (childArgs.toSeq, childClasspath.toSeq, sparkConf, childMainClass)




[spark] branch master updated (ebca5232 -> a6d3a2c)

2021-10-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ebca5232 [SPARK-36871][SQL][FOLLOWUP] Move error checking from create 
cmd to parser
 add a6d3a2c  [SPARK-37052][CORE] Spark should only pass --verbose argument 
to main class when is sql shell

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)




[spark] branch master updated: [SPARK-36871][SQL][FOLLOWUP] Move error checking from create cmd to parser

2021-10-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ebca5232 [SPARK-36871][SQL][FOLLOWUP] Move error checking from create 
cmd to parser
ebca5232 is described below

commit ebca5232811cb0701d4062ac7ddc21fccc936490
Author: Huaxin Gao 
AuthorDate: Tue Oct 19 14:38:49 2021 +0800

[SPARK-36871][SQL][FOLLOWUP] Move error checking from create cmd to parser

### What changes were proposed in this pull request?
Move error checking from create cmd to parser

### Why are the changes needed?
Catch the error earlier, and also make the code consistent between parsing 
CreateFunction and CreateView.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #34283 from huaxingao/create_view_followup.

Authored-by: Huaxin Gao 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/errors/QueryCompilationErrors.scala  | 26 ---
 .../spark/sql/errors/QueryParsingErrors.scala  | 34 +++
 .../spark/sql/execution/SparkSqlParser.scala   | 39 +++---
 .../spark/sql/execution/command/functions.scala|  9 -
 .../apache/spark/sql/execution/command/views.scala | 21 +---
 5 files changed, 69 insertions(+), 60 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 385e6b7..eb8985d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -1849,19 +1849,6 @@ object QueryCompilationErrors {
 new AnalysisException("Cannot overwrite a path that is also being read 
from.")
   }
 
-  def createFuncWithBothIfNotExistsAndReplaceError(): Throwable = {
-new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE 
is not allowed.")
-  }
-
-  def defineTempFuncWithIfNotExistsError(): Throwable = {
-new AnalysisException("It is not allowed to define a TEMPORARY function 
with IF NOT EXISTS.")
-  }
-
-  def specifyingDBInCreateTempFuncError(databaseName: String): Throwable = {
-new AnalysisException(
-  s"Specifying a database in CREATE TEMPORARY FUNCTION is not allowed: 
'$databaseName'")
-  }
-
   def specifyingDBInDropTempFuncError(databaseName: String): Throwable = {
 new AnalysisException(
   s"Specifying a database in DROP TEMPORARY FUNCTION is not allowed: 
'$databaseName'")
@@ -2011,19 +1998,6 @@ object QueryCompilationErrors {
 features.map(" - " + _).mkString("\n"))
   }
 
-  def createViewWithBothIfNotExistsAndReplaceError(): Throwable = {
-new AnalysisException("CREATE VIEW with both IF NOT EXISTS and REPLACE is 
not allowed.")
-  }
-
-  def defineTempViewWithIfNotExistsError(): Throwable = {
-new AnalysisException("It is not allowed to define a TEMPORARY view with 
IF NOT EXISTS.")
-  }
-
-  def notAllowedToAddDBPrefixForTempViewError(database: String): Throwable = {
-new AnalysisException(
-  s"It is not allowed to add database prefix `$database` for the TEMPORARY 
view name.")
-  }
-
   def logicalPlanForViewNotAnalyzedError(): Throwable = {
 new AnalysisException("The logical plan that represents the view is not 
analyzed.")
   }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index 3af63f1..090f73d 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -391,4 +391,38 @@ object QueryParsingErrors {
   def invalidGroupingSetError(element: String, ctx: GroupingAnalyticsContext): 
Throwable = {
 new ParseException(s"Empty set in $element grouping sets is not 
supported.", ctx)
   }
+
+  def createViewWithBothIfNotExistsAndReplaceError(ctx: CreateViewContext): 
Throwable = {
+new ParseException("CREATE VIEW with both IF NOT EXISTS and REPLACE is not 
allowed.", ctx)
+  }
+
+  def defineTempViewWithIfNotExistsError(ctx: CreateViewContext): Throwable = {
+new ParseException("It is not allowed to define a TEMPORARY view with IF 
NOT EXISTS.", ctx)
+  }
+
+  def notAllowedToAddDBPrefixForTempViewError(
+  database: String,
+  ctx: CreateViewContext): Throwable = {
+new ParseException(
+  s"It is not allowed to add database prefix `$database` for the TEMPORARY 
view name.", ctx)
+  }
+
+  def createFuncWithBothIfNotExistsAndReplaceError(ctx: 
CreateFunctionContext): Throwable = {
+new ParseException("CREATE FUNCTION with both IF NOT 

[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links

2021-10-19 Thread GitBox


gengliangwang commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-946400783


   Thanks all for the reviews. I will keep this open for a few more hours.





[GitHub] [spark-website] gengliangwang merged pull request #362: Update DocSearch facet filter of 3.2.0 documentation

2021-10-19 Thread GitBox


gengliangwang merged pull request #362:
URL: https://github.com/apache/spark-website/pull/362


   





[spark-website] branch asf-site updated: update facetFilters (#362)

2021-10-19 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c734972  update facetFilters (#362)
c734972 is described below

commit c73497265d34e097f46d14976c8513aa3c90939c
Author: Gengliang Wang 
AuthorDate: Tue Oct 19 14:13:40 2021 +0800

update facetFilters (#362)

There is a bug in updating the DocSearch facet filter in 
https://github.com/apache/spark/blob/master/dev/create-release/release-tag.sh. 
The version is not updated to the release version.

This PR is to fix the search function of 
https://spark.apache.org/docs/3.2.0/.
---
 site/docs/3.2.0/building-spark.html | 2 +-
 site/docs/3.2.0/cloud-integration.html  | 2 +-
 site/docs/3.2.0/cluster-overview.html   | 2 +-
 site/docs/3.2.0/configuration.html  | 2 +-
 site/docs/3.2.0/core-migration-guide.html   | 2 +-
 site/docs/3.2.0/graphx-programming-guide.html   | 2 +-
 site/docs/3.2.0/hadoop-provided.html| 2 +-
 site/docs/3.2.0/hardware-provisioning.html  | 2 +-
 site/docs/3.2.0/index.html  | 2 +-
 site/docs/3.2.0/job-scheduling.html | 2 +-
 site/docs/3.2.0/migration-guide.html| 2 +-
 site/docs/3.2.0/ml-advanced.html| 2 +-
 site/docs/3.2.0/ml-ann.html | 2 +-
 site/docs/3.2.0/ml-classification-regression.html   | 2 +-
 site/docs/3.2.0/ml-clustering.html  | 2 +-
 site/docs/3.2.0/ml-collaborative-filtering.html | 2 +-
 site/docs/3.2.0/ml-datasource.html  | 2 +-
 site/docs/3.2.0/ml-decision-tree.html   | 2 +-
 site/docs/3.2.0/ml-ensembles.html   | 2 +-
 site/docs/3.2.0/ml-features.html| 2 +-
 site/docs/3.2.0/ml-frequent-pattern-mining.html | 2 +-
 site/docs/3.2.0/ml-guide.html   | 2 +-
 site/docs/3.2.0/ml-linalg-guide.html| 2 +-
 site/docs/3.2.0/ml-linear-methods.html  | 2 +-
 site/docs/3.2.0/ml-migration-guide.html | 2 +-
 site/docs/3.2.0/ml-pipeline.html| 2 +-
 site/docs/3.2.0/ml-statistics.html  | 2 +-
 site/docs/3.2.0/ml-survival-regression.html | 2 +-
 site/docs/3.2.0/ml-tuning.html  | 2 +-
 site/docs/3.2.0/mllib-classification-regression.html| 2 +-
 site/docs/3.2.0/mllib-clustering.html   | 2 +-
 site/docs/3.2.0/mllib-collaborative-filtering.html  | 2 +-
 site/docs/3.2.0/mllib-data-types.html   | 2 +-
 site/docs/3.2.0/mllib-decision-tree.html| 2 +-
 site/docs/3.2.0/mllib-dimensionality-reduction.html | 2 +-
 site/docs/3.2.0/mllib-ensembles.html| 2 +-
 site/docs/3.2.0/mllib-evaluation-metrics.html   | 2 +-
 site/docs/3.2.0/mllib-feature-extraction.html   | 2 +-
 site/docs/3.2.0/mllib-frequent-pattern-mining.html  | 2 +-
 site/docs/3.2.0/mllib-guide.html| 2 +-
 site/docs/3.2.0/mllib-isotonic-regression.html  | 2 +-
 site/docs/3.2.0/mllib-linear-methods.html   | 2 +-
 site/docs/3.2.0/mllib-naive-bayes.html  | 2 +-
 site/docs/3.2.0/mllib-optimization.html | 2 +-
 site/docs/3.2.0/mllib-pmml-model-export.html| 2 +-
 site/docs/3.2.0/mllib-statistics.html   | 2 +-
 site/docs/3.2.0/monitoring.html | 2 +-
 site/docs/3.2.0/programming-guide.html  | 2 +-
 site/docs/3.2.0/pyspark-migration-guide.html| 2 +-
 site/docs/3.2.0/quick-start.html| 2 +-
 site/docs/3.2.0/rdd-programming-guide.html  | 2 +-
 site/docs/3.2.0/running-on-kubernetes.html  | 2 +-
 site/docs/3.2.0/running-on-mesos.html