[jira] [Created] (SPARK-47992) Support recursive descent path in get_json_object function

2024-04-25 Thread Qian Sun (Jira)
Qian Sun created SPARK-47992:


 Summary: Support recursive descent path in get_json_object function
 Key: SPARK-47992
 URL: https://issues.apache.org/jira/browse/SPARK-47992
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Qian Sun


JSONPath borrows its recursive descent syntax from E4X. We could use it to 
collect JSON values at any depth from a JSON string.
{code:java}
// json data
{"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}}}
{"key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}}

select get_json_object(data, '$..c'); -- [c1, c2]{code}
ref: https://goessner.net/articles/JsonPath/index.html#e2
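For illustration only, a minimal sketch in plain Python (not Spark internals) of what recursive descent collection does; `recursive_descent` is a hypothetical helper, not a Spark API:

```python
import json

def recursive_descent(node, key):
    """Collect every value bound to `key` at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(recursive_descent(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(recursive_descent(item, key))
    return found

rows = [
    '{"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}}}',
    '{"key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}}',
]
# Emulates get_json_object(data, '$..c') applied to each row
print([recursive_descent(json.loads(r), "c") for r in rows])  # [['c1'], ['c2']]
```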



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47519) Support json_length function

2024-03-22 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829779#comment-17829779
 ] 

Qian Sun commented on SPARK-47519:
--

[~cloud_fan] [~smilegator] 

Excuse me, I look forward to your opinion. I also saw 
[https://github.com/apache/spark/pull/28167#issuecomment-614097511] and wonder 
whether a _json_length_ function is needed by the Apache Spark community.

> Support json_length function
> 
>
> Key: SPARK-47519
> URL: https://issues.apache.org/jira/browse/SPARK-47519
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Qian Sun
>Priority: Major
>
> At the moment, we don't support a json_length built-in function in Apache Spark.
> This function is supported by
>  # presto: [https://prestodb.io/docs/current/functions/json.html#json_size]
>  # clickhouse: 
> [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys]
>  # mysql: 
> [https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length]
>  
> *Definition*
> json_length(json_txt, path) - Return the length of a JSON array or a JSON 
> object.
> If the value does not exist or has the wrong type, {{0}} will be returned.
> Examples:
>  
> {code:java}
> SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2
> SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3
> SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code}
>  
> *The advantages:*
>  # it skips the parse phase, so its performance is better than 
> _size(get_json_object(json_txt, path))_
>  # it offers more functionality than _json_array_length_ and can be implemented in a 
> unified manner
>  # it allows novice users to directly get the JSON length with a built-in JSON 
> function instead of a UDF






[jira] [Updated] (SPARK-47519) Support json_length function

2024-03-22 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-47519:
-
Description: 
At the moment, we don't support a json_length built-in function in Apache Spark.

This function is supported by
 # presto: [https://prestodb.io/docs/current/functions/json.html#json_size]
 # clickhouse: 
[https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys]
 # mysql: 
[https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length]

 

*Definition*

json_length(json_txt, path) - Return the length of a JSON array or a JSON 
object.

If the value does not exist or has the wrong type, {{0}} will be returned (a 
scalar value counts as length {{1}}, as in the last example).

Examples:

 
{code:java}
SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2
SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3
SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code}
 

*The advantages:*
 # it skips the parse phase, so its performance is better than 
_size(get_json_object(json_txt, path))_
 # it offers more functionality than _json_array_length_ and can be implemented in a 
unified manner
 # it allows novice users to directly get the JSON length with a built-in JSON 
function instead of a UDF
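The intended semantics can be modeled with a short sketch in plain Python (assumptions: only simple `$.a.b` paths, and a scalar counts as length 1 to match the last example; this is a hypothetical reference model, not the proposed Spark implementation):

```python
import json

def json_length(json_txt, path):
    """Length of the JSON array/object at a simple '$.a.b' path; 0 if missing."""
    try:
        node = json.loads(json_txt)
    except ValueError:
        return 0  # not valid JSON
    for step in path.lstrip("$").strip(".").split("."):
        if step:
            if not isinstance(node, dict) or step not in node:
                return 0  # value does not exist
            node = node[step]
    if isinstance(node, (dict, list)):
        return len(node)
    return 1 if node is not None else 0  # scalar, mirroring the '$.x.a' example

print(json_length('{"x": {"a": 1, "b": 2}}', '$.x'))    # 2
print(json_length('{"x": [1, 2, 3]}', '$.x'))           # 3
print(json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'))  # 1
```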

  was:
At the moment, we don't support json_length built-in function in apache spark.

This function is supported by
 # presto: [https://prestodb.io/docs/current/functions/json.html#json_size]
 # clickhouse: 
[https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys]

This allows naive users to directly get json length with a built-in json 
function.


> Support json_length function
> 
>
> Key: SPARK-47519
> URL: https://issues.apache.org/jira/browse/SPARK-47519
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Qian Sun
>Priority: Major
>
> At the moment, we don't support a json_length built-in function in Apache Spark.
> This function is supported by
>  # presto: [https://prestodb.io/docs/current/functions/json.html#json_size]
>  # clickhouse: 
> [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys]
>  # mysql: 
> [https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length]
>  
> *Definition*
> json_length(json_txt, path) - Return the length of a JSON array or a JSON 
> object.
> If the value does not exist or has the wrong type, {{0}} will be returned.
> Examples:
>  
> {code:java}
> SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2
> SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3
> SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code}
>  
> *The advantages:*
>  # it skips the parse phase, so its performance is better than 
> _size(get_json_object(json_txt, path))_
>  # it offers more functionality than _json_array_length_ and can be implemented in a 
> unified manner
>  # it allows novice users to directly get the JSON length with a built-in JSON 
> function instead of a UDF






[jira] [Created] (SPARK-47519) Support json_length function

2024-03-22 Thread Qian Sun (Jira)
Qian Sun created SPARK-47519:


 Summary: Support json_length function
 Key: SPARK-47519
 URL: https://issues.apache.org/jira/browse/SPARK-47519
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Qian Sun


At the moment, we don't support a json_length built-in function in Apache Spark.

This function is supported by
 # presto: [https://prestodb.io/docs/current/functions/json.html#json_size]
 # clickhouse: 
[https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys]

This allows novice users to directly get the JSON length with a built-in JSON 
function.






[jira] [Comment Edited] (SPARK-44573) Couldn't submit Spark application to Kubernetes in versions v1.27.3

2023-11-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534
 ] 

Qian Sun edited comment on SPARK-44573 at 11/30/23 9:48 AM:


Did you bind a role to your service account?

ref: [https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]

cc [~dongjoon] 


was (Author: dcoliversun):
Did you bind role with your serviceaccount? 

ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac

> Couldn't submit Spark application to Kubernetes in versions v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Blocker
>
> Spark-submit (cluster mode on Kubernetes) results in an 
> *io.fabric8.kubernetes.client.KubernetesClientException* error on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready    <none>                 47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready    <none>                 47h   v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 23/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> 

[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubernetes in versions v1.27.3

2023-11-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534
 ] 

Qian Sun commented on SPARK-44573:
--

Did you bind a role to your service account?

ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac

> Couldn't submit Spark application to Kubernetes in versions v1.27.3
> --
>
> Key: SPARK-44573
> URL: https://issues.apache.org/jira/browse/SPARK-44573
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 3.4.1
>Reporter: Siddaraju G C
>Priority: Blocker
>
> Spark-submit (cluster mode on Kubernetes) results in an 
> *io.fabric8.kubernetes.client.KubernetesClientException* error on my 3-node k8s 
> cluster.
> Steps followed:
>  * Using IBM Cloud, created 3 instances
>  * The 1st instance acts as the master node and the other two act as worker nodes
>  
> {noformat}
> root@vsi-spark-master:/opt# kubectl get nodes
> NAME                 STATUS   ROLES                  AGE   VERSION
> vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
> vsi-spark-worker-1   Ready    <none>                 47h   v1.27.3+k3s1
> vsi-spark-worker-2   Ready    <none>                 47h   v1.27.3+k3s1{noformat}
>  * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
>  * Ran Spark using the command below
>  
> {noformat}
> root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
> k8s://http://:6443 --conf 
> spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
> cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
> spark.executor.instances=5 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
> spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
> local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
>  * And got the error message below.
> {noformat}
> 23/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified 
> a krb5.conf file locally or via a ConfigMap. Make sure that you have the 
> krb5.conf locally on the driver image.
> 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
> first. It should be yes.
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
>     at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
>     at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
>     at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
>     at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Connection reset
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
>     at 
> io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
>     at 

[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Summary: Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website 
 (was: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website)

> Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website
> --
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: network.png
>
>
> When I visit [https://spark.apache.org/docs/3.5.0/], 
> spark-hero-thin-light.jpg is not found, caused by 
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; 
> the path should be ../images/spark-hero-thin-light.jpg






[jira] [Created] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)
Qian Sun created SPARK-46183:


 Summary: Incorrect path for spark-hero-thin-light.jpg for 
spark3.5.0 website
 Key: SPARK-46183
 URL: https://issues.apache.org/jira/browse/SPARK-46183
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Qian Sun


When I visit [https://spark.apache.org/docs/3.5.0/], spark-hero-thin-light.jpg 
is not found, caused by 
[https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; 
the path should be ../images/spark-hero-thin-light.jpg






[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Attachment: network.png

> Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
> ---
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: network.png
>
>
> When I visit [https://spark.apache.org/docs/3.5.0/], 
> spark-hero-thin-light.jpg is not found, caused by 
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; 
> the path should be ../images/spark-hero-thin-light.jpg






[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website

2023-11-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-46183:
-
Attachment: (was: 
L1VzZXJzL2hlbmd6aGVuLnNxL0xpYnJhcnkvQXBwbGljYXRpb24gU3VwcG9ydC9pRGluZ1RhbGsvNDUyMDQ5NjgwX3YyL0ltYWdlRmlsZXMvMTcwMTMzNjk5MjkzNF81QjRENEU2RC1FNUM2LTQxNEQtOERGRS0wOTIxRUUzMjY2OTcucG5n.png)

> Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
> ---
>
> Key: SPARK-46183
> URL: https://issues.apache.org/jira/browse/SPARK-46183
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Minor
>
> When I visit [https://spark.apache.org/docs/3.5.0/], 
> spark-hero-thin-light.jpg is not found, caused by 
> [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99]; 
> the path should be ../images/spark-hero-thin-light.jpg






[jira] [Commented] (SPARK-45175) download krb5.conf from remote storage in spark-submit on k8s

2023-09-22 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767908#comment-17767908
 ] 

Qian Sun commented on SPARK-45175:
--

In multi-tenant scenarios, I find that Apache Spark provides 
*{{spark.kubernetes.kerberos.krb5.configMapName}}* to mount a ConfigMap 
containing the {{*krb5.conf*}} file; we could manage these files by creating 
one ConfigMap per tenant.

> download krb5.conf from remote storage in spark-submit on k8s
> -
>
> Key: SPARK-45175
> URL: https://issues.apache.org/jira/browse/SPARK-45175
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.1
>Reporter: Qian Sun
>Priority: Minor
>  Labels: pull-request-available
>
> krb5.conf currently supports only local files. Tenants would like to 
> store this file on their own servers and download it during the 
> spark-submit phase, to better support multi-tenant scenarios. The 
> proposed solution is to use the *downloadFile* function[1], similar to the 
> handling of *spark.kubernetes.driver/executor.podTemplateFile*.
>  
> [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24






[jira] [Comment Edited] (SPARK-43182) Multiple tables join with limit when AE is enabled and one table is skewed

2023-09-19 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690
 ] 

Qian Sun edited comment on SPARK-43182 at 9/19/23 8:14 AM:
---

Hi [~Resol1992]

I ran your SQL and tried different configuration combinations, and I believe 
the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which 
introduces extra shuffles. When *spark.sql.adaptive.forceOptimizeSkewedJoin* is 
false, AQE gives up the skew-join optimization if it would introduce an extra 
shuffle. cc [~cloud_fan] 

 

ref: 

[https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229]
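Paraphrasing the linked check as a sketch in plain Python (not the actual Scala code; the defaults shown mirror the documented defaults for *spark.sql.adaptive.skewJoin.skewedPartitionFactor* and *skewedPartitionThresholdInBytes*):

```python
# A partition is considered skewed only if it is both `factor` times larger
# than the median partition size AND above the absolute byte threshold.
def is_skewed(size, median_size, factor=5.0, threshold=256 * 1024 * 1024):
    return size > median_size * factor and size > threshold

# With the reporter's settings (factor=1, threshold=10KB), almost any
# partition larger than the median is treated as skewed:
sizes = [200_000, 15_000, 12_000]          # bytes per partition
median = sorted(sizes)[len(sizes) // 2]    # 15_000
flags = [is_skewed(s, median, factor=1, threshold=10 * 1024) for s in sizes]
print(flags)  # [True, False, False]
```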


was (Author: dcoliversun):
Hi [~Resol1992]

I ran your sql, tried different configuration combinations and believe 
regression caused by *spark.sql.adaptive.forceOptimizeSkewedJoin* , which 
introduces 
extra shuffles. AQE can give up skewJoin Optimization if extra shuffle 
introduced when *spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc 
[~cloud_fan]  * 
https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Multiple tables join with limit when AE is enabled and one table is skewed
> --
>
> Key: SPARK-43182
> URL: https://issues.apache.org/jira/browse/SPARK-43182
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Liu Shuo
>Priority: Critical
> Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, 
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, 
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, 
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, 
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, 
> part-m-00019.zip
>
>
> When we test AE in Spark 3.4.0 with the following case, we find that if we 
> disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s; but if 
> we enable both AE and skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under 
> '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path 
> 'file:///tmp/spark-warehouse/data/');
> create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned 
> by(c18 string); 
> insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from 
> source_aqe;
> insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from 
> source_aqe limit 12;
> insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from 
> source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from 
> source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from 
> source_aqe limit 12;
> set spark.sql.adaptive.enabled=false;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = false;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
>  
> ###it will finish in 20s 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> set spark.sql.adaptive.enabled=true;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = true;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
> ###it will take very long time 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> {code}




[jira] [Commented] (SPARK-43182) Multiple tables join with limit when AE is enabled and one table is skewed

2023-09-19 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690
 ] 

Qian Sun commented on SPARK-43182:
--

Hi [~Resol1992]

I ran your SQL and tried different configuration combinations, and I believe 
the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which 
introduces extra shuffles. When *spark.sql.adaptive.forceOptimizeSkewedJoin* is 
false, AQE gives up the skew-join optimization if it would introduce an extra 
shuffle. cc [~cloud_fan]

ref: https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Multiple tables join with limit when AE is enabled and one table is skewed
> --
>
> Key: SPARK-43182
> URL: https://issues.apache.org/jira/browse/SPARK-43182
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Liu Shuo
>Priority: Critical
> Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, 
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, 
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, 
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, 
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, 
> part-m-00019.zip
>
>
> When we test AE in Spark 3.4.0 with the following case, we find that if we 
> disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s; but if 
> we enable both AE and skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under 
> '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path 
> 'file:///tmp/spark-warehouse/data/');
> create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned 
> by(c18 string); 
> insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from 
> source_aqe;
> insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from 
> source_aqe limit 12;
> insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from 
> source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from 
> source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from 
> source_aqe limit 12;
> set spark.sql.adaptive.enabled=false;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = false;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
>  
> ###it will finish in 20s 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> set spark.sql.adaptive.enabled=true;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = true;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
> -- it takes a very long time 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> {code}
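For reference, AQE flags a shuffle partition as skewed when its size exceeds both `spark.sql.adaptive.skewJoin.skewedPartitionFactor` times the median partition size and `skewedPartitionThresholdInBytes`; with the factor set to 1 as in the repro above, nearly every above-median partition qualifies, which drives aggressive splitting. A minimal sketch of that predicate (illustrative only, not Spark's actual code):

```python
from statistics import median

def is_skewed(sizes, idx, factor=1, threshold_bytes=10 * 1024):
    """Rough sketch of AQE's skewed-partition test: a partition is skewed
    if it is larger than factor * median partition size AND larger than
    the absolute byte threshold (10KB here, matching the repro settings)."""
    med = median(sizes)
    size = sizes[idx]
    return size > factor * med and size > threshold_bytes

# With factor=1, any partition strictly above the median and above 10KB
# counts as skewed.
sizes = [8_000, 9_000, 200_000]
print([is_skewed(sizes, i) for i in range(len(sizes))])  # → [False, False, True]
```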



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45208) Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar

2023-09-18 Thread Qian Sun (Jira)
Qian Sun created SPARK-45208:


 Summary: Kubernetes Configuration in Spark Community Website 
doesn't have horizontal scrollbar
 Key: SPARK-45208
 URL: https://issues.apache.org/jira/browse/SPARK-45208
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Qian Sun


I found a recent issue with the official Spark documentation website. 
Specifically, the right-hand side of the Kubernetes configuration tables is 
not visible, and the page doesn't have a horizontal scrollbar.

 
- [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration]
- [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]






[jira] [Updated] (SPARK-45175) download krb5.conf from remote storage in spark-submit on k8s

2023-09-15 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-45175:
-
Summary: download krb5.conf from remote storage in spark-submit on k8s  
(was: download krb5.conf from remote storage in spark-sumbit on k8s)

> download krb5.conf from remote storage in spark-submit on k8s
> -
>
> Key: SPARK-45175
> URL: https://issues.apache.org/jira/browse/SPARK-45175
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.1
>Reporter: Qian Sun
>Priority: Minor
>  Labels: pull-request-available
>
> krb5.conf currently supports only local files. Tenants would like to host 
> this file on their own servers and have it downloaded during the 
> spark-submit phase, which better supports multi-tenant scenarios. The 
> proposed solution is to use the *downloadFile* function[1], similar to how 
> *spark.kubernetes.driver/executor.podTemplateFile* is handled.
>  
> [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24






[jira] [Created] (SPARK-45175) download krb5.conf from remote storage in spark-sumbit on k8s

2023-09-14 Thread Qian Sun (Jira)
Qian Sun created SPARK-45175:


 Summary: download krb5.conf from remote storage in spark-sumbit on 
k8s
 Key: SPARK-45175
 URL: https://issues.apache.org/jira/browse/SPARK-45175
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.1
Reporter: Qian Sun


krb5.conf currently supports only local files. Tenants would like to host this 
file on their own servers and have it downloaded during the spark-submit 
phase, which better supports multi-tenant scenarios. The proposed solution is 
to use the *downloadFile* function[1], similar to how 
*spark.kubernetes.driver/executor.podTemplateFile* is handled.

 

[1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24
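As a sketch of the proposed behavior (hypothetical helper name; the real change would reuse Spark's internal file-download path), fetching a remote krb5.conf to a local temp file before submission might look like:

```python
import tempfile
import urllib.request
from pathlib import Path

def download_to_local(uri: str) -> str:
    """Download a remote config (e.g. krb5.conf) to a local temp file and
    return the local path; a sketch of what spark-submit would do when a
    remote URI is given instead of a local file."""
    dest = Path(tempfile.mkdtemp()) / "krb5.conf"
    urllib.request.urlretrieve(uri, dest)  # handles http(s):// and file://
    return str(dest)
```

A local `file://` URI exercises the same code path as an HTTP URL, which makes the behavior easy to test without a server.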






[jira] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC

2023-05-04 Thread Qian Sun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-43342 ]


Qian Sun deleted comment on SPARK-43342:
--

was (Author: dcoliversun):
[~ofrenkel]

Hello, I tried to reproduce using the configuration you provided. There are 
some issues that I need to confirm with you:
 * When the driver and executor use the PVC with same claim name, can your 
executor start normally?
 * Did your run of spark-pi compute the value of pi?

Based on my tests, Spark 3.3 cannot start the executor properly and cannot 
compute the value of pi.

 

The logs I saw are as follows.
{code:java}
[kubernetes-executor-snapshots-subscribers-1] WARN  
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl  - 
Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://21.8.0.8:6443/api/v1/namespaces/test-ns/persistentvolumeclaims. 
Message: persistentvolumeclaims "a1pvc" already exists. Received status: 
Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, 
kind=persistentvolumeclaims, name=test, retryAfterSeconds=null, uid=null, 
additionalProperties={}), kind=Status, message=persistentvolumeclaims "test" 
already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), 
reason=AlreadyExists, status=Failure, additionalProperties={}). {code}
I'm looking forward to any new feedback you have.

 

cc [~dongjoon] 

> Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
> --
>
> Key: SPARK-43342
> URL: https://issues.apache.org/jira/browse/SPARK-43342
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Oleg Frenkel
>Priority: Blocker
>
> When using static PVC with Spark 3.4, spark PI example fails with the error 
> below. Previous versions of Spark worked well.
> {code:java}
> 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors 
> from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, 
> sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO 
> BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown 
> script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop 
> due to IllegalArgumentException java.lang.IllegalArgumentException: PVC 
> ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring 
> multiple executors at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75)
>  at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)    
>  at scala.collection.Iterator.foreach(Iterator.scala:943) at 
> scala.collection.Iterator.foreach$(Iterator.scala:943) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:286) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:279) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83)
>  at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)    
>  at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>   at scala.collection.immutable.List.foldLeft(List.scala:91) at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)  
>    at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:370)
>  at 

[jira] [Commented] (SPARK-43329) driver and executors shared same Kubernetes PVC in Spark 3.4+

2023-05-04 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719622#comment-17719622
 ] 

Qian Sun commented on SPARK-43329:
--

[~dongjoon] this ticket is a duplicate of SPARK-43342

> driver and executors shared same Kubernetes PVC in Spark 3.4+
> -
>
> Key: SPARK-43329
> URL: https://issues.apache.org/jira/browse/SPARK-43329
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: comet
>Priority: Major
>
> I was able to share the same PVC on Spark 3.3, but from Spark 3.4 onward I 
> get the error below. I would like all the executors and the driver to mount 
> the same PVC. Is this a bug? I don't want to use SPARK_EXECUTOR_ID or 
> OnDemand, because then each executor would use its own unique, separate PVC. 
>  
> The error message is "should contain OnDemand or SPARK_EXECUTOR_ID when 
> requiring multiple executors".
>  
> Below is how I enabled the PVC in Spark 3.3, where it works; the same 
> configuration does not work in Spark 3.4:
> {code:sh}
> spark.kubernetes.driver.volumes.persistentVolumeClaim.rwxpvc.options.claimName=rwxpvc
>  
> --conf 
> spark.kubernetes.driver.volumes.persistentVolumeClaim.rwxpvc.mount.path=/opt/spark/work-dir
>  
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.rwxpvc.options.claimName=rwxpvc
>  
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.rwxpvc.mount.path=/opt/spark/work-dir
>  
>  
> {code}
>  
>  
>  
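For background, Spark on Kubernetes treats two claim-name forms specially: `OnDemand` provisions a fresh PVC per executor, and a literal `SPARK_EXECUTOR_ID` substring is substituted with the executor's id; since 3.4, any other name is rejected when multiple executors are requested. A rough, illustrative reimplementation of that naming rule (hypothetical helper, not Spark's actual code):

```python
def executor_claim_name(template: str, app_name: str, executor_id: int,
                        num_executors: int) -> str:
    """Sketch of executor PVC naming on Spark-on-K8s. 'OnDemand' yields a
    unique per-executor claim; 'SPARK_EXECUTOR_ID' is replaced with the
    executor id; any other fixed name is rejected when multiple executors
    are requested (the check behind the IllegalArgumentException in 3.4)."""
    if "OnDemand" in template:
        # Assumed naming scheme, patterned after the warning events above.
        return f"{app_name}-exec-{executor_id}-pvc-0"
    if "SPARK_EXECUTOR_ID" in template:
        return template.replace("SPARK_EXECUTOR_ID", str(executor_id))
    if num_executors > 1:
        raise ValueError(
            f"PVC ClaimName: {template} should contain OnDemand or "
            "SPARK_EXECUTOR_ID when requiring multiple executors")
    return template  # a single executor may reuse a static claim

print(executor_claim_name("pvc-SPARK_EXECUTOR_ID", "app", 3, 5))  # → pvc-3
```

A static name like `rwxpvc` with five executors would hit the `ValueError` branch, mirroring the reported failure.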






[jira] [Commented] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC

2023-05-04 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719162#comment-17719162
 ] 

Qian Sun commented on SPARK-43342:
--

[~ofrenkel]

Hello, I tried to reproduce using the configuration you provided. There are 
some issues that I need to confirm with you:
 * When the driver and executor use the PVC with same claim name, can your 
executor start normally?
 * Did your run of spark-pi compute the value of pi?

Based on my tests, Spark 3.3 cannot start the executor properly and cannot 
compute the value of pi.

 

The logs I saw are as follows.
{code:java}
[kubernetes-executor-snapshots-subscribers-1] WARN  
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl  - 
Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://21.8.0.8:6443/api/v1/namespaces/test-ns/persistentvolumeclaims. 
Message: persistentvolumeclaims "a1pvc" already exists. Received status: 
Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, 
kind=persistentvolumeclaims, name=test, retryAfterSeconds=null, uid=null, 
additionalProperties={}), kind=Status, message=persistentvolumeclaims "test" 
already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), 
reason=AlreadyExists, status=Failure, additionalProperties={}). {code}
I'm looking forward to any new feedback you have.

 

cc [~dongjoon] 

> Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
> --
>
> Key: SPARK-43342
> URL: https://issues.apache.org/jira/browse/SPARK-43342
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Oleg Frenkel
>Priority: Blocker
>
> When using static PVC with Spark 3.4, spark PI example fails with the error 
> below. Previous versions of Spark worked well.
> {code:java}
> 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors 
> from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, 
> sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO 
> BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown 
> script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop 
> due to IllegalArgumentException java.lang.IllegalArgumentException: PVC 
> ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring 
> multiple executors at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75)
>  at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)    
>  at scala.collection.Iterator.foreach(Iterator.scala:943) at 
> scala.collection.Iterator.foreach$(Iterator.scala:943) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:286) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:279) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83)
>  at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)    
>  at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>   at scala.collection.immutable.List.foldLeft(List.scala:91) at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)  
>    at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417)
>  at 
> 

[jira] [Commented] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC

2023-05-03 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719121#comment-17719121
 ] 

Qian Sun commented on SPARK-43342:
--

[~dongjoon] [~yikunkero] It seems like a regression caused by 
[SPARK-39006|https://issues.apache.org/jira/browse/SPARK-39006], please assign 
to me

> Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
> --
>
> Key: SPARK-43342
> URL: https://issues.apache.org/jira/browse/SPARK-43342
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Oleg Frenkel
>Priority: Blocker
>
> When using static PVC with Spark 3.4, spark PI example fails with the error 
> below. Previous versions of Spark worked well.
> {code:java}
> 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors 
> from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, 
> sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO 
> BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown 
> script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop 
> due to IllegalArgumentException java.lang.IllegalArgumentException: PVC 
> ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring 
> multiple executors at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75)
>  at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)    
>  at scala.collection.Iterator.foreach(Iterator.scala:943) at 
> scala.collection.Iterator.foreach$(Iterator.scala:943) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:286) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:279) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58)
>  at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83)
>  at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)    
>  at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>   at scala.collection.immutable.List.foldLeft(List.scala:91) at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)  
>    at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:370)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:363)
>  at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)  
>    at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) 
> at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:363)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:134)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:134)
>  at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:143)
>  at 
> 

[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod

2022-12-29 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-41781:
-
Description: 
Creating the PVC after the driver/executor pod results in a Warning event from 
the default-scheduler, such as
{code:java}
error getting PVC "spark/application-exec-1-pvc-0": could not find 
v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
The normal Kubernetes workflow is to create the PVC first and then schedule 
the pod that mounts it.

 

We have a scenario in which a webhook server tries to reschedule the pod and 
reattach its PVC to another pod. Because the PVC is created after the pod, the 
webhook cannot find the PVC from the pod metadata.

  was:
Creating pvc after driver/executor pod has Warning event from 
default-scheduler, such as
{code:java}
error getting PVC "spark/application-exec-1-pvc-0": could not find 
v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
Normally, we need to create PVC first and schedule pod to mount PVC.


> Add the ability to create pvc before creating driver/executor pod
> -
>
> Key: SPARK-41781
> URL: https://issues.apache.org/jira/browse/SPARK-41781
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> Creating the PVC after the driver/executor pod results in a Warning event 
> from the default-scheduler, such as
> {code:java}
> error getting PVC "spark/application-exec-1-pvc-0": could not find 
> v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
> The normal Kubernetes workflow is to create the PVC first and then schedule 
> the pod that mounts it.
>  
> We have a scenario in which a webhook server tries to reschedule the pod and 
> reattach its PVC to another pod. Because the PVC is created after the pod, 
> the webhook cannot find the PVC from the pod metadata.
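The ordering problem can be made concrete with a toy recorder (purely illustrative; hypothetical names, not the Spark or Kubernetes implementation):

```python
class FakeK8sApi:
    """Records creation order so the PVC-before-pod invariant can be checked."""
    def __init__(self):
        self.created = []

    def create_pvc(self, name):
        self.created.append(("pvc", name))

    def create_pod(self, name, pvc):
        # In the reported (buggy) ordering, the scheduler sees the pod while
        # the PVC does not yet exist and emits the "could not find PVC" event.
        self.created.append(("pod", name))

def launch_executor(api, executor_id):
    pvc = f"application-exec-{executor_id}-pvc-0"
    api.create_pvc(pvc)                          # create the claim first...
    api.create_pod(f"exec-{executor_id}", pvc)   # ...then the pod that mounts it

api = FakeK8sApi()
launch_executor(api, 1)
print(api.created)  # → [('pvc', 'application-exec-1-pvc-0'), ('pod', 'exec-1')]
```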






[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod

2022-12-29 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-41781:
-
Description: 
Creating pvc after driver/executor pod has Warning event from 
default-scheduler, such as
{code:java}
error getting PVC "spark/application-exec-1-pvc-0": could not find 
v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
Normally, we need to create PVC first and schedule pod to mount PVC.

  was:
Creating resources after executor pod has Warning event from default-scheduler, 
such as
{code:java}
error getting PVC "spark/application-exec-1-pvc-0": could not find 
v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
Normally, we need to create resources and schedule pod to mount them.


> Add the ability to create pvc before creating driver/executor pod
> -
>
> Key: SPARK-41781
> URL: https://issues.apache.org/jira/browse/SPARK-41781
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> Creating pvc after driver/executor pod has Warning event from 
> default-scheduler, such as
> {code:java}
> error getting PVC "spark/application-exec-1-pvc-0": could not find 
> v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
> Normally, we need to create PVC first and schedule pod to mount PVC.






[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod

2022-12-29 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-41781:
-
Summary: Add the ability to create pvc before creating driver/executor pod  
(was: Add the ability to create resources before creating executor pod)

> Add the ability to create pvc before creating driver/executor pod
> -
>
> Key: SPARK-41781
> URL: https://issues.apache.org/jira/browse/SPARK-41781
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> Creating resources after executor pod has Warning event from 
> default-scheduler, such as
> {code:java}
> error getting PVC "spark/application-exec-1-pvc-0": could not find 
> v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
> Normally, we need to create resources and schedule pod to mount them.






[jira] [Created] (SPARK-41781) Add the ability to create resources before creating executor pod

2022-12-29 Thread Qian Sun (Jira)
Qian Sun created SPARK-41781:


 Summary: Add the ability to create resources before creating 
executor pod
 Key: SPARK-41781
 URL: https://issues.apache.org/jira/browse/SPARK-41781
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Qian Sun


Creating resources after executor pod has Warning event from default-scheduler, 
such as
{code:java}
error getting PVC "spark/application-exec-1-pvc-0": could not find 
v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
Normally, we need to create resources and schedule pod to mount them.






[jira] [Updated] (SPARK-40569) Add smoke test in standalone cluster for spark-docker

2022-10-30 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40569:
-
Summary: Add smoke test in standalone cluster for spark-docker  (was: 
Expose port for spark standalone mode)

> Add smoke test in standalone cluster for spark-docker
> -
>
> Key: SPARK-40569
> URL: https://issues.apache.org/jira/browse/SPARK-40569
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Major
>







[jira] [Commented] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker

2022-10-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626359#comment-17626359
 ] 

Qian Sun commented on SPARK-40969:
--

[~yikunkero] fine with me, I'm working on this :)

> Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker
> --
>
> Key: SPARK-40969
> URL: https://issues.apache.org/jira/browse/SPARK-40969
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Docker
>Affects Versions: 3.3.1
>Reporter: Qian Sun
>Priority: Major
>
> Unable to download spark 3.3.0 tarball in spark-docker. 
> {code:sh}
> #7 0.229 + wget -nv -O spark.tgz 
> https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
> #7 1.061 
> https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz:
> #7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found.
> --
> executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp 
> -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget 
> -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp 
> -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" ||
>  gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg 
> --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf 
> "$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1;   
>   chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/;
>  mv sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; 
> mv examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data 
> /opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv 
> python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit 
> code: 8
> {code}
> And spark 3.3.1 docker is ok
> {code:sh}
> => [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; 
> wget -nv -O spark.tgz 
> "https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; 
> wget -nv -O spark.tgz.asc "https://downlo  77.8s
>  => [5/9] COPY entrypoint.sh /opt/
> {code}






[jira] [Commented] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker

2022-10-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626355#comment-17626355
 ] 

Qian Sun commented on SPARK-40969:
--

cc [~yikunkero][~hyukjin.kwon]

> Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker
> --
>
> Key: SPARK-40969
> URL: https://issues.apache.org/jira/browse/SPARK-40969
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Docker
>Affects Versions: 3.3.1
>Reporter: Qian Sun
>Priority: Major
>
> Unable to download spark 3.3.0 tarball in spark-docker. 
> {code:sh}
> #7 0.229 + wget -nv -O spark.tgz 
> https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
> #7 1.061 
> https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz:
> #7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found.
> --
> executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp 
> -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget 
> -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp 
> -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" ||
>  gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg 
> --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf 
> "$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1;   
>   chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/;
>  mv sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; 
> mv examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data 
> /opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv 
> python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit 
> code: 8
> {code}
> And spark 3.3.1 docker is ok
> {code:sh}
> => [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; 
> wget -nv -O spark.tgz 
> "https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; 
> wget -nv -O spark.tgz.asc "https://downlo  77.8s
>  => [5/9] COPY entrypoint.sh /opt/
> {code}






[jira] [Created] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker

2022-10-30 Thread Qian Sun (Jira)
Qian Sun created SPARK-40969:


 Summary: Unable to download spark 3.3.0 tarball after 3.3.1 
release in spark-docker
 Key: SPARK-40969
 URL: https://issues.apache.org/jira/browse/SPARK-40969
 Project: Spark
  Issue Type: Bug
  Components: Spark Docker
Affects Versions: 3.3.1
Reporter: Qian Sun


Unable to download the Spark 3.3.0 tarball in spark-docker.


{code:sh}
#7 0.229 + wget -nv -O spark.tgz 
https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
#7 1.061 https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz:
#7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found.
--
executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp 
-d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget 
-nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp -d)"; 
gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || gpg 
--keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg --batch 
--verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf 
"$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1; 
chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/; mv 
sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; mv 
examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data 
/opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv 
python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit 
code: 8
{code}
The Spark 3.3.1 Docker build works fine:

{code:sh}
=> [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP;   
  wget -nv -O spark.tgz 
"https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; 
wget -nv -O spark.tgz.asc "https://downlo  77.8s
 => [5/9] COPY entrypoint.sh /opt/
{code}
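For context: dlcdn.apache.org serves only the latest maintenance release of each Spark line, while archive.apache.org keeps every release permanently, which is why the 3.3.0 URL started returning 404 after 3.3.1 shipped. A minimal sketch of a download step with an archive fallback (the helper names below are illustrative, not the actual spark-docker tooling):

```shell
#!/bin/sh
# Build the tarball URL for a given mirror base and Spark version.
# Illustrative helper, not part of the spark-docker Dockerfile.
build_spark_url() {
  base="$1"; version="$2"
  echo "${base}/spark-${version}/spark-${version}-bin-hadoop3.tgz"
}

# Try the CDN first; fall back to the long-term archive if the
# release has already been rotated off dlcdn.apache.org.
download_spark() {
  version="$1"
  for base in "https://dlcdn.apache.org/spark" \
              "https://archive.apache.org/dist/spark"; do
    url="$(build_spark_url "$base" "$version")"
    if wget -nv -O spark.tgz "$url"; then
      echo "downloaded $url"
      return 0
    fi
  done
  echo "unable to download Spark ${version}" >&2
  return 1
}
```

With this shape, a build pinned to 3.3.0 keeps working after 3.3.0 leaves the CDN, because the archive copy is permanent.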







[jira] [Commented] (SPARK-40954) Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker

2022-10-30 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626345#comment-17626345
 ] 

Qian Sun commented on SPARK-40954:
--

Hi [~_anton]
I use hyperkit as the minikube driver. Could you try starting minikube with 
this command?

{code:sh}
minikube --driver=hyperkit start
{code}


> Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
> ---
>
> Key: SPARK-40954
> URL: https://issues.apache.org/jira/browse/SPARK-40954
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.1
> Environment: MacOS 12.6 (Mac M1)
> Minikube 1.27.1
> Docker 20.10.17
>Reporter: Anton Ippolitov
>Priority: Minor
> Attachments: TestProcess.scala
>
>
> h2. Description
> I tried running Kubernetes integration tests with the Minikube backend (+ 
> Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on 
> Spark's master branch. I ran them with the following command:
>  
> {code:java}
> mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
> -Pkubernetes -Pkubernetes-integration-tests \
> -Phadoop-3 \
> -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \
> -Dspark.kubernetes.test.imageRepo=docker.io/kubespark 
> \
> -Dspark.kubernetes.test.namespace=spark \
> -Dspark.kubernetes.test.serviceAccountName=spark \
> -Dspark.kubernetes.test.deployMode=minikube  {code}
> However the test suite got stuck literally for hours on my machine. 
>  
> h2. Investigation
> I ran {{jstack}} on the process that was running the tests and saw that it 
> was stuck here:
>  
> {noformat}
> "ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 
> tid=0x7f78d580b800 nid=0x2503 runnable [0x000304749000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.FileInputStream.readBytes(Native Method)
>     at java.io.FileInputStream.read(FileInputStream.java:255)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     - locked <0x00076c0b6f40> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>     - locked <0x00076c0bb410> (a java.io.InputStreamReader)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:161)
>     at java.io.BufferedReader.readLine(BufferedReader.java:324)
>     - locked <0x00076c0bb410> (a java.io.InputStreamReader)
>     at java.io.BufferedReader.readLine(BufferedReader.java:389)
>     at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown
>  Source)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown
>  Source)
>     at 
> org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184)
>     at 
> org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196)
>     at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
>     at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
>     at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
>     at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
>     at 
> 

[jira] [Created] (SPARK-40866) Rename Check Spark repo as Check Spark Docker repo in GA

2022-10-21 Thread Qian Sun (Jira)
Qian Sun created SPARK-40866:


 Summary: Rename Check Spark repo as Check Spark Docker repo in GA
 Key: SPARK-40866
 URL: https://issues.apache.org/jira/browse/SPARK-40866
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Docker
Affects Versions: 3.3.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40855) Add CONTRIBUTING.md to apache/spark-docker

2022-10-20 Thread Qian Sun (Jira)
Qian Sun created SPARK-40855:


 Summary: Add CONTRIBUTING.md to apache/spark-docker
 Key: SPARK-40855
 URL: https://issues.apache.org/jira/browse/SPARK-40855
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Docker
Affects Versions: 3.3.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40726) Supplement undocumented orc configurations in documentation

2022-10-10 Thread Qian Sun (Jira)
Qian Sun created SPARK-40726:


 Summary: Supplement undocumented orc configurations in 
documentation
 Key: SPARK-40726
 URL: https://issues.apache.org/jira/browse/SPARK-40726
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.3.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40710) Supplement undocumented parquet configurations in documentation

2022-10-08 Thread Qian Sun (Jira)
Qian Sun created SPARK-40710:


 Summary: Supplement undocumented parquet configurations in 
documentation
 Key: SPARK-40710
 URL: https://issues.apache.org/jira/browse/SPARK-40710
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.3.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40709) Supplement undocumented avro configurations in documentation

2022-10-07 Thread Qian Sun (Jira)
Qian Sun created SPARK-40709:


 Summary: Supplement undocumented avro configurations in 
documentation
 Key: SPARK-40709
 URL: https://issues.apache.org/jira/browse/SPARK-40709
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.3.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40699) Supplement undocumented yarn configuration in documentation

2022-10-07 Thread Qian Sun (Jira)
Qian Sun created SPARK-40699:


 Summary: Supplement undocumented yarn configuration in 
documentation
 Key: SPARK-40699
 URL: https://issues.apache.org/jira/browse/SPARK-40699
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in documentation

2022-10-07 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40675:
-
Summary: Supplement missing spark configuration in documentation  (was: 
Supplement missing spark configuration in configuration.md)

> Supplement missing spark configuration in documentation
> ---
>
> Key: SPARK-40675
> URL: https://issues.apache.org/jira/browse/SPARK-40675
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>
> Supplement the missing Spark configuration options in the documentation to 
> make it more complete, so that users can look up configurations in the 
> documentation instead of in the code.






[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md

2022-10-06 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40675:
-
Description: Supplement missing spark configuration in documentation to 
make the documentation more readable. User can check configuration through 
documentation instead of code.  (was: Supplement missing spark configuration to 
documentation to make the documentation more readable. User can check 
configuration through documentation instead of code.)

> Supplement missing spark configuration in configuration.md
> --
>
> Key: SPARK-40675
> URL: https://issues.apache.org/jira/browse/SPARK-40675
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>
> Supplement missing spark configuration in documentation to make the 
> documentation more readable. User can check configuration through 
> documentation instead of code.






[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md

2022-10-06 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40675:
-
Description: Supplement missing spark configuration to documentation to 
make the documentation more readable. User can check configuration through 
documentation instead of code.  (was: Add missing spark configuration to 
documentation to make the documentation more readable. User can check 
configuration through documentation instead of code.)

> Supplement missing spark configuration in configuration.md
> --
>
> Key: SPARK-40675
> URL: https://issues.apache.org/jira/browse/SPARK-40675
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>
> Supplement missing spark configuration to documentation to make the 
> documentation more readable. User can check configuration through 
> documentation instead of code.






[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md

2022-10-06 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40675:
-
Summary: Supplement missing spark configuration in configuration.md  (was: 
Add missing spark configuration to documentation)

> Supplement missing spark configuration in configuration.md
> --
>
> Key: SPARK-40675
> URL: https://issues.apache.org/jira/browse/SPARK-40675
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>
> Add missing spark configuration to documentation to make the documentation 
> more readable. User can check configuration through documentation instead of 
> code.






[jira] [Created] (SPARK-40675) Add missing spark configuration to documentation

2022-10-05 Thread Qian Sun (Jira)
Qian Sun created SPARK-40675:


 Summary: Add missing spark configuration to documentation
 Key: SPARK-40675
 URL: https://issues.apache.org/jira/browse/SPARK-40675
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Qian Sun


Add the missing Spark configuration options to the documentation to make it 
more complete, so that users can look up configurations in the documentation 
instead of in the code.






[jira] [Commented] (SPARK-40569) Expose port for spark standalone mode

2022-10-01 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611898#comment-17611898
 ] 

Qian Sun commented on SPARK-40569:
--

[~bjornjorgensen] Thanks for sharing.

> Expose port for spark standalone mode
> -
>
> Key: SPARK-40569
> URL: https://issues.apache.org/jira/browse/SPARK-40569
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>







[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Priority: Minor  (was: Major)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Issue Type: Improvement  (was: Bug)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Commented] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609785#comment-17609785
 ] 

Qian Sun commented on SPARK-40572:
--

I think the root cause is that [executorId is a string in 
TaskDataWrapper|https://github.com/apache/spark/blob/072575c9e6fc304f09e01ad0ee180c8f309ede91/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L174-L175].
Executor IDs are strings throughout Apache Spark, so changing the type would 
ripple through a large amount of code.
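The symptom is easy to reproduce outside Spark: sorting the IDs as strings puts "10" and "11" between "1" and "2", which is exactly what the Task table shows. A quick illustration with coreutils `sort`:

```shell
# String (lexicographic) sort, as the UI currently does with
# string-typed executor IDs: "10" and "11" land before "2".
printf '%s\n' 0 1 2 10 11 | LC_ALL=C sort | xargs
# prints: 0 1 10 11 2

# Numeric sort gives the order users expect.
printf '%s\n' 0 1 2 10 11 | LC_ALL=C sort -n | xargs
# prints: 0 1 2 10 11
```

A UI-side fix could keep the stored type as a string and only make the table's comparator numeric-aware, avoiding the wide type change mentioned above.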

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Summary: Executor ID sorted as lexicographical order in Task Table of Stage 
Tab  (was: Executor ID sorted as lexicographical order in UI Stages Tab)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Attachment: Executor_ID_IN_STAGES_TAB.png

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Created] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40572:


 Summary: Executor ID sorted as lexicographical order in UI Stages 
Tab
 Key: SPARK-40572
 URL: https://issues.apache.org/jira/browse/SPARK-40572
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.3.0
Reporter: Qian Sun


As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
lexicographical order; they should be sorted numerically.

!image-2022-09-27-09-26-46-755.png!






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Description: As figure shows, Executor ID sorted as lexicographical order 
in UI Stages Tab. Better sort as number order  (was: As figure shows, Executor 
ID sorted as lexicographical order in UI Stages Tab. Better sort as number order

!image-2022-09-27-09-26-46-755.png!)

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> As the attached figure shows, executor IDs in the UI Stages tab are sorted in 
> lexicographical order; they should be sorted numerically.






[jira] [Created] (SPARK-40570) Add doc for Docker Setup in standalone mode

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40570:


 Summary: Add doc for Docker Setup in standalone mode
 Key: SPARK-40570
 URL: https://issues.apache.org/jira/browse/SPARK-40570
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40569) Expose port for spark standalone mode

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40569:


 Summary: Expose port for spark standalone mode
 Key: SPARK-40569
 URL: https://issues.apache.org/jira/browse/SPARK-40569
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Commented] (SPARK-40160) Make pyspark.broadcast examples self-contained

2022-08-22 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583291#comment-17583291
 ] 

Qian Sun commented on SPARK-40160:
--

working on it :)

> Make pyspark.broadcast examples self-contained
> --
>
> Key: SPARK-40160
> URL: https://issues.apache.org/jira/browse/SPARK-40160
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>







[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-21 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582535#comment-17582535
 ] 

Qian Sun commented on SPARK-40148:
--

[~hyukjin.kwon] OK, I'll create a follow-up PR to do these :)

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-20 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582454#comment-17582454
 ] 

Qian Sun commented on SPARK-40148:
--

[~hyukjin.kwon] Hi, this seems to be the same as SPARK-40010 (Make 
pyspark.sql.window examples self-contained).

Is there anything else that needs to be done in pyspark.sql.window? I'd like to 
work on it.

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Comment Edited] (SPARK-40148) Make pyspark.sql.window examples self-contained

2022-08-20 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582454#comment-17582454
 ] 

Qian Sun edited comment on SPARK-40148 at 8/21/22 3:30 AM:
---

[~hyukjin.kwon] Hi, this seems to be the same as SPARK-40010.

Is there anything else that needs to be done in pyspark.sql.window? I'd like to 
work on it.


was (Author: dcoliversun):
[~hyukjin.kwon] Hi, it seems like that it is same with [SPARK-40010] Make 
pyspark.sql.window examples self-contained - ASF JIRA (apache.org)

Is there anything else that needs to be done in pyspark.sql.window? I'd like to 
work on it.

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40148
> URL: https://issues.apache.org/jira/browse/SPARK-40148
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Created] (SPARK-40160) Make pyspark.broadcast examples self-contained

2022-08-20 Thread Qian Sun (Jira)
Qian Sun created SPARK-40160:


 Summary: Make pyspark.broadcast examples self-contained
 Key: SPARK-40160
 URL: https://issues.apache.org/jira/browse/SPARK-40160
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Commented] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query

2022-08-20 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582149#comment-17582149
 ] 

Qian Sun commented on SPARK-40081:
--

[~hyukjin.kwon] Yes, I'm working on it

> Add Document Parameters for pyspark.sql.streaming.query
> ---
>
> Key: SPARK-40081
> URL: https://issues.apache.org/jira/browse/SPARK-40081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Priority: Major
>







[jira] [Created] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query

2022-08-15 Thread Qian Sun (Jira)
Qian Sun created SPARK-40081:


 Summary: Add Document Parameters for pyspark.sql.streaming.query
 Key: SPARK-40081
 URL: https://issues.apache.org/jira/browse/SPARK-40081
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40078) Make pyspark.sql.column examples self-contained

2022-08-15 Thread Qian Sun (Jira)
Qian Sun created SPARK-40078:


 Summary: Make pyspark.sql.column examples self-contained
 Key: SPARK-40078
 URL: https://issues.apache.org/jira/browse/SPARK-40078
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40042) Make pyspark.sql.streaming.query examples self-contained

2022-08-10 Thread Qian Sun (Jira)
Qian Sun created SPARK-40042:


 Summary: Make pyspark.sql.streaming.query examples self-contained
 Key: SPARK-40042
 URL: https://issues.apache.org/jira/browse/SPARK-40042
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40041) Add Document Parameters for pyspark.sql.window

2022-08-10 Thread Qian Sun (Jira)
Qian Sun created SPARK-40041:


 Summary: Add Document Parameters for pyspark.sql.window
 Key: SPARK-40041
 URL: https://issues.apache.org/jira/browse/SPARK-40041
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40010) Make pyspark.sql.window examples self-contained

2022-08-08 Thread Qian Sun (Jira)
Qian Sun created SPARK-40010:


 Summary: Make pyspark.sql.window examples self-contained
 Key: SPARK-40010
 URL: https://issues.apache.org/jira/browse/SPARK-40010
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-39676) Add task partition id for Task assertEquals method in JsonProtocolSuite

2022-07-04 Thread Qian Sun (Jira)
Qian Sun created SPARK-39676:


 Summary: Add task partition id for Task assertEquals method in 
JsonProtocolSuite 
 Key: SPARK-39676
 URL: https://issues.apache.org/jira/browse/SPARK-39676
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Qian Sun
 Fix For: 3.4.0


https://issues.apache.org/jira/browse/SPARK-37831 added a task partition id to task metrics, but the Task assertEquals method in JsonProtocolSuite was not updated to compare the new field.
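For context, the kind of round-trip check the suite performs can be sketched as follows. The field names below are hypothetical Python stand-ins; the real suite is Scala code in JsonProtocolSuite, and this is not its actual API.

```python
import json

# Hypothetical stand-in for a JsonProtocol task-info round trip.
def task_info_to_json(task):
    return json.dumps({"taskId": task["taskId"], "partitionId": task["partitionId"]})

def assert_task_equals(expected, actual):
    assert expected["taskId"] == actual["taskId"]
    # The kind of check this ticket adds: compare the partition id too,
    # so a serialization bug in the new field would actually fail the test.
    assert expected["partitionId"] == actual["partitionId"]

task = {"taskId": 1, "partitionId": 7}
assert_task_equals(task, json.loads(task_info_to_json(task)))
```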






[jira] [Commented] (SPARK-39608) Upgrade to spark 3.3.0 is causing error "Cannot grow BufferHolder by size -179446840 because the size is negative"

2022-07-01 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561380#comment-17561380
 ] 

Qian Sun commented on SPARK-39608:
--

Could you share more information, such as the Spark application code or the generated code?
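Not confirmed for this particular report, but one common way a negative grow size arises is 32-bit integer overflow: if a single row's buffer requirement exceeds Integer.MAX_VALUE, the JVM's int arithmetic wraps to a negative number, which BufferHolder.grow then rejects. A Python sketch simulating the JVM's int addition:

```python
INT32_MAX = 2**31 - 1

def jvm_int_add(a, b):
    """Simulate 32-bit two's-complement addition as the JVM performs it."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

# If the buffer cursor is already near Integer.MAX_VALUE, adding the next
# chunk's size wraps around to a negative number, producing an error like
# "Cannot grow BufferHolder by size -N because the size is negative".
needed = jvm_int_add(INT32_MAX - 1000, 2000)
print(needed < 0)  # True: the wrapped sum is negative
```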

> Upgrade to spark 3.3.0 is causing error "Cannot grow BufferHolder by size 
> -179446840 because the size is negative"
> --
>
> Key: SPARK-39608
> URL: https://issues.apache.org/jira/browse/SPARK-39608
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Isaac Eliassi
>Priority: Critical
>
> Hi,
>  
> We recently upgraded to version 3.3.0.
> The upgrade is causing the following error "Cannot grow BufferHolder by size 
> -179446840 because the size is negative"
>  
> I can't find information on this on the internet, when reverting to spark 
> 3.2.1 it works.
>  
> Full exception:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
> stage 36.0 failed 4 times, most recent failure: Lost task 1.3 in stage 36.0 
> (TID 2873) (172.24.214.133 executor 4): java.lang.IllegalArgumentException: 
> Cannot grow BufferHolder by size -143657042 because the size is negative
>         at 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67)
>         at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.grow(UnsafeWriter.java:63)
>         at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:165)
>         at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage24.smj_consumeFullOuterJoinRow_0$(Unknown
>  Source)
>         at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage24.processNext(Unknown
>  Source)
>         at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:779)
>         at 
> org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:118)
>         at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>         at 
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
>         at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302)
>         at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508)
>         at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435)
>         at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499)
>         at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322)
>         at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:327)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>         at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
> 

[jira] [Commented] (SPARK-39430) The inconsistent timezone in Spark History Server UI

2022-06-29 Thread Qian Sun (Jira)
Qian Sun commented on SPARK-39430:
----------------------------------

Surbhi, hi. I tried it again and this phenomenon is not reproduced. I used SHS 3.2.1 and Spark 3.2.0. Was your Spark application running in the IST timezone?

--
This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)