[jira] [Created] (SPARK-47992) Support recursive descent path in get_json_object function
Qian Sun created SPARK-47992: Summary: Support recursive descent path in get_json_object function Key: SPARK-47992 URL: https://issues.apache.org/jira/browse/SPARK-47992 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Qian Sun JSONPath borrows its recursive descent syntax from E4X. We could use it to collect JSON objects from a JSON map string. {code:java} // json data {"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}}} {"key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}} select get_json_object(data, '$..c'); -- [c1, c2]{code} ref: https://goessner.net/articles/JsonPath/index.html#e2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
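The requested `$..c` behavior can be illustrated with a plain-Python sketch (this is not Spark's implementation; `recursive_descent` is a hypothetical helper that walks parsed JSON and gathers every value bound to the key, at any depth):

```python
import json

def recursive_descent(node, key):
    """Collect every value bound to `key` at any depth, mimicking
    the JSONPath recursive-descent operator `$..key`."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: nested objects may also contain `key`.
            found.extend(recursive_descent(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(recursive_descent(item, key))
    return found

# The two rows from the ticket's example.
rows = [
    '{"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}}}',
    '{"key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}}',
]
collected = [v for row in rows for v in recursive_descent(json.loads(row), "c")]
print(collected)  # ['c1', 'c2']
```

Applied per row, `get_json_object(data, '$..c')` would yield `c1` and `c2` respectively, matching the `[c1, c2]` result shown above.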
[jira] [Commented] (SPARK-47519) Support json_length function
[ https://issues.apache.org/jira/browse/SPARK-47519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829779#comment-17829779 ] Qian Sun commented on SPARK-47519: -- [~cloud_fan] [~smilegator] I would appreciate your opinions. I also see [https://github.com/apache/spark/pull/28167#issuecomment-614097511,] and wonder whether the _json_length_ function is needed by the Apache Spark community > Support json_length function > > > Key: SPARK-47519 > URL: https://issues.apache.org/jira/browse/SPARK-47519 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Qian Sun >Priority: Major > > At the moment, we don't support a json_length built-in function in Apache Spark. > This function is supported by > # Presto: [https://prestodb.io/docs/current/functions/json.html#json_size] > # ClickHouse: > [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys] > # MySQL: > [https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length] > > *Definition* > json_length(json_txt, path) - Return the length of a JSON array or a JSON > object. > If the value does not exist or has the wrong type, {{0}} will be returned. > Examples: > > {code:java} > SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2 > SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3 > SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code} > > *The advantages:* > # it skips the parse phase, so its performance is better than > _size(get_json_object(json_txt, path))_ > # it is more general than _json_array_length_ and can be implemented in a > unified manner > # it allows novice users to directly get a JSON length with a built-in JSON > function instead of a UDF
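The proposed semantics can be sketched in plain Python (a hypothetical reference model, not Spark code; it handles only simple dotted paths like `$.x.a`, not a full JSONPath engine):

```python
import json

def json_length(json_txt, path):
    """Sketch of the proposed json_length(json_txt, path):
    length of the JSON object/array at `path`; 0 if the value is
    missing or the input is invalid; scalars count as length 1,
    per the ticket's '$.x.a' -> 1 example."""
    try:
        node = json.loads(json_txt)
    except ValueError:
        return 0
    # Walk a simple dotted path such as '$.x.a'.
    for step in path.lstrip("$").strip(".").split("."):
        if step == "":
            continue
        if not isinstance(node, dict) or step not in node:
            return 0
        node = node[step]
    if isinstance(node, (dict, list)):
        return len(node)
    return 1

print(json_length('{"x": {"a": 1, "b": 2}}', '$.x'))    # 2
print(json_length('{"x": [1, 2, 3]}', '$.x'))           # 3
print(json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'))  # 1
```

The three calls reproduce the examples from the ticket's definition; a built-in implementation could evaluate the path during parsing rather than materializing the extracted value first, which is the performance argument made above.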
[jira] [Updated] (SPARK-47519) Support json_length function
[ https://issues.apache.org/jira/browse/SPARK-47519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-47519: - Description: At the moment, we don't support a json_length built-in function in Apache Spark. This function is supported by # Presto: [https://prestodb.io/docs/current/functions/json.html#json_size] # ClickHouse: [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys] # MySQL: [https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length] *Definition* json_length(json_txt, path) - Return the length of a JSON array or a JSON object. If the value does not exist or has the wrong type, {{0}} will be returned. Examples: {code:java} SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2 SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3 SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code} *The advantages:* # it skips the parse phase, so its performance is better than _size(get_json_object(json_txt, path))_ # it is more general than _json_array_length_ and can be implemented in a unified manner # it allows novice users to directly get a JSON length with a built-in JSON function instead of a UDF was: At the moment, we don't support a json_length built-in function in Apache Spark. This function is supported by # Presto: [https://prestodb.io/docs/current/functions/json.html#json_size] # ClickHouse: [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys] This allows novice users to directly get a JSON length with a built-in JSON function. > Support json_length function > > > Key: SPARK-47519 > URL: https://issues.apache.org/jira/browse/SPARK-47519 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Qian Sun >Priority: Major > > At the moment, we don't support a json_length built-in function in Apache Spark. > This function is supported by > # Presto: [https://prestodb.io/docs/current/functions/json.html#json_size] > # ClickHouse: > [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys] > # MySQL: > [https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length] > > *Definition* > json_length(json_txt, path) - Return the length of a JSON array or a JSON > object. > If the value does not exist or has the wrong type, {{0}} will be returned. > Examples: > > {code:java} > SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x'); -- 2 > SELECT json_length('{"x": [1, 2, 3]}', '$.x'); -- 3 > SELECT json_length('{"x": {"a": 1, "b": 2}}', '$.x.a'); -- 1{code} > > *The advantages:* > # it skips the parse phase, so its performance is better than > _size(get_json_object(json_txt, path))_ > # it is more general than _json_array_length_ and can be implemented in a > unified manner > # it allows novice users to directly get a JSON length with a built-in JSON > function instead of a UDF
[jira] [Created] (SPARK-47519) Support json_length function
Qian Sun created SPARK-47519: Summary: Support json_length function Key: SPARK-47519 URL: https://issues.apache.org/jira/browse/SPARK-47519 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Qian Sun At the moment, we don't support a json_length built-in function in Apache Spark. This function is supported by # Presto: [https://prestodb.io/docs/current/functions/json.html#json_size] # ClickHouse: [https://clickhouse.com/docs/en/sql-reference/functions/json-functions#jsonlengthjson-indices_or_keys] This allows novice users to directly get a JSON length with a built-in JSON function.
[jira] [Comment Edited] (SPARK-44573) Couldn't submit Spark application to Kubenetes in versions v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534 ] Qian Sun edited comment on SPARK-44573 at 11/30/23 9:48 AM: Did you bind a role to your service account? ref: [https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac] cc [~dongjoon] was (Author: dcoliversun): Did you bind a role to your service account? ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Blocker > > Spark-submit (cluster mode on Kubernetes) results in the error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s > cluster. > Steps followed: > * Using IBM Cloud, created 3 instances > * The 1st instance acts as the master node and the other two act as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder > * Ran Spark using the command below > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And got the error message below. 
> {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at >
[jira] [Commented] (SPARK-44573) Couldn't submit Spark application to Kubenetes in versions v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791534#comment-17791534 ] Qian Sun commented on SPARK-44573: -- Did you bind a role to your service account? ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac > Couldn't submit Spark application to Kubenetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Blocker > > Spark-submit (cluster mode on Kubernetes) results in the error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s > cluster. > Steps followed: > * Using IBM Cloud, created 3 instances > * The 1st instance acts as the master node and the other two act as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder > * Ran Spark using the command below > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And got the error message below. > {noformat} > 3/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... 
using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) > at
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Summary: Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website (was: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website) > Incorrect path for spark-hero-thin-light.jpg in spark3.5.0 website > -- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > Attachments: network.png > > > When I visit [https://spark.apache.org/docs/3.5.0/,] > spark-hero-thin-light.jpg is not found because > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99,] > uses the wrong path; it should be ../images/spark-hero-thin-light.jpg
[jira] [Created] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
Qian Sun created SPARK-46183: Summary: Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website Key: SPARK-46183 URL: https://issues.apache.org/jira/browse/SPARK-46183 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.5.0 Reporter: Qian Sun When I visit [https://spark.apache.org/docs/3.5.0/,] spark-hero-thin-light.jpg is not found because [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99,] uses the wrong path; it should be ../images/spark-hero-thin-light.jpg
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Attachment: network.png > Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website > --- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > Attachments: network.png > > > When I visit [https://spark.apache.org/docs/3.5.0/,] > spark-hero-thin-light.jpg is not found because > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99,] > uses the wrong path; it should be ../images/spark-hero-thin-light.jpg
[jira] [Updated] (SPARK-46183) Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website
[ https://issues.apache.org/jira/browse/SPARK-46183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-46183: - Attachment: (was: L1VzZXJzL2hlbmd6aGVuLnNxL0xpYnJhcnkvQXBwbGljYXRpb24gU3VwcG9ydC9pRGluZ1RhbGsvNDUyMDQ5NjgwX3YyL0ltYWdlRmlsZXMvMTcwMTMzNjk5MjkzNF81QjRENEU2RC1FNUM2LTQxNEQtOERGRS0wOTIxRUUzMjY2OTcucG5n.png) > Incorrect path for spark-hero-thin-light.jpg for spark3.5.0 website > --- > > Key: SPARK-46183 > URL: https://issues.apache.org/jira/browse/SPARK-46183 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Minor > > When I visit [https://spark.apache.org/docs/3.5.0/,] > spark-hero-thin-light.jpg is not found because > [https://github.com/apache/spark-website/blob/17c63886085b582a1317a929114659f9e88822aa/site/docs/3.5.0/css/custom.css#L99,] > uses the wrong path; it should be ../images/spark-hero-thin-light.jpg
[jira] [Commented] (SPARK-45175) download krb5.conf from remote storage in spark-submit on k8s
[ https://issues.apache.org/jira/browse/SPARK-45175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767908#comment-17767908 ] Qian Sun commented on SPARK-45175: -- In multi-tenant scenarios, I find that Apache Spark provides *{{spark.kubernetes.kerberos.krb5.configMapName}}* to mount a ConfigMap containing the {{*krb5.conf*}} file; we could manage these files by creating one ConfigMap per tenant. > download krb5.conf from remote storage in spark-submit on k8s > - > > Key: SPARK-45175 > URL: https://issues.apache.org/jira/browse/SPARK-45175 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.1 >Reporter: Qian Sun >Priority: Minor > Labels: pull-request-available > > krb5.conf currently supports only local files. Tenants would like > to store this file on their own servers and download it during the > spark-submit phase, for a better fit with multi-tenant scenarios. The > proposed solution is to use the *downloadFile* function[1], similar to the > configuration of *spark.kubernetes.driver/executor.podTemplateFile* > > [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24
[jira] [Comment Edited] (SPARK-43182) Mutilple tables join with limit when AE is enabled and one table is skewed
[ https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690 ] Qian Sun edited comment on SPARK-43182 at 9/19/23 8:14 AM: --- Hi [~Resol1992] I ran your SQL with different configuration combinations and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. AQE can give up the skew-join optimization when an extra shuffle would be introduced, but only if *spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc [~cloud_fan] ref: [https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229] was (Author: dcoliversun): Hi [~Resol1992] I ran your SQL with different configuration combinations and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. AQE can give up the skew-join optimization when an extra shuffle would be introduced, but only if *spark.sql.adaptive.forceOptimizeSkewedJoin* is false. 
cc [~cloud_fan] * https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229 > Mutilple tables join with limit when AE is enabled and one table is skewed > -- > > Key: SPARK-43182 > URL: https://issues.apache.org/jira/browse/SPARK-43182 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Liu Shuo >Priority: Critical > Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, > part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, > part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, > part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, > part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, > part-m-00019.zip > > > When we test AE in Spark 3.4.0 with the following case, we find that if we > disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s; but if > we enable AE and enable skewJoin, it takes a very long time. > The test case: > {code:java} > ### uncompress the part-m-***.zip attachments, and put these files under > the '/tmp/spark-warehouse/data/' dir. 
> create table source_aqe(c1 int,c18 string) using csv options(path > 'file:///tmp/spark-warehouse/data/'); > create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned > by(c18 string); > insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from > source_aqe; > insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from > source_aqe limit 12; > insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from > source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from > source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from > source_aqe limit 12; > set spark.sql.adaptive.enabled=false; > set spark.sql.adaptive.forceOptimizeSkewedJoin = false; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > > ###it will finish in 20s > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > set spark.sql.adaptive.enabled=true; > set spark.sql.adaptive.forceOptimizeSkewedJoin = true; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > ###it will 
take very long time > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > {code}
[jira] [Commented] (SPARK-43182) Mutilple tables join with limit when AE is enabled and one table is skewed
[ https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690 ] Qian Sun commented on SPARK-43182: -- Hi [~Resol1992] I ran your SQL with different configuration combinations and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. AQE can give up the skew-join optimization when an extra shuffle would be introduced, but only if *spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc [~cloud_fan] * https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229 > Mutilple tables join with limit when AE is enabled and one table is skewed > -- > > Key: SPARK-43182 > URL: https://issues.apache.org/jira/browse/SPARK-43182 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Liu Shuo >Priority: Critical > Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, > part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, > part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, > part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, > part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, > part-m-00019.zip > > > When we test AE in Spark 3.4.0 with the following case, we find that if we > disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s; but if > we enable AE and enable skewJoin, it takes a very long time. > The test case: > {code:java} > ### uncompress the part-m-***.zip attachments, and put these files under > the '/tmp/spark-warehouse/data/' dir. 
> create table source_aqe(c1 int,c18 string) using csv options(path > 'file:///tmp/spark-warehouse/data/'); > create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned > by(c18 string); > insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from > source_aqe; > insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from > source_aqe limit 12; > insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from > source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from > source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from > source_aqe limit 12; > set spark.sql.adaptive.enabled=false; > set spark.sql.adaptive.forceOptimizeSkewedJoin = false; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > > ###it will finish in 20s > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > set spark.sql.adaptive.enabled=true; > set spark.sql.adaptive.forceOptimizeSkewedJoin = true; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > ###it will 
take very long time > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > {code}
[jira] [Created] (SPARK-45208) Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar
Qian Sun created SPARK-45208: Summary: Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar Key: SPARK-45208 URL: https://issues.apache.org/jira/browse/SPARK-45208 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.5.0 Reporter: Qian Sun I found a recent issue with the official Spark documentation website. Specifically, the right-hand side of the Kubernetes configuration lists is not visible, and the doc doesn't have a horizontal scrollbar. - [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration] - [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]
[jira] [Updated] (SPARK-45175) download krb5.conf from remote storage in spark-submit on k8s
[ https://issues.apache.org/jira/browse/SPARK-45175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-45175: - Summary: download krb5.conf from remote storage in spark-submit on k8s (was: download krb5.conf from remote storage in spark-sumbit on k8s) > download krb5.conf from remote storage in spark-submit on k8s > - > > Key: SPARK-45175 > URL: https://issues.apache.org/jira/browse/SPARK-45175 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.1 >Reporter: Qian Sun >Priority: Minor > Labels: pull-request-available > > krb5.conf currently only supports the local file format. Tenants would like > to save this file on their own servers and download it during the > spark-submit phase for better implementation of multi-tenant scenarios. The > proposed solution is to use the *downloadFile* function[1], similar to the > configuration of *spark.kubernetes.driver/executor.podTemplateFile* > > [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45175) download krb5.conf from remote storage in spark-sumbit on k8s
Qian Sun created SPARK-45175: Summary: download krb5.conf from remote storage in spark-sumbit on k8s Key: SPARK-45175 URL: https://issues.apache.org/jira/browse/SPARK-45175 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.1 Reporter: Qian Sun krb5.conf is currently only supported as a local file. Tenants would like to store this file on their own servers and have it downloaded during the spark-submit phase, which better supports multi-tenant scenarios. The proposed solution is to use the *downloadFile* function[1], similar to how *spark.kubernetes.driver/executor.podTemplateFile* is handled. [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24
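For illustration, the proposed usage might look like the sketch below. This is hypothetical: today `spark.kubernetes.kerberos.krb5.path` accepts only a local path, and the remote host name is made up.

```shell
# Hypothetical spark-submit once remote storage is supported for krb5.conf.
# The https:// URI illustrates the proposal of fetching the file with
# downloadFile during submission, the way pod template files are handled.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.kerberos.krb5.path=https://tenant-config.example.com/krb5.conf \
  local:///opt/spark/examples/jars/spark-examples.jar
```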
[jira] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
[ https://issues.apache.org/jira/browse/SPARK-43342 ] Qian Sun deleted comment on SPARK-43342: -- was (Author: dcoliversun): [~ofrenkel] Hello, I tried to reproduce using the configuration you provided. There are some issues that I need to confirm with you: * When the driver and executor use the PVC with same claim name, can your executor start normally? * Did your run of spark-pi compute the value of pi? Based on my tests, Spark 3.3 cannot start the executor properly and cannot compute the value of pi. The logs I saw are as follows. {code:java} [kubernetes-executor-snapshots-subscribers-1] WARN org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl - Exception when notifying snapshot subscriber. io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://21.8.0.8:6443/api/v1/namespaces/test-ns/persistentvolumeclaims. Message: persistentvolumeclaims "a1pvc" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=persistentvolumeclaims, name=test, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=persistentvolumeclaims "test" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}). {code} I'm looking forward to any new feedback you have. cc [~dongjoon] > Spark in Kubernetes mode throws IllegalArgumentException when using static PVC > -- > > Key: SPARK-43342 > URL: https://issues.apache.org/jira/browse/SPARK-43342 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Oleg Frenkel >Priority: Blocker > > When using static PVC with Spark 3.4, spark PI example fails with the error > below. Previous versions of Spark worked well. 
> {code:java} > 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors > from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, > sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO > BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown > script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop > due to IllegalArgumentException java.lang.IllegalArgumentException: PVC > ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring > multiple executors at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) at > scala.collection.Iterator.foreach$(Iterator.scala:943) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at > scala.collection.IterableLike.foreach(IterableLike.scala:74) at > scala.collection.IterableLike.foreach$(IterableLike.scala:73) at > scala.collection.AbstractIterable.foreach(Iterable.scala:56) at > scala.collection.TraversableLike.map(TraversableLike.scala:286) at > scala.collection.TraversableLike.map$(TraversableLike.scala:279) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > 
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:370) > at
[jira] [Commented] (SPARK-43329) driver and executors shared same Kubernetes PVC in Spark 3.4+
[ https://issues.apache.org/jira/browse/SPARK-43329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719622#comment-17719622 ] Qian Sun commented on SPARK-43329: -- [~dongjoon] this ticket is a duplicate of SPARK-43342
> driver and executors shared same Kubernetes PVC in Spark 3.4+
> -
>
> Key: SPARK-43329
> URL: https://issues.apache.org/jira/browse/SPARK-43329
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: comet
>Priority: Major
>
> I was able to share the same PVC on Spark 3.3, but from Spark 3.4 onward I get the error below. I would like all the executors and the driver to mount the same PVC. Is this a bug? I don't want to use SPARK_EXECUTOR_ID or OnDemand, because then each executor would use its own unique, separate PVC.
>
> The error message is "should contain OnDemand or SPARK_EXECUTOR_ID when requiring multiple executors"
>
> Below is how I enabled the PVC in Spark 3.3; it works there but does not work in Spark 3.4
> {code:sh}
> --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.rwxpvc.options.claimName=rwxpvc \
> --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.rwxpvc.mount.path=/opt/spark/work-dir \
> --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.rwxpvc.options.claimName=rwxpvc \
> --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.rwxpvc.mount.path=/opt/spark/work-dir
> {code}
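The rule behind that error message can be sketched as a tiny check. This is inferred from the message and the MountVolumesFeatureStep stack traces in this thread, not copied from the real `checkPVCClaimName` implementation:

```shell
# Sketch of the claim-name rule Spark 3.4 enforces when multiple executors
# mount a PVC: the claim name must contain "OnDemand" or "SPARK_EXECUTOR_ID"
# so each executor resolves to its own claim, instead of all executors
# racing to create/attach one static claim.
is_valid_executor_claim_name() {
  case "$1" in
    *OnDemand*|*SPARK_EXECUTOR_ID*) return 0 ;;  # templated per executor: accepted
    *) return 1 ;;                               # static shared name: rejected
  esac
}

is_valid_executor_claim_name "rwxpvc" || echo "rwxpvc is rejected in 3.4"
is_valid_executor_claim_name "rwxpvc-SPARK_EXECUTOR_ID" && echo "templated name is accepted"
```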
[jira] [Commented] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
[ https://issues.apache.org/jira/browse/SPARK-43342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719162#comment-17719162 ] Qian Sun commented on SPARK-43342: -- [~ofrenkel] Hello, I tried to reproduce using the configuration you provided. There are some issues that I need to confirm with you: * When the driver and executor use the PVC with same claim name, can your executor start normally? * Did your run of spark-pi compute the value of pi? Based on my tests, Spark 3.3 cannot start the executor properly and cannot compute the value of pi. The logs I saw are as follows. {code:java} [kubernetes-executor-snapshots-subscribers-1] WARN org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl - Exception when notifying snapshot subscriber. io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://21.8.0.8:6443/api/v1/namespaces/test-ns/persistentvolumeclaims. Message: persistentvolumeclaims "a1pvc" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=persistentvolumeclaims, name=test, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=persistentvolumeclaims "test" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}). {code} I'm looking forward to any new feedback you have. cc [~dongjoon] > Spark in Kubernetes mode throws IllegalArgumentException when using static PVC > -- > > Key: SPARK-43342 > URL: https://issues.apache.org/jira/browse/SPARK-43342 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Oleg Frenkel >Priority: Blocker > > When using static PVC with Spark 3.4, spark PI example fails with the error > below. Previous versions of Spark worked well. 
> {code:java} > 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors > from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, > sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO > BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown > script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop > due to IllegalArgumentException java.lang.IllegalArgumentException: PVC > ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring > multiple executors at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) at > scala.collection.Iterator.foreach$(Iterator.scala:943) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at > scala.collection.IterableLike.foreach(IterableLike.scala:74) at > scala.collection.IterableLike.foreach$(IterableLike.scala:73) at > scala.collection.AbstractIterable.foreach(Iterable.scala:56) at > scala.collection.TraversableLike.map(TraversableLike.scala:286) at > scala.collection.TraversableLike.map$(TraversableLike.scala:279) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > 
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417) > at >
[jira] [Commented] (SPARK-43342) Spark in Kubernetes mode throws IllegalArgumentException when using static PVC
[ https://issues.apache.org/jira/browse/SPARK-43342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719121#comment-17719121 ] Qian Sun commented on SPARK-43342: -- [~dongjoon] [~yikunkero] It seems like a regression caused by [SPARK-39006|https://issues.apache.org/jira/browse/SPARK-39006], please assign to me > Spark in Kubernetes mode throws IllegalArgumentException when using static PVC > -- > > Key: SPARK-43342 > URL: https://issues.apache.org/jira/browse/SPARK-43342 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Oleg Frenkel >Priority: Blocker > > When using static PVC with Spark 3.4, spark PI example fails with the error > below. Previous versions of Spark worked well. > {code:java} > 23/04/26 13:22:02 INFO ExecutorPodsAllocator: Going to request 5 executors > from Kubernetes for ResourceProfile Id: 0, target: 5, known: 0, > sharedSlotFromPendingPods: 2147483647. 23/04/26 13:22:02 INFO > BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown > script 23/04/26 13:22:02 ERROR ExecutorPodsSnapshotsStoreImpl: Going to stop > due to IllegalArgumentException java.lang.IllegalArgumentException: PVC > ClaimName: a1pvc should contain OnDemand or SPARK_EXECUTOR_ID when requiring > multiple executors at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.checkPVCClaimName(MountVolumesFeatureStep.scala:135) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:75) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) at > scala.collection.Iterator.foreach$(Iterator.scala:943) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at > scala.collection.IterableLike.foreach(IterableLike.scala:74) at > scala.collection.IterableLike.foreach$(IterableLike.scala:73) at > 
scala.collection.AbstractIterable.foreach(Iterable.scala:56) at > scala.collection.TraversableLike.map(TraversableLike.scala:286) at > scala.collection.TraversableLike.map$(TraversableLike.scala:279) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:58) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:35) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$5(KubernetesExecutorBuilder.scala:83) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:82) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:370) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:363) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:363) > at > 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:134) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:134) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:143) > at >
[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod
[ https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-41781: - Description: Creating the PVC after the driver/executor pod results in a Warning event from the default-scheduler, such as {code:java} error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} The normal k8s workflow is to create the PVC first and then schedule the pod to mount it. We have a scenario where a webhook server tries to reschedule the pod and move its PVC to another pod. Because the PVC is created after the pod, the webhook couldn't find the PVC based on the pod metadata. was: Creating pvc after driver/executor pod has Warning event from default-scheduler, such as {code:java} error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} Normally, we need to create PVC first and schedule pod to mount PVC.
> Add the ability to create pvc before creating driver/executor pod
> -
>
> Key: SPARK-41781
> URL: https://issues.apache.org/jira/browse/SPARK-41781
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> Creating the PVC after the driver/executor pod results in a Warning event from the default-scheduler, such as
> {code:java}
> error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code}
> The normal k8s workflow is to create the PVC first and then schedule the pod to mount it.
>
> We have a scenario where a webhook server tries to reschedule the pod and move its PVC to another pod. Because the PVC is created after the pod, the webhook couldn't find the PVC based on the pod metadata.
[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod
[ https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-41781: - Description: Creating pvc after driver/executor pod has Warning event from default-scheduler, such as {code:java} error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} Normally, we need to create PVC first and schedule pod to mount PVC. was: Creating resources after executor pod has Warning event from default-scheduler, such as {code:java} error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} Normally, we need to create resources and schedule pod to mount them. > Add the ability to create pvc before creating driver/executor pod > - > > Key: SPARK-41781 > URL: https://issues.apache.org/jira/browse/SPARK-41781 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > > Creating pvc after driver/executor pod has Warning event from > default-scheduler, such as > {code:java} > error getting PVC "spark/application-exec-1-pvc-0": could not find > v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} > Normally, we need to create PVC first and schedule pod to mount PVC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41781) Add the ability to create pvc before creating driver/executor pod
[ https://issues.apache.org/jira/browse/SPARK-41781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-41781: - Summary: Add the ability to create pvc before creating driver/executor pod (was: Add the ability to create resources before creating executor pod) > Add the ability to create pvc before creating driver/executor pod > - > > Key: SPARK-41781 > URL: https://issues.apache.org/jira/browse/SPARK-41781 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > > Creating resources after executor pod has Warning event from > default-scheduler, such as > {code:java} > error getting PVC "spark/application-exec-1-pvc-0": could not find > v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} > Normally, we need to create resources and schedule pod to mount them.
[jira] [Created] (SPARK-41781) Add the ability to create resources before creating executor pod
Qian Sun created SPARK-41781: Summary: Add the ability to create resources before creating executor pod Key: SPARK-41781 URL: https://issues.apache.org/jira/browse/SPARK-41781 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.0 Reporter: Qian Sun Creating resources after the executor pod results in a Warning event from the default-scheduler, such as {code:java} error getting PVC "spark/application-exec-1-pvc-0": could not find v1.PersistentVolumeClaim "spark/application-exec-1-pvc-0" {code} Normally, we need to create the resources first and then schedule the pod to mount them.
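The ordering the issue asks for mirrors the usual kubectl workflow. A minimal sketch, with illustrative manifest names that are not from the issue (the plan is printed rather than executed so the ordering itself is visible; pipe it to `sh` on a real cluster):

```shell
# Print the ordered steps: create the PVC before the pod that mounts it,
# so the scheduler never sees a pod referencing a claim that does not exist.
pvc_first_plan() {
  echo "kubectl apply -f $1"   # 1. the PersistentVolumeClaim manifest
  echo "kubectl apply -f $2"   # 2. the pod manifest that mounts the claim
}

pvc_first_plan demo-pvc.yaml executor-pod.yaml
```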
[jira] [Updated] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40569: - Summary: Add smoke test in standalone cluster for spark-docker (was: Expose port for spark standalone mode) > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major >
[jira] [Commented] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626359#comment-17626359 ] Qian Sun commented on SPARK-40969: -- [~yikunkero] fine with me, I'm working on this :) > Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker > -- > > Key: SPARK-40969 > URL: https://issues.apache.org/jira/browse/SPARK-40969 > Project: Spark > Issue Type: Bug > Components: Spark Docker >Affects Versions: 3.3.1 >Reporter: Qian Sun >Priority: Major > > Unable to download spark 3.3.0 tarball in spark-docker. > {code:sh} > #7 0.229 + wget -nv -O spark.tgz > https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz > #7 1.061 > https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz: > #7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found. > -- > executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp > -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget > -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp > -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || > gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg > --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf > "$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1; > chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/; > mv sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; > mv examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data > /opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv > python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit > code: 8 > {code} > And spark 3.3.1 docker is ok > {code:sh} > => [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; > wget -nv -O spark.tgz > "https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; > wget -nv -O spark.tgz.asc "https://downlo 77.8s > => [5/9] COPY 
entrypoint.sh /opt/ > {code}
[jira] [Commented] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626355#comment-17626355 ] Qian Sun commented on SPARK-40969: -- cc [~yikunkero][~hyukjin.kwon] > Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker > -- > > Key: SPARK-40969 > URL: https://issues.apache.org/jira/browse/SPARK-40969 > Project: Spark > Issue Type: Bug > Components: Spark Docker >Affects Versions: 3.3.1 >Reporter: Qian Sun >Priority: Major > > Unable to download spark 3.3.0 tarball in spark-docker. > {code:sh} > #7 0.229 + wget -nv -O spark.tgz > https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz > #7 1.061 > https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz: > #7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found. > -- > executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp > -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget > -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp > -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || > gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg > --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf > "$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1; > chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/; > mv sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; > mv examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data > /opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv > python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit > code: 8 > {code} > And spark 3.3.1 docker is ok > {code:sh} > => [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; > wget -nv -O spark.tgz > "https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; > wget -nv -O spark.tgz.asc "https://downlo 77.8s > => [5/9] COPY entrypoint.sh /opt/ > {code} 
[jira] [Created] (SPARK-40969) Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker
Qian Sun created SPARK-40969: Summary: Unable to download spark 3.3.0 tarball after 3.3.1 release in spark-docker Key: SPARK-40969 URL: https://issues.apache.org/jira/browse/SPARK-40969 Project: Spark Issue Type: Bug Components: Spark Docker Affects Versions: 3.3.1 Reporter: Qian Sun Unable to download spark 3.3.0 tarball in spark-docker. {code:sh} #7 0.229 + wget -nv -O spark.tgz https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz #7 1.061 https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz: #7 1.061 2022-10-31 02:59:20 ERROR 404: Not Found. -- executor failed running [/bin/sh -c set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "$SPARK_TGZ_URL"; wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; export GNUPGHOME="$(mktemp -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; gpg --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf "$GNUPGHOME" spark.tgz.asc; tar -xf spark.tgz --strip-components=1; chown -R spark:spark .; mv jars /opt/spark/; mv bin /opt/spark/; mv sbin /opt/spark/; mv kubernetes/dockerfiles/spark/decom.sh /opt/; mv examples /opt/spark/; mv kubernetes/tests /opt/spark/; mv data /opt/spark/; mv python/pyspark /opt/spark/python/pyspark/; mv python/lib /opt/spark/python/lib/; cd ..; rm -rf "$SPARK_TMP";]: exit code: 8 {code} And spark 3.3.1 docker is ok {code:sh} => [4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz;; wget -nv -O spark.tgz.asc "https://downlo 77.8s => [5/9] COPY entrypoint.sh /opt/ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
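This 404 is expected mirror rotation: once 3.3.1 shipped, the 3.3.0 tarball rotated off the dlcdn mirror, but every past release remains on the permanent Apache archive. A small helper sketch (URL layout assumed from the build log above) yields candidate URLs to try in order:

```shell
# Emit download candidates for a Spark release: the fast CDN first, then the
# permanent archive that keeps every past release.
spark_tgz_urls() {
  local v="$1" hadoop="${2:-hadoop3}"
  echo "https://dlcdn.apache.org/spark/spark-${v}/spark-${v}-bin-${hadoop}.tgz"
  echo "https://archive.apache.org/dist/spark/spark-${v}/spark-${v}-bin-${hadoop}.tgz"
}

# Usage (e.g. in the Dockerfile's RUN step): try each candidate until one works.
# for url in $(spark_tgz_urls 3.3.0); do wget -nv -O spark.tgz "$url" && break; done
spark_tgz_urls 3.3.0
```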
[jira] [Commented] (SPARK-40954) Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker
[ https://issues.apache.org/jira/browse/SPARK-40954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626345#comment-17626345 ] Qian Sun commented on SPARK-40954: -- Hi [~_anton] I use hyperkit as minikube driver. Could you try this command for starting minikube? {code:sh} minikube --driver=hyperkit start {code} > Kubernetes integration tests stuck forever on Mac M1 with Minikube + Docker > --- > > Key: SPARK-40954 > URL: https://issues.apache.org/jira/browse/SPARK-40954 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.3.1 > Environment: MacOS 12.6 (Mac M1) > Minikube 1.27.1 > Docker 20.10.17 >Reporter: Anton Ippolitov >Priority: Minor > Attachments: TestProcess.scala > > > h2. Description > I tried running Kubernetes integration tests with the Minikube backend (+ > Docker driver) from commit c26d99e3f104f6603e0849d82eca03e28f196551 on > Spark's master branch. I ran them with the following command: > > {code:java} > mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \ > -Pkubernetes -Pkubernetes-integration-tests \ > -Phadoop-3 \ > -Dspark.kubernetes.test.imageTag=MY_IMAGE_TAG_HERE \ > -Dspark.kubernetes.test.imageRepo=docker.io/kubespark > \ > -Dspark.kubernetes.test.namespace=spark \ > -Dspark.kubernetes.test.serviceAccountName=spark \ > -Dspark.kubernetes.test.deployMode=minikube {code} > However the test suite got stuck literally for hours on my machine. > > h2. 
Investigation > I ran {{jstack}} on the process that was running the tests and saw that it > was stuck here: > > {noformat} > "ScalaTest-main-running-KubernetesSuite" #1 prio=5 os_prio=31 > tid=0x7f78d580b800 nid=0x2503 runnable [0x000304749000] > java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x00076c0b6f40> (a > java.lang.UNIXProcess$ProcessPipeInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x00076c0bb410> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x00076c0bb410> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at > org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2(ProcessUtils.scala:45) > at > org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.$anonfun$executeProcess$2$adapted(ProcessUtils.scala:45) > at > org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$$$Lambda$322/20156341.apply(Unknown > Source) > at > org.apache.spark.deploy.k8s.integrationtest.Utils$.tryWithResource(Utils.scala:49) > at > org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:45) > at > 
org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.executeMinikube(Minikube.scala:103) > at > org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.minikubeServiceAction(Minikube.scala:112) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$getServiceUrl$1(DepsTestsSuite.scala:281) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite$$Lambda$611/1461360262.apply(Unknown > Source) > at > org.scalatest.enablers.Retrying$$anon$4.makeAValiantAttempt$1(Retrying.scala:184) > at > org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:196) > at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312) > at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457) > at >
[jira] [Created] (SPARK-40866) Rename Check Spark repo as Check Spark Docker repo in GA
Qian Sun created SPARK-40866: Summary: Rename Check Spark repo as Check Spark Docker repo in GA Key: SPARK-40866 URL: https://issues.apache.org/jira/browse/SPARK-40866 Project: Spark Issue Type: Sub-task Components: Spark Docker Affects Versions: 3.3.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40855) Add CONTRIBUTING.md to apache/spark-docker
Qian Sun created SPARK-40855: Summary: Add CONTRIBUTING.md to apache/spark-docker Key: SPARK-40855 URL: https://issues.apache.org/jira/browse/SPARK-40855 Project: Spark Issue Type: Sub-task Components: Spark Docker Affects Versions: 3.3.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40726) Supplement undocumented orc configurations in documentation
Qian Sun created SPARK-40726: Summary: Supplement undocumented orc configurations in documentation Key: SPARK-40726 URL: https://issues.apache.org/jira/browse/SPARK-40726 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.3.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40710) Supplement undocumented parquet configurations in documentation
Qian Sun created SPARK-40710: Summary: Supplement undocumented parquet configurations in documentation Key: SPARK-40710 URL: https://issues.apache.org/jira/browse/SPARK-40710 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.3.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40709) Supplement undocumented avro configurations in documentation
Qian Sun created SPARK-40709: Summary: Supplement undocumented avro configurations in documentation Key: SPARK-40709 URL: https://issues.apache.org/jira/browse/SPARK-40709 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.3.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40699) Supplement undocumented yarn configuration in documentation
Qian Sun created SPARK-40699: Summary: Supplement undocumented yarn configuration in documentation Key: SPARK-40699 URL: https://issues.apache.org/jira/browse/SPARK-40699 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in documentation
[ https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40675: - Summary: Supplement missing spark configuration in documentation (was: Supplement missing spark configuration in configuration.md) > Supplement missing spark configuration in documentation > --- > > Key: SPARK-40675 > URL: https://issues.apache.org/jira/browse/SPARK-40675 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > > Supplement missing spark configuration in documentation to make the > documentation more readable. User can check configuration through > documentation instead of code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md
[ https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40675: - Description: Supplement missing spark configuration in documentation to make the documentation more readable. User can check configuration through documentation instead of code. (was: Supplement missing spark configuration to documentation to make the documentation more readable. User can check configuration through documentation instead of code.) > Supplement missing spark configuration in configuration.md > -- > > Key: SPARK-40675 > URL: https://issues.apache.org/jira/browse/SPARK-40675 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > > Supplement missing spark configuration in documentation to make the > documentation more readable. User can check configuration through > documentation instead of code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md
[ https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40675: - Description: Supplement missing spark configuration to documentation to make the documentation more readable. User can check configuration through documentation instead of code. (was: Add missing spark configuration to documentation to make the documentation more readable. User can check configuration through documentation instead of code.) > Supplement missing spark configuration in configuration.md > -- > > Key: SPARK-40675 > URL: https://issues.apache.org/jira/browse/SPARK-40675 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > > Supplement missing spark configuration to documentation to make the > documentation more readable. User can check configuration through > documentation instead of code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40675) Supplement missing spark configuration in configuration.md
[ https://issues.apache.org/jira/browse/SPARK-40675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40675: - Summary: Supplement missing spark configuration in configuration.md (was: Add missing spark configuration to documentation) > Supplement missing spark configuration in configuration.md > -- > > Key: SPARK-40675 > URL: https://issues.apache.org/jira/browse/SPARK-40675 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > > Add missing spark configuration to documentation to make the documentation > more readable. User can check configuration through documentation instead of > code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40675) Add missing spark configuration to documentation
Qian Sun created SPARK-40675: Summary: Add missing spark configuration to documentation Key: SPARK-40675 URL: https://issues.apache.org/jira/browse/SPARK-40675 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.4.0 Reporter: Qian Sun Add the missing Spark configurations to the documentation to make it more readable. Users can then check configurations through the documentation instead of the code.
[jira] [Commented] (SPARK-40569) Expose port for spark standalone mode
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611898#comment-17611898 ] Qian Sun commented on SPARK-40569: -- [~bjornjorgensen] Thanks for sharing. > Expose port for spark standalone mode > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major >
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Priority: Minor (was: Major) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Minor > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Issue Type: Improvement (was: Bug) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Minor > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609785#comment-17609785 ] Qian Sun commented on SPARK-40572: -- I think the root cause is that [executorId is a string in TaskDataWrapper|https://github.com/apache/spark/blob/072575c9e6fc304f09e01ad0ee180c8f309ede91/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L174-L175]. The executor ID is a string throughout Apache Spark, and changing its type would introduce a large number of changes across the codebase. > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order
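Since the IDs stay strings, a lower-impact alternative to changing the type would be a numeric-aware sort key applied only at the display layer. A sketch of the idea (illustrative Python, not Spark's actual UI code):

```python
def executor_sort_key(executor_id: str):
    # Purely numeric IDs ("1", "2", "10") sort by their integer value and
    # come first; non-numeric IDs such as "driver" sort lexicographically
    # after them. The tuple keeps the two groups comparable.
    if executor_id.isdigit():
        return (0, int(executor_id), "")
    return (1, 0, executor_id)

ids = ["1", "10", "2", "driver"]
print(sorted(ids))                         # plain string sort: ['1', '10', '2', 'driver']
print(sorted(ids, key=executor_sort_key))  # numeric-aware:     ['1', '2', '10', 'driver']
```

The same comparator idea can be expressed in the JavaScript table-sorting code of the Stages tab without touching `TaskDataWrapper` at all.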
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Summary: Executor ID sorted as lexicographical order in Task Table of Stage Tab (was: Executor ID sorted as lexicographical order in UI Stages Tab) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Attachment: Executor_ID_IN_STAGES_TAB.png > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
Qian Sun created SPARK-40572: Summary: Executor ID sorted as lexicographical order in UI Stages Tab Key: SPARK-40572 URL: https://issues.apache.org/jira/browse/SPARK-40572 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.3.0 Reporter: Qian Sun As the attached figure shows, executor IDs in the UI Stages tab are sorted in lexicographical order; it would be better to sort them in numerical order. !image-2022-09-27-09-26-46-755.png!
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Description: As figure shows, Executor ID sorted as lexicographical order in UI Stages Tab. Better sort as number order (was: As figure shows, Executor ID sorted as lexicographical order in UI Stages Tab. Better sort as number order !image-2022-09-27-09-26-46-755.png!) > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40570) Add doc for Docker Setup in standalone mode
Qian Sun created SPARK-40570: Summary: Add doc for Docker Setup in standalone mode Key: SPARK-40570 URL: https://issues.apache.org/jira/browse/SPARK-40570 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40569) Expose port for spark standalone mode
Qian Sun created SPARK-40569: Summary: Expose port for spark standalone mode Key: SPARK-40569 URL: https://issues.apache.org/jira/browse/SPARK-40569 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Commented] (SPARK-40160) Make pyspark.broadcast examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583291#comment-17583291 ] Qian Sun commented on SPARK-40160: -- working on it :) > Make pyspark.broadcast examples self-contained > -- > > Key: SPARK-40160 > URL: https://issues.apache.org/jira/browse/SPARK-40160 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major >
[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582535#comment-17582535 ] Qian Sun commented on SPARK-40148: -- [~hyukjin.kwon] OK, I'll create a follow-up PR to do these :) > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582454#comment-17582454 ] Qian Sun commented on SPARK-40148: -- [~hyukjin.kwon] Hi, this seems to be the same as SPARK-40010 (Make pyspark.sql.window examples self-contained). Is there anything else that needs to be done in pyspark.sql.window? I'd like to work on it. > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Comment Edited] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582454#comment-17582454 ] Qian Sun edited comment on SPARK-40148 at 8/21/22 3:30 AM: --- [~hyukjin.kwon] Hi, it seems like that it is same with SPARK-40010 Is there anything else that needs to be done in pyspark.sql.window? I'd like to work on it. was (Author: dcoliversun): [~hyukjin.kwon] Hi, it seems like that it is same with [SPARK-40010] Make pyspark.sql.window examples self-contained - ASF JIRA (apache.org) Is there anything else that needs to be done in pyspark.sql.window? I'd like to work on it. > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40160) Make pyspark.broadcast examples self-contained
Qian Sun created SPARK-40160: Summary: Make pyspark.broadcast examples self-contained Key: SPARK-40160 URL: https://issues.apache.org/jira/browse/SPARK-40160 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Commented] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query
[ https://issues.apache.org/jira/browse/SPARK-40081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582149#comment-17582149 ] Qian Sun commented on SPARK-40081: -- [~hyukjin.kwon] Yes, I'm working on it > Add Document Parameters for pyspark.sql.streaming.query > --- > > Key: SPARK-40081 > URL: https://issues.apache.org/jira/browse/SPARK-40081 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major >
[jira] [Created] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query
Qian Sun created SPARK-40081: Summary: Add Document Parameters for pyspark.sql.streaming.query Key: SPARK-40081 URL: https://issues.apache.org/jira/browse/SPARK-40081 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40078) Make pyspark.sql.column examples self-contained
Qian Sun created SPARK-40078: Summary: Make pyspark.sql.column examples self-contained Key: SPARK-40078 URL: https://issues.apache.org/jira/browse/SPARK-40078 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40042) Make pyspark.sql.streaming.query examples self-contained
Qian Sun created SPARK-40042: Summary: Make pyspark.sql.streaming.query examples self-contained Key: SPARK-40042 URL: https://issues.apache.org/jira/browse/SPARK-40042 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40041) Add Document Parameters for pyspark.sql.window
Qian Sun created SPARK-40041: Summary: Add Document Parameters for pyspark.sql.window Key: SPARK-40041 URL: https://issues.apache.org/jira/browse/SPARK-40041 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-40010) Make pyspark.sql.window examples self-contained
Qian Sun created SPARK-40010: Summary: Make pyspark.sql.window examples self-contained Key: SPARK-40010 URL: https://issues.apache.org/jira/browse/SPARK-40010 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Qian Sun
[jira] [Created] (SPARK-39676) Add task partition id for Task assertEquals method in JsonProtocolSuite
Qian Sun created SPARK-39676: Summary: Add task partition id for Task assertEquals method in JsonProtocolSuite Key: SPARK-39676 URL: https://issues.apache.org/jira/browse/SPARK-39676 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Qian Sun Fix For: 3.4.0 SPARK-37831 (https://issues.apache.org/jira/browse/SPARK-37831) added the task partition ID to task metrics, but JsonProtocolSuite was not updated to cover it.
[jira] [Commented] (SPARK-39608) Upgrade to spark 3.3.0 is causing error "Cannot grow BufferHolder by size -179446840 because the size is negative"
[ https://issues.apache.org/jira/browse/SPARK-39608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561380#comment-17561380 ] Qian Sun commented on SPARK-39608: -- Could you share more information? Such as spark application code or generated code > Upgrade to spark 3.3.0 is causing error "Cannot grow BufferHolder by size > -179446840 because the size is negative" > -- > > Key: SPARK-39608 > URL: https://issues.apache.org/jira/browse/SPARK-39608 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Isaac Eliassi >Priority: Critical > > Hi, > > We recently upgraded to version 3.3.0. > The upgrade is causing the following error "Cannot grow BufferHolder by size > -179446840 because the size is negative" > > I can't find information on this on the internet, when reverting to spark > 3.2.1 it works. > > Full exception: > org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in > stage 36.0 failed 4 times, most recent failure: Lost task 1.3 in stage 36.0 > (TID 2873) (172.24.214.133 executor 4): java.lang.IllegalArgumentException: > Cannot grow BufferHolder by size -143657042 because the size is negative > at > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:67) > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.grow(UnsafeWriter.java:63) > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:165) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage24.smj_consumeFullOuterJoinRow_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage24.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:779) > at > 
org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer$$anon$1.hasNext(InMemoryRelation.scala:118) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223) > at > org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302) > at > org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508) > at > org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:327) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) >
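One plausible reading of the symptom: JVM array sizes are 32-bit ints, so when a row or buffer (here produced by the full outer join, per the stack trace) needs more than ~2 GB, the requested grow size wraps around and comes out negative. The requested byte count below is an illustrative assumption chosen to show the wraparound; it is not taken from the user's job.

```python
import ctypes

# Suppose the join produced a row needing ~4.1 GB of buffer space.
# ctypes fixed-size integers truncate like JVM ints (no overflow check).
requested = 4_115_520_456          # > 2**31 - 1, overflows a 32-bit int
wrapped = ctypes.c_int32(requested).value
print(wrapped)  # -179446840, the same size reported in the issue title
```

If that is the cause, the fix is to reduce per-row size (fewer/narrower columns before the join) rather than anything tunable in BufferHolder itself.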
[jira] [Commented] (SPARK-39430) The inconsistent timezone in Spark History Server UI
Qian Sun commented on SPARK-39430 (Re: The inconsistent timezone in Spark History Server UI): Surbhi, hi. I tried it again and could not reproduce this. I used SHS 3.2.1 and Spark 3.2.0. Was your Spark application running in the IST timezone?
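A note on why the timezone question matters: Spark event logs store instants as epoch milliseconds, and the History Server and browser render them in their own local timezone, so the same event can appear with different wall-clock times in different places without any data being wrong. A small illustration with a fixed IST offset (UTC+5:30); the sample timestamp is arbitrary:

```python
from datetime import datetime, timezone, timedelta

IST = timezone(timedelta(hours=5, minutes=30), "IST")

event_ms = 1_654_000_000_000  # an event time as logged (epoch millis)
utc_view = datetime.fromtimestamp(event_ms / 1000, tz=timezone.utc)
ist_view = datetime.fromtimestamp(event_ms / 1000, tz=IST)

print(utc_view.isoformat())  # 2022-05-31T12:26:40+00:00
print(ist_view.isoformat())  # same instant, displayed 5h30m later
```

Comparing the rendered strings across UIs is misleading; comparing the underlying epochs (as above, where both views are equal as instants) is not.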