[jira] [Commented] (SPARK-44881) Executor stucked on retrying to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
[ https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756445#comment-17756445 ]

Ignite TC Bot commented on SPARK-44881:

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/42572

> Executor stucked on retrying to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
>
> Key: SPARK-44881
> URL: https://issues.apache.org/jira/browse/SPARK-44881
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: hgs
> Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44881) Executor stucked on retrying to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
[ https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hgs updated SPARK-44881:

Summary: Executor stucked on retrying to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
(was: Executor stucked on retry to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.)

> Executor stucked on retrying to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
>
> Key: SPARK-44881
> URL: https://issues.apache.org/jira/browse/SPARK-44881
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: hgs
> Priority: Minor
[jira] [Created] (SPARK-44881) Executor stucked on retry to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
hgs created SPARK-44881:

Summary: Executor stucked on retry to fetch shuffle data when `java.lang.OutOfMemoryError. unable to create native thread` exception occurred.
Key: SPARK-44881
URL: https://issues.apache.org/jira/browse/SPARK-44881
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.0
Reporter: hgs
[jira] [Assigned] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info
[ https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-44880:

Assignee: Kent Yao

> Remove unnecessary curly braces at the end of the thread locks info
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Remove unnecessary curly braces at the end of the thread locks info
[jira] [Resolved] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info
[ https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-44880.

Fix Version/s: 3.5.0, 4.0.0
Resolution: Fixed

Issue resolved by pull request 42571
[https://github.com/apache/spark/pull/42571]

> Remove unnecessary curly braces at the end of the thread locks info
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Remove unnecessary curly braces at the end of the thread locks info
[jira] [Updated] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info
[ https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-44880:

Fix Version/s: 3.5.1
(was: 3.5.0)

> Remove unnecessary curly braces at the end of the thread locks info
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 4.0.0, 3.5.1
>
> Remove unnecessary curly braces at the end of the thread locks info
[jira] [Resolved] (SPARK-44874) Handle unrecognizable exceptions
[ https://issues.apache.org/jira/browse/SPARK-44874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yihong He resolved SPARK-44874.

Resolution: Duplicate

> Handle unrecognizable exceptions
>
> Key: SPARK-44874
> URL: https://issues.apache.org/jira/browse/SPARK-44874
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Yihong He
> Priority: Major
[jira] [Assigned] (SPARK-44862) Adding metric tracking number of jobs running on a cluster
[ https://issues.apache.org/jira/browse/SPARK-44862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-44862:

Assignee: Kent Yao

> Adding metric tracking number of jobs running on a cluster
>
> Key: SPARK-44862
> URL: https://issues.apache.org/jira/browse/SPARK-44862
> Project: Spark
> Issue Type: New Feature
> Components: Web UI
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Kent Yao
> Priority: Major
>
> We frequently come across issues where a considerable number of
> jobs/notebooks are running on one interactive cluster. This leads either to
> degraded performance because the cluster is overloaded, or to driver OOM.
> It would be helpful to have a metric identifying the number of notebooks/jobs
> running on a cluster, as this would be an important data point for customers
> to balance out their workloads.
> We propose to add a metric that keeps track of the number of jobs running on
> a cluster.
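The proposed metric amounts to a gauge of jobs that have started but not yet finished. The Python sketch below shows one way such a counter could be kept, assuming job-start and job-end events arrive with a job id. The class and method names (`RunningJobsGauge`, `on_job_start`, `on_job_end`) are hypothetical and do not correspond to Spark's listener or metrics APIs.

```python
import threading


class RunningJobsGauge:
    """Hypothetical gauge: number of jobs started but not yet ended."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()  # ids of jobs currently running

    def on_job_start(self, job_id):
        with self._lock:
            self._running.add(job_id)

    def on_job_end(self, job_id):
        with self._lock:
            # discard() ignores unknown ids, so a duplicate or unmatched
            # end event cannot drive the count below zero.
            self._running.discard(job_id)

    @property
    def value(self):
        with self._lock:
            return len(self._running)


if __name__ == "__main__":
    gauge = RunningJobsGauge()
    for job_id in (1, 2, 3):
        gauge.on_job_start(job_id)
    gauge.on_job_end(2)
    gauge.on_job_end(2)  # duplicate end event is ignored
    print(gauge.value)  # 2
```

Tracking ids in a set rather than a bare integer keeps the count correct if an end event is delivered twice or arrives for a job that was never seen; the lock makes the gauge safe to read while events are being recorded from other threads.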
[jira] [Created] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info
Kent Yao created SPARK-44880:

Summary: Remove unnecessary curly braces at the end of the thread locks info
Key: SPARK-44880
URL: https://issues.apache.org/jira/browse/SPARK-44880
Project: Spark
Issue Type: Bug
Components: Web UI
Affects Versions: 3.4.1, 3.3.2, 3.5.0, 4.0.0
Reporter: Kent Yao

Remove unnecessary curly braces at the end of the thread locks info
[jira] [Commented] (SPARK-22876) spark.yarn.am.attemptFailuresValidityInterval does not work correctly
[ https://issues.apache.org/jira/browse/SPARK-22876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756323#comment-17756323 ]

Adam Binford commented on SPARK-22876:

Attempting to finally add support for this: https://github.com/apache/spark/pull/42570

> spark.yarn.am.attemptFailuresValidityInterval does not work correctly
>
> Key: SPARK-22876
> URL: https://issues.apache.org/jira/browse/SPARK-22876
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 2.2.0
> Environment: hadoop version 2.7.3
> Reporter: Jinhan Zhong
> Priority: Minor
> Labels: bulk-closed
>
> I assume we can use spark.yarn.maxAppAttempts together with
> spark.yarn.am.attemptFailuresValidityInterval to keep a long-running
> application from stopping after an acceptable number of failures.
> But after testing, I found that the application always stops after failing n
> times (n is the minimum of spark.yarn.maxAppAttempts and
> yarn.resourcemanager.am.max-attempts from the client yarn-site.xml).
> For example, the following setup will allow the application master to fail
> 20 times:
> * spark.yarn.am.attemptFailuresValidityInterval=1s
> * spark.yarn.maxAppAttempts=20
> * yarn client: yarn.resourcemanager.am.max-attempts=20
> * yarn resource manager: yarn.resourcemanager.am.max-attempts=3
> After checking the source code, I found that in ApplicationMaster.scala
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L293
> there's a ShutdownHook that checks the attempt id against maxAppAttempts;
> if attempt id >= maxAppAttempts, it will try to unregister the application,
> and the application will finish.
> Is this expected design or a bug?
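To make the reported mismatch concrete, here is a minimal Python model of two ways to decide whether the current AM attempt is the last one. It is a hypothetical sketch, not Spark's code: the function names are invented for illustration, the `min(...)` limit follows the report's description, and the interval-aware variant assumes YARN's documented meaning of the validity interval (attempt failures older than the interval are not counted toward the limit).

```python
def effective_max_attempts(spark_max_app_attempts, yarn_am_max_attempts):
    """Limit described in the report: the smaller of the two settings.

    Hypothetical helper; spark_max_app_attempts may be None when
    spark.yarn.maxAppAttempts is unset, in which case YARN's value applies.
    """
    if spark_max_app_attempts is None:
        return yarn_am_max_attempts
    return min(spark_max_app_attempts, yarn_am_max_attempts)


def is_last_attempt_by_id(attempt_id, max_attempts):
    """Check as described in the report: compare the raw attempt id."""
    return attempt_id >= max_attempts


def failures_in_window(failure_times, now, validity_interval):
    """Count failures (timestamps in seconds) no older than validity_interval."""
    return sum(1 for t in failure_times if now - t <= validity_interval)


def is_last_attempt_windowed(failure_times, now, validity_interval, max_attempts):
    """Assumed interval-aware check: only recent failures count toward the limit.

    The current attempt is the (recent failures + 1)-th attempt in the window.
    """
    recent = failures_in_window(failure_times, now, validity_interval)
    return recent + 1 >= max_attempts


if __name__ == "__main__":
    limit = effective_max_attempts(20, 20)
    # 19 earlier failures spaced 10 s apart; the 20th attempt starts at t=200 s.
    failures = [10.0 * i for i in range(19)]
    print(is_last_attempt_by_id(20, limit))                        # True
    print(is_last_attempt_windowed(failures, 200.0, 1.0, limit))  # False
```

With the reporter's 1s interval, every earlier failure falls outside the window, so an interval-aware check would treat attempt 20 as only the first recent attempt, while a raw attempt-id comparison treats it as the last one and lets the application finish.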
[jira] [Reopened] (SPARK-22876) spark.yarn.am.attemptFailuresValidityInterval does not work correctly
[ https://issues.apache.org/jira/browse/SPARK-22876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Binford reopened SPARK-22876:

> spark.yarn.am.attemptFailuresValidityInterval does not work correctly
>
> Key: SPARK-22876
> URL: https://issues.apache.org/jira/browse/SPARK-22876
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 2.2.0
> Environment: hadoop version 2.7.3
> Reporter: Jinhan Zhong
> Priority: Minor
> Labels: bulk-closed
>
> I assume we can use spark.yarn.maxAppAttempts together with
> spark.yarn.am.attemptFailuresValidityInterval to keep a long-running
> application from stopping after an acceptable number of failures.
> But after testing, I found that the application always stops after failing n
> times (n is the minimum of spark.yarn.maxAppAttempts and
> yarn.resourcemanager.am.max-attempts from the client yarn-site.xml).
> For example, the following setup will allow the application master to fail
> 20 times:
> * spark.yarn.am.attemptFailuresValidityInterval=1s
> * spark.yarn.maxAppAttempts=20
> * yarn client: yarn.resourcemanager.am.max-attempts=20
> * yarn resource manager: yarn.resourcemanager.am.max-attempts=3
> After checking the source code, I found that in ApplicationMaster.scala
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L293
> there's a ShutdownHook that checks the attempt id against maxAppAttempts;
> if attempt id >= maxAppAttempts, it will try to unregister the application,
> and the application will finish.
> Is this expected design or a bug?