[jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big
[ https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199840#comment-17199840 ]

Apache Spark commented on SPARK-32898:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29832

> totalExecutorRunTimeMs is too big
> ---------------------------------
>
>                 Key: SPARK-32898
>                 URL: https://issues.apache.org/jira/browse/SPARK-32898
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.7, 3.0.1
>            Reporter: Linhong Liu
>            Assignee: wuyi
>            Priority: Major
>             Fix For: 3.0.2, 3.1.0
>
> This is likely caused by an incorrect calculation of executorRunTimeMs in
> Executor.scala. The function
> collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called
> before taskStartTimeNs has been set (i.e., while it is still 0).
> As of now, this is the problematic code in the master branch:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470
> An exception is thrown before this line, and the catch branch still updates
> the metric. However, the query shows as SUCCESSful. Maybe this task is
> speculative; not sure.
> submissionTime in LiveExecutionData may have a similar problem:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
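The arithmetic behind the symptom can be sketched as follows. This is a hypothetical, simplified model of the calculation the report describes, not the actual code from Executor.scala; the field and method names merely mirror the Scala source.

```java
// Hypothetical sketch of the bug described above; simplified from the logic
// in Spark's Executor.TaskRunner, not the actual Spark code.
public class RunTimeBugSketch {
    // Remains 0 if the task is killed before it reaches
    // "taskStartTimeNs = System.nanoTime()".
    static long taskStartTimeNs = 0L;

    // Mirrors the problematic calculation: elapsed nanoseconds converted to
    // milliseconds. With taskStartTimeNs == 0 the "elapsed" time is the raw
    // nanoTime value itself, which has an arbitrary origin and is typically
    // an enormous number.
    static long executorRunTimeMs(long nowNs) {
        return (nowNs - taskStartTimeNs) / 1_000_000L;
    }

    public static void main(String[] args) {
        // When the start time was recorded, the metric is sane.
        taskStartTimeNs = System.nanoTime();
        System.out.println("started: " + executorRunTimeMs(System.nanoTime()) + " ms");

        // When it was never recorded (task killed before starting), the
        // metric explodes to "now minus epoch".
        taskStartTimeNs = 0L;
        System.out.println("never started: " + executorRunTimeMs(System.nanoTime()) + " ms");
    }
}
```

Since System.nanoTime() is only meaningful as a difference between two calls, subtracting a never-assigned 0 yields a nonsense value, which is exactly the "too big" totalExecutorRunTimeMs the reporter observed.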
[ https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197763#comment-17197763 ]

Apache Spark commented on SPARK-32898:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29789
[ https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197341#comment-17197341 ]

wuyi commented on SPARK-32898:
------------------------------

I think the issue (for executorRunTimeMs) is: before a task reaches
"taskStartTimeNs = System.nanoTime()", it might already have been killed
(e.g., by another successful attempt). In that case, taskStartTimeNs is never
initialized and remains 0. However, executorRunTimeMs is calculated as
"System.nanoTime() - taskStartTimeNs" in
collectAccumulatorsAndResetStatusOnFailure, which obviously produces a huge,
wrong result when taskStartTimeNs = 0.

I haven't taken a detailed look at submissionTime, but it sounds like a
different issue, though it may stem from the same logic hole. I'd like to fix
executorRunTimeMs first, if [~linhongliu-db] doesn't mind.
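wuyi's diagnosis suggests an obvious guard: skip (or zero out) the run-time accounting when taskStartTimeNs was never set. The sketch below is an assumption about the general shape of such a fix, not the change actually merged in the linked pull requests.

```java
// Hypothetical guard for the uninitialized-start-time case; an assumption
// about the shape of the fix, not the code merged in the linked PRs.
public class RunTimeGuardSketch {
    // 0 means the task was killed before its start time was recorded.
    static long taskStartTimeNs = 0L;

    // Only account run time when the task actually started; otherwise
    // report 0 instead of "now minus nanoTime origin".
    static long safeExecutorRunTimeMs(long nowNs) {
        if (taskStartTimeNs == 0L) {
            return 0L;
        }
        return (nowNs - taskStartTimeNs) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Task killed before starting: metric stays at 0 instead of exploding.
        System.out.println("never started: " + safeExecutorRunTimeMs(System.nanoTime()) + " ms");

        // Task that actually started: metric is the real elapsed time.
        taskStartTimeNs = System.nanoTime();
        System.out.println("started: " + safeExecutorRunTimeMs(System.nanoTime()) + " ms");
    }
}
```

Reporting 0 for a task that never ran is one reasonable choice; another would be to skip updating the accumulator entirely, which avoids polluting per-task statistics with zero samples.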
[ https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196972#comment-17196972 ]

Thomas Graves commented on SPARK-32898:
---------------------------------------

[~linhongliu-db] can you please provide more of a description? You say this
was too big: did it cause an error for your job, or did you just notice that
the time was too big? Do you have a reproducible case?

You have some details about taskStartTimeNs possibly not being initialized;
if you can give more detail there, that would be great, as your description
is a bit hard to follow. If you have spent the time to debug and have a fix
in mind, please feel free to put up a pull request.