[jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199840#comment-17199840
 ] 

Apache Spark commented on SPARK-32898:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29832

> totalExecutorRunTimeMs is too big
> -
>
> Key: SPARK-32898
> URL: https://issues.apache.org/jira/browse/SPARK-32898
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Linhong Liu
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> This might be caused by incorrectly calculating executorRunTimeMs in
> Executor.scala.
> The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can
> be called when taskStartTimeNs has not been set yet (it is still 0).
> As of now, in the master branch, this is the problematic code:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]
>
> An exception is thrown before this line, and the catch branch still updates
> the metric. However, the query shows as SUCCESSful, so maybe this task is
> speculative; not sure.
>
> submissionTime in LiveExecutionData may have a similar problem:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]
>  
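The quoted description can be illustrated with a small self-contained sketch (a simplification, not Spark's actual Executor code; the class and method names here are hypothetical) showing why an uninitialized start timestamp of 0 turns into an enormous run-time value:

```java
public class RunTimeBugSketch {
    // Simplified stand-in for the executor's run-time computation:
    // elapsed = now - taskStartTimeNs, converted to milliseconds.
    static long executorRunTimeMs(long nowNs, long taskStartTimeNs) {
        return (nowNs - taskStartTimeNs) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Normal case: the task started 250 ms before "now".
        long now = 5_000_000_000L;
        long start = 4_750_000_000L;
        System.out.println(executorRunTimeMs(now, start)); // 250

        // Buggy case: the task was killed before taskStartTimeNs was
        // assigned, so it is still 0 and the entire clock reading is
        // reported as run time.
        System.out.println(executorRunTimeMs(now, 0L)); // 5000
    }
}
```

With a real System.nanoTime() reading as "now", the second call reports the raw clock value rather than an elapsed interval, which matches the "too big" totalExecutorRunTimeMs symptom.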



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197763#comment-17197763
 ] 

Apache Spark commented on SPARK-32898:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29789










[jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-16 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197341#comment-17197341
 ] 

wuyi commented on SPARK-32898:
--

I think the issue (for executorRunTimeMs) is: before a task reaches
"taskStartTimeNs = System.nanoTime()", it might already have been killed (e.g.,
by another successful attempt). So taskStartTimeNs never gets initialized and
remains 0. However, executorRunTimeMs is calculated as "System.nanoTime() -
taskStartTimeNs" in collectAccumulatorsAndResetStatusOnFailure, which obviously
produces a huge, wrong result when taskStartTimeNs = 0.

I haven't taken a detailed look at submissionTime, but it sounds like a
different issue? Though it may be due to the same logic hole.

I'd like to make a fix for executorRunTimeMs first, if [~linhongliu-db]
doesn't mind.
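One possible shape for such a guard (a hedged sketch under my own naming, not the actual patch in the linked pull requests) is to skip the run-time computation entirely when the start timestamp was never assigned:

```java
public class RunTimeFixSketch {
    // Sentinel meaning "the task never reached its start-time assignment".
    static final long UNSET = 0L;

    // Guarded variant: only account run time when the task actually
    // recorded a start timestamp; otherwise report 0 instead of a
    // bogus (now - 0) value.
    static long safeExecutorRunTimeMs(long nowNs, long taskStartTimeNs) {
        if (taskStartTimeNs == UNSET) {
            return 0L;
        }
        return (nowNs - taskStartTimeNs) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Task killed before it started: no run time is charged.
        System.out.println(safeExecutorRunTimeMs(5_000_000_000L, UNSET)); // 0

        // Task that ran for one second is accounted normally.
        System.out.println(safeExecutorRunTimeMs(5_000_000_000L, 4_000_000_000L)); // 1000
    }
}
```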







[jira] [Commented] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-16 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196972#comment-17196972
 ] 

Thomas Graves commented on SPARK-32898:
---

[~linhongliu-db] can you please provide more of a description? You say the value
was too big: did it cause an error for your job, or did you just notice that the
time was too big? Do you have a reproducible case?

You have some details there about taskStartTimeNs possibly not being
initialized; if you can give more details, that would be great, as it's a bit
hard to follow your description. If you have spent the time to debug this and
have a fix in mind, please feel free to put up a pull request.



