[ https://issues.apache.org/jira/browse/SPARK-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739619#comment-14739619 ]

Ryan Williams commented on SPARK-10551:
---------------------------------------

The same behavior is observable on a second task, {{13737}}, in that file:

{code}
$ grep -n '"Task ID":13737' application_1439224376754_5702

26576:{"Event":"SparkListenerTaskStart","Stage ID":6,"Stage Attempt ID":0,"Task Info":{"Task ID":13737,"Index":1316,"Attempt":0,"Launch Time":1440703706769,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":0,"Failed":false,"Accumulables":[]}}

28919:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"ExecutorLostFailure","Executor ID":"232"},"Task Info":{"Task ID":13737,"Index":1316,"Attempt":0,"Launch Time":1440703706769,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703708467,"Failed":true,"Accumulables":[]}}

29708:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":13737,"Index":1316,"Attempt":0,"Launch Time":1440703706769,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703708467,"Failed":true,"Accumulables":[]},"Task Metrics":{"Host Name":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Executor Deserialize Time":6,"Executor Run Time":194,"Result Size":8760,"JVM GC Time":0,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":245058,"Shuffle Write Time":6412935,"Shuffle Records Written":1095},"Input Metrics":{"Data Read Method":"Memory","Bytes Read":324704,"Records Read":1095}}}
{code}
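
To check how many tasks in a log show this pattern, rather than grepping one task ID at a time, something like the following might help. It is only a rough sketch: the function name {{conflicting_task_ends}} is made up for illustration, it assumes the raw event log has one JSON event per line (i.e. it is run on the log file itself, not on {{grep -n}} output with line-number prefixes), and it relies only on the field names visible in the events above.

```python
import json
from collections import defaultdict


def conflicting_task_ends(lines):
    """Map each task ID that has more than one TaskEnd event to its end reasons.

    `lines` is any iterable of strings, assumed to hold one JSON event each.
    The reasons are listed in the order the events appear in the log.
    """
    reasons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            # Not valid JSON (e.g. a truncated final line); skip it.
            continue
        if not isinstance(event, dict):
            continue
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        task_id = event["Task Info"]["Task ID"]
        reasons[task_id].append(event["Task End Reason"]["Reason"])
    # A task attempt should end once; keep only tasks that ended more than once.
    return {tid: rs for tid, rs in reasons.items() if len(rs) > 1}
```

Passing an open file handle for the event log as {{lines}} should work, since iterating over a file yields one line at a time. For the task shown above, the expected entry would map 13737 to the reasons {{ExecutorLostFailure}} and {{Success}}, in that order.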

> Successful task-end event after task failed due to executor loss
> ----------------------------------------------------------------
>
>                 Key: SPARK-10551
>                 URL: https://issues.apache.org/jira/browse/SPARK-10551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.1
>            Reporter: Ryan Williams
>
> I am doing forensics on some failed Spark applications and am seeing 
> nonsensical things in the event logs, e.g.:
> {code}
> $ grep -n '"Task ID":12083' application_1439224376754_5702
> 24578:{"Event":"SparkListenerTaskStart","Stage ID":6,"Stage Attempt ID":0,"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":0,"Failed":false,"Accumulables":[]}}
> 28918:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"ExecutorLostFailure","Executor ID":"232"},"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703707747,"Failed":true,"Accumulables":[]}}
> 29062:{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":12083,"Index":145,"Attempt":0,"Launch Time":1440703704768,"Executor ID":"232","Host":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":1440703707747,"Failed":true,"Accumulables":[]},"Task Metrics":{"Host Name":"demeter-csmaz11-11.demeter.hpc.mssm.edu","Executor Deserialize Time":181,"Executor Run Time":1585,"Result Size":8760,"JVM GC Time":0,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":454121,"Shuffle Write Time":43293396,"Shuffle Records Written":2549},"Input Metrics":{"Data Read Method":"Memory","Bytes Read":810520,"Records Read":2549}}}
> {code}
> Task ID 12083 has a TaskStart event, a TaskEnd event indicating that the task 
> failed due to {{ExecutorLostFailure}}, and then a TaskEnd event saying that 
> the task succeeded.
> The history server is not showing me this file in the "complete" or 
> "incomplete" sections, though it has this line in its stdout (and no apparent 
> exceptions later), which I thought meant that it parsed the file correctly:
> {code}
> 15/09/10 17:57:56 INFO FsHistoryProvider: Replaying log path: hdfs://demeter-nn1.demeter.hpc.mssm.edu/spark/tmp/logs/willir31/application_1439224376754_5702
> {code}
> [~arahuja] ran this application originally and says that the live web UI was 
> showing inconsistent/nonsensical data.


