[ https://issues.apache.org/jira/browse/SPARK-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Gesher updated SPARK-19764:
-------------------------------

We're driving everything from Python. It may be a bug that the error isn't
propagating up to the notebook; generally, we do see exceptions there. When we
ran the same job from the PySpark shell, we saw the stack trace, so I'm inclined
to suspect something in the notebook stack that kept it from propagating.

We're happy to investigate if you think it's useful.



> Executors hang with supposedly running task that are really finished.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-19764
>                 URL: https://issues.apache.org/jira/browse/SPARK-19764
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 16.04 LTS
> OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> Spark 2.0.2 - Spark Cluster Manager
>            Reporter: Ari Gesher
>         Attachments: driver-log-stderr.log, executor-2.log, netty-6153.jpg, SPARK-19764.tgz
>
>
> We've come across a job that won't finish. Running on a six-node cluster,
> each of the executors ends up with 5-7 tasks that are never marked as
> completed.
> Here's an excerpt from the web UI:
> ||Index||ID||Attempt||Status||Locality Level||Executor ID / Host||Launch Time||Duration||Scheduler Delay||Task Deserialization Time||GC Time||Result Serialization Time||Getting Result Time||Peak Execution Memory||Shuffle Read Size / Records||Errors||
> |105|1131|0|SUCCESS|PROCESS_LOCAL|4 / 172.31.24.171|2017/02/27 22:51:36|1.9 min|9 ms|4 ms|0.7 s|2 ms|6 ms|384.1 MB|90.3 MB / 572| |
> |106|1168|0|RUNNING|ANY|2 / 172.31.16.112|2017/02/27 22:53:25|6.5 h|0 ms|0 ms|1 s|0 ms|0 ms|384.1 MB|98.7 MB / 624| |
> However, the Executor reports the task as finished: 
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> As does the driver log:
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> The full log from this executor and the {{stderr}} from
> {{app-20170227223614-0001/2/stderr}} are attached.
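A minimal sketch, not part of the original report, for cross-checking the web UI against an executor log such as the attached executor-2.log. It assumes the Spark 2.0.x `Running task` / `Finished task` log format quoted above and reports which TIDs the UI still lists as RUNNING even though the executor logged them as finished. The helper names (`task_states`, `stuck_in_ui`, `never_finished`) and the idea of passing in TIDs copied by hand from the web UI are hypothetical.

```python
import re
from typing import Dict, Iterable, Set

# Assumes the Spark 2.0.x executor log format quoted in this issue, e.g.
#   17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
#   17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). ...
TASK_LINE = re.compile(
    r"INFO Executor: (?P<event>Running|Finished) task \S+ in stage \S+ \(TID (?P<tid>\d+)\)"
)


def task_states(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Collect the TIDs an executor log reports as started and as finished."""
    states: Dict[str, Set[int]] = {"Running": set(), "Finished": set()}
    for line in lines:
        match = TASK_LINE.search(line)
        if match:
            states[match.group("event")].add(int(match.group("tid")))
    return states


def stuck_in_ui(lines: Iterable[str], ui_running_tids: Iterable[int]) -> Set[int]:
    """Hypothetical helper: TIDs shown as RUNNING in the web UI that the log says finished."""
    return set(ui_running_tids) & task_states(lines)["Finished"]


def never_finished(lines: Iterable[str]) -> Set[int]:
    """TIDs the executor logged as started but never logged as finished."""
    states = task_states(lines)
    return states["Running"] - states["Finished"]


if __name__ == "__main__":
    # In practice, iterate over the lines of the attached executor log instead;
    # this sample reuses the two log lines quoted in the issue.
    sample = [
        "17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)",
        "17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). "
        "2633558 bytes result sent via BlockManager)",
    ]
    print(sorted(stuck_in_ui(sample, [1168])))  # [1168]
    print(sorted(never_finished(sample)))  # []
```

If `never_finished` comes back empty while `stuck_in_ui` does not, the executor believes its work is done and the discrepancy lies somewhere between the executor and the driver's task bookkeeping, which matches what this issue describes.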



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
