[ https://issues.apache.org/jira/browse/SPARK-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803301#comment-15803301 ]
Weizhong commented on SPARK-16180:
----------------------------------

Hi, we have also hit this issue on Spark 1.6. The executor log shows the task threads hanging for more than 3 hours, after which the tasks succeeded:

{noformat}
2017-01-04 21:07:21,675 | INFO | [Executor task launch worker-0] | Running task 447.0 in stage 22.0 (TID 22335) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:21,883 | INFO | [Executor task launch worker-0] | Found block rdd_31_447 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,091 | INFO | [Executor task launch worker-4] | Finished task 1866.0 in stage 18.0 (TID 21754). 106402 bytes result sent to driver executor run time: 27585 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,197 | INFO | [Executor task launch worker-1] | Found block rdd_31_424 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,201 | INFO | [dispatcher-event-loop-18] | Got assigned task 22354 | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,202 | INFO | [Executor task launch worker-4] | Running task 466.0 in stage 22.0 (TID 22354) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,948 | INFO | [Executor task launch worker-4] | Found block rdd_31_466 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:25,638 | INFO | [Executor task launch worker-2] | Finished task 227.0 in stage 22.0 (TID 22115). 4961 bytes result sent to driver executor run time: 12787090 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:26,948 | INFO | [Executor task launch worker-1] | Finished task 424.0 in stage 22.0 (TID 22312). 4961 bytes result sent to driver executor run time: 12785601 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:27,492 | INFO | [Executor task launch worker-0] | Finished task 447.0 in stage 22.0 (TID 22335). 4961 bytes result sent to driver executor run time: 12785815 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:27,561 | INFO | [Executor task launch worker-4] | Finished task 466.0 in stage 22.0 (TID 22354). 4961 bytes result sent to driver executor run time: 12785356 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
{noformat}

Have you found the root cause?

> Task hang on fetching blocks (cached RDD)
> -----------------------------------------
>
>                 Key: SPARK-16180
>                 URL: https://issues.apache.org/jira/browse/SPARK-16180
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.6.1
>            Reporter: Davies Liu
>
> Here is the stack dump of the executor:
> {code}
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> scala.concurrent.Await$.result(package.scala:107)
> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:102)
> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588)
> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
> org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
> org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:46)
> org.apache.spark.scheduler.Task.run(Task.scala:96)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
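In the stack dump, the task thread is parked in `Await.result`, called from `BlockTransferService.fetchBlockSync`. The `acquireSharedInterruptibly` frame (rather than a timed `tryAcquireSharedNanos`) suggests the wait has no deadline, so a fetch whose completion callback is delayed or never invoked would keep the task blocked for as long as that lasts. The Java sketch below illustrates that assumption; it is not Spark's code. `BoundedFetchWait`, `awaitBounded` and the 200 ms timeout are hypothetical, and a `CompletableFuture` that is never completed stands in for a stalled remote fetch. It contrasts the unbounded wait with a bounded one that turns a silent hang into a result the caller can act on.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedFetchWait {

    // Hypothetical helper: waits at most timeoutMs for a pending block fetch.
    // Returns the block bytes, or Optional.empty() if the wait timed out or was
    // interrupted. An unbounded pending.get() would instead park the thread
    // until the future completes, which may be never.
    static Optional<byte[]> awaitBounded(CompletableFuture<byte[]> pending, long timeoutMs) {
        try {
            return Optional.ofNullable(pending.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Mark this attempt as abandoned; for a CompletableFuture this
            // completes it with a CancellationException.
            pending.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt so the caller (e.g. a task killer) sees it.
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // The fetch itself failed; surface the underlying cause.
            throw new RuntimeException("block fetch failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a fetch whose success/failure callback never fires.
        CompletableFuture<byte[]> stalled = new CompletableFuture<>();
        long start = System.nanoTime();
        Optional<byte[]> block = awaitBounded(stalled, 200);
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(block.isPresent() ? "fetched" : "gave up after ~" + waitedMs + " ms");
        System.out.println("cancelled: " + stalled.isCancelled());
    }
}
```

With a bounded wait the caller could, for example, try another location holding the block or fail the task so the scheduler retries it; which reaction is appropriate is a design choice this sketch does not settle.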