Pete Robbins created SPARK-15606:
------------------------------------

             Summary: Driver hang in o.a.s.DistributedSuite on 2 core machine
                 Key: SPARK-15606
                 URL: https://issues.apache.org/jira/browse/SPARK-15606
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: AMD64 box with only 2 cores
            Reporter: Pete Robbins


repeatedly failing task that crashes JVM *** FAILED ***
  The code passed to failAfter did not complete within 100000 milliseconds. (DistributedSuite.scala:128)

This test started failing, and DistributedSuite started hanging, after
https://github.com/apache/spark/pull/13055

It looks like handling the extra message to remove the BlockManager deadlocks the
dispatcher, because there are only 2 message-processing loop threads. This is related to
https://issues.apache.org/jira/browse/SPARK-13906

{code}
  /** Thread pool used for dispatching messages. */
  private val threadpool: ThreadPoolExecutor = {
    // Defaults to at least 2 threads; a 2-core machine with no override gets exactly 2.
    val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
      math.max(2, Runtime.getRuntime.availableProcessors()))
    val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
    // One MessageLoop is started per pool thread.
    for (i <- 0 until numThreads) {
      pool.execute(new MessageLoop)
    }
    pool
  }

{code} 
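To illustrate the suspected failure mode, here is a minimal, hypothetical model of thread-starvation deadlock in a fixed-size pool. It is not Spark's Dispatcher: each "handler" stands in for a message handler that posts a follow-up message (assumed to be analogous to the BlockManager-removal message) to the same pool and blocks until it has been processed. `StarvationDemo` and `runsToCompletion` are illustrative names, and a plain `java.util.concurrent` pool is used in place of `ThreadUtils`. A latch makes the outcome deterministic by ensuring every handler occupies a pool thread before any follow-up is posted.

{code}
import java.util.concurrent.{CountDownLatch, Executors, RejectedExecutionException, ThreadFactory, TimeUnit}

// Hypothetical model of thread-starvation deadlock in a fixed-size pool; not Spark code.
object StarvationDemo {
  // Daemon threads, so a starved pool cannot keep the JVM alive.
  private val daemonFactory: ThreadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "demo-event-loop")
      t.setDaemon(true)
      t
    }
  }

  /** Runs `handlers` tasks on a pool of `poolSize` threads. Each handler posts a
    * follow-up task to the same pool and blocks until it has run. Returns true if
    * every handler finishes within `timeoutMs`, false if the pool starves.
    */
  def runsToCompletion(poolSize: Int, handlers: Int = 2, timeoutMs: Long = 2000L): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize, daemonFactory)
    val allStarted = new CountDownLatch(handlers)
    val allDone = new CountDownLatch(handlers)
    try {
      for (_ <- 0 until handlers) {
        pool.execute(new Runnable {
          override def run(): Unit =
            try {
              allStarted.countDown()
              allStarted.await() // wait until every handler holds a pool thread
              val followUp = pool.submit(new Runnable { override def run(): Unit = () })
              followUp.get() // blocks this pool thread until the follow-up runs
              allDone.countDown()
            } catch {
              case _: InterruptedException       => () // interrupted by shutdownNow while blocked
              case _: RejectedExecutionException => () // pool already shut down
            }
        })
      }
      allDone.await(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
      pool.shutdownNow()
    }
  }
}
{code}

With two handlers, `runsToCompletion(poolSize = 2)` returns false (both threads are blocked waiting on follow-ups that can never be scheduled) and `runsToCompletion(poolSize = 3)` returns true (the third thread runs both follow-ups). In this model the number of threads needed to avoid starvation grows with the number of handlers that block at the same time, so a fixed minimum only helps up to that count.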

Setting a minimum of 3 threads alleviates this issue, but I'm not sure there
isn't another underlying problem.
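The default in the snippet is `math.max(2, availableProcessors)`, so on the 2-core box in this report the dispatcher gets exactly 2 threads unless `spark.rpc.netty.dispatcher.numThreads` is set. Below is a small sketch of the "minimum of 3" change, with the floor lifted into a parameter. `DispatcherSizing` and `numThreads` are hypothetical names, and `Option[Int]` stands in for the SparkConf lookup. Because the snippet reads `spark.rpc.netty.dispatcher.numThreads` first, setting that key to 3 or more should have the same effect without a code change. This is an inference from the snippet, not something tested in this report.

{code}
// Hypothetical helper mirroring the dispatcher's default sizing, with the floor as a parameter.
object DispatcherSizing {
  /** `configured` stands in for an explicit spark.rpc.netty.dispatcher.numThreads value. */
  def numThreads(configured: Option[Int], availableProcessors: Int, floor: Int): Int =
    configured.getOrElse(math.max(floor, availableProcessors)) // explicit setting always wins
}
{code}

For example, `numThreads(None, 2, floor = 2)` gives 2 (the current behaviour on a 2-core machine), while `numThreads(None, 2, floor = 3)` gives 3.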




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)