Micah Whitacre created CRUNCH-466:
-------------------------------------

             Summary: Occasional Spark Test failures due to Future Timeouts
                 Key: CRUNCH-466
                 URL: https://issues.apache.org/jira/browse/CRUNCH-466
             Project: Crunch
          Issue Type: Bug
          Components: Core
            Reporter: Micah Whitacre
            Assignee: Josh Wills


When building master and the 0.11 RC on one device, I started getting sporadic 
test failures.  The test that failed changed between runs.  The error seems to 
be related to Spark starting up for testing rather than anything wrong with our 
code.

Here is an example of one of the failures...
{quote}
14/08/29 16:16:17 INFO Remoting: Starting remoting
14/08/29 16:16:27 ERROR Remoting: Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
        at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
        at akka.remote.Remoting.start(Remoting.scala:191)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:67)
        at org.apache.crunch.impl.spark.SparkPipeline.runAsync(SparkPipeline.java:137)
        at org.apache.crunch.impl.spark.SparkPipeline.run(SparkPipeline.java:110)
        at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:94)
        at com.google.common.collect.Lists.newArrayList(Lists.java:125)
        at org.apache.crunch.SparkAggregatorIT.testCount(SparkAggregatorIT.java:43)
{quote}

If we change the tests to specify a SparkConf, we should be able to increase 
the akka.actor.timeout.  I also saw a few posts about Akka having trouble when 
it spins up a lot of actors.  I haven't looked into Spark's testing framework, 
but consolidating startup/shutdown to the beginning and end of a suite might 
help.
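
To make the consolidation idea concrete, here is a minimal sketch of a holder 
that starts an expensive resource once and stops it once, so individual tests 
no longer pay the startup cost.  The class name SuiteScoped and its methods are 
hypothetical, not part of Crunch or Spark.  In our tests the factory would 
presumably build a JavaSparkContext and the closer would stop it, but that 
wiring is an assumption and is not shown here.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical helper: starts a slow resource on first use and keeps it
 * until the suite explicitly shuts it down. T stands in for whatever is
 * expensive to create (for example, a Spark context in the Spark ITs).
 */
final class SuiteScoped<T> {
    private final Supplier<T> factory; // how to start the resource
    private final Consumer<T> closer;  // how to stop the resource
    private T instance;                // null until first get()

    SuiteScoped(Supplier<T> factory, Consumer<T> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    /** Starts the resource on the first call; later calls reuse it. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Stops the resource if it was started; safe to call more than once. */
    synchronized void shutdown() {
        if (instance != null) {
            closer.accept(instance);
            instance = null;
        }
    }
}
```

A suite could call get() from each test and shutdown() once at the end, for 
example from a JUnit @AfterClass method.  This only reduces how often startup 
happens; it does not by itself lengthen any timeout.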



--
This message was sent by Atlassian JIRA
(v6.2#6252)
