Micah Whitacre created CRUNCH-466:
-------------------------------------
Summary: Occasional Spark Test failures due to Future Timeouts
Key: CRUNCH-466
URL: https://issues.apache.org/jira/browse/CRUNCH-466
Project: Crunch
Issue Type: Bug
Components: Core
Reporter: Micah Whitacre
Assignee: Josh Wills
When building master and the 0.11 RC on one of my machines I started getting sporadic
test failures, and the test that failed changed between runs. The error seems to
be related to Spark starting up for the tests rather than anything wrong with our code.
Here is an example of one of the failures...
{quote}
14/08/29 16:16:17 INFO Remoting: Starting remoting
14/08/29 16:16:27 ERROR Remoting: Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
  at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
  at akka.remote.Remoting.start(Remoting.scala:191)
  at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
  at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
  at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
  at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
  at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
  at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:67)
  at org.apache.crunch.impl.spark.SparkPipeline.runAsync(SparkPipeline.java:137)
  at org.apache.crunch.impl.spark.SparkPipeline.run(SparkPipeline.java:110)
  at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:94)
  at com.google.common.collect.Lists.newArrayList(Lists.java:125)
  at org.apache.crunch.SparkAggregatorIT.testCount(SparkAggregatorIT.java:43)
{quote}
If we changed the tests to specify a SparkConf, we should be able to increase
akka.actor.timeout to something longer. I also saw a few posts about Akka having
trouble when it spins up a lot of actors. I haven't looked into Spark's testing
framework, but consolidating SparkContext startup/shutdown at the beginning and
end of a suite might also help. A rough sketch of both ideas is below.
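To make the idea concrete, here is a minimal sketch of a shared test helper. It assumes the Spark 1.x properties spark.akka.timeout / spark.akka.askTimeout (the exact property names and values should be checked against the Spark version we build against), and the SparkTestSupport class itself is hypothetical; how the resulting context gets handed to SparkPipeline depends on which constructors we want to expose.

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

/**
 * Sketch only: one longer-timeout JavaSparkContext shared across a suite
 * instead of one per test. Property names/values are assumptions.
 */
public final class SparkTestSupport {

  private static JavaSparkContext context;

  /** Lazily create a single local context for the whole suite. */
  public static synchronized JavaSparkContext getContext() {
    if (context == null) {
      SparkConf conf = new SparkConf()
          .setMaster("local")
          .setAppName("crunch-spark-it")
          .set("spark.akka.timeout", "300")      // seconds, illustrative value
          .set("spark.akka.askTimeout", "300");  // seconds, illustrative value
      context = new JavaSparkContext(conf);
    }
    return context;
  }

  /** Call from an @AfterClass hook so the actor system is torn down once. */
  public static synchronized void shutdown() {
    if (context != null) {
      context.stop();
      context = null;
    }
  }
}
{code}

Tests like SparkAggregatorIT could then grab the shared context in an @BeforeClass hook, which would cut the number of Akka actor systems created per build and give each startup a longer timeout budget.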