Thanks. We've run into timeout issues at scale as well. We were able to
work around them by setting the following JVM options:
-Dspark.akka.askTimeout=300
-Dspark.akka.timeout=300
-Dspark.worker.timeout=300
NOTE: these JVM options *must* be set on the worker nodes (and not just the
driver/master) for the new timeouts to take effect.
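For the driver side, a minimal sketch of setting the same properties programmatically, assuming a pre-1.0 standalone cluster (the master URL and app name are placeholders; the worker JVMs still need the -D flags passed to them separately, as noted above):

```scala
import spark.SparkContext

object TimeoutConfig {
  def main(args: Array[String]) {
    // These properties are read once at startup, so they must be set
    // before the SparkContext is constructed.
    System.setProperty("spark.akka.askTimeout", "300")
    System.setProperty("spark.akka.timeout", "300")
    System.setProperty("spark.worker.timeout", "300")
    // "spark://master:7077" and the app name are placeholders.
    val sc = new SparkContext("spark://master:7077", "TimeoutConfig")
  }
}
```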
We're running into an issue where periodically the master loses connectivity
with workers in the Spark cluster. We believe this issue tends to manifest
when the cluster is under heavy load, but we're not entirely sure when it
happens. I've seen one or two other messages to this list about this issue.
Thanks for the clarification.
What is the proper way to configure RDDs when your aggregate data size
exceeds your available working memory size? In particular, in addition to
typical operations, I'm performing cogroups, joins, and coalesces/shuffles.
I see that the default storage level for an RDD is MEMORY_ONLY.
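For what it's worth, when our data didn't fit in memory we switched from the MEMORY_ONLY default to an explicit disk-backed storage level before the expensive shuffle operations. A minimal sketch, assuming a pre-1.0 standalone cluster (the master URL, input paths, and tab-delimited key format are all placeholders):

```scala
import spark.SparkContext
import spark.SparkContext._
import spark.storage.StorageLevel

object DiskBackedJoin {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "DiskBackedJoin")
    // Key each record by its first tab-delimited field (format is assumed).
    val left  = sc.textFile("hdfs:///data/left").map(l => (l.split("\t")(0), l))
    val right = sc.textFile("hdfs:///data/right").map(l => (l.split("\t")(0), l))
    // MEMORY_AND_DISK spills partitions that don't fit in RAM to local
    // disk, instead of dropping them and recomputing (MEMORY_ONLY).
    val joined = left.join(right).persist(StorageLevel.MEMORY_AND_DISK)
    println(joined.count())
  }
}
```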