Thanks. We've run into timeout issues at scale as well. We were able to
work around them by setting the following JVM options:
-Dspark.akka.askTimeout=300
-Dspark.akka.timeout=300
-Dspark.worker.timeout=300
NOTE: these JVM options *must* be set on the worker nodes (and not just the
driver/master) for the new timeouts to take effect.
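For the driver side, a minimal sketch of setting the same properties programmatically, assuming a pre-1.0 standalone cluster (the master URL and app name are placeholders; the worker JVMs still need the -D flags passed to them separately, as noted above):

```scala
import spark.SparkContext

object TimeoutConfig {
  def main(args: Array[String]) {
    // These properties are read once at startup, so they must be set
    // before the SparkContext is constructed.
    System.setProperty("spark.akka.askTimeout", "300")
    System.setProperty("spark.akka.timeout", "300")
    System.setProperty("spark.worker.timeout", "300")
    // "spark://master:7077" and the app name are placeholders.
    val sc = new SparkContext("spark://master:7077", "TimeoutConfig")
  }
}
```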
We're running into an issue where periodically the master loses connectivity
with workers in the Spark cluster. We believe this issue tends to manifest
when the cluster is under heavy load, but we're not entirely sure when it
happens. I've seen one or two other messages to this list about this issue.
Thanks for the clarification.
What is the proper way to configure RDDs when your aggregate data size
exceeds your available working memory size? In particular, in addition to
typical operations, I'm performing cogroups, joins, and coalesces/shuffles.
I see that the default storage level for an RDD is MEMORY_ONLY.
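For what it's worth, when our data didn't fit in memory we switched from the MEMORY_ONLY default to an explicit disk-backed storage level before the expensive shuffle operations. A minimal sketch, assuming a pre-1.0 standalone cluster (the master URL, input paths, and tab-delimited key format are all placeholders):

```scala
import spark.SparkContext
import spark.SparkContext._
import spark.storage.StorageLevel

object DiskBackedJoin {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "DiskBackedJoin")
    // Key each record by its first tab-delimited field (format is assumed).
    val left  = sc.textFile("hdfs:///data/left").map(l => (l.split("\t")(0), l))
    val right = sc.textFile("hdfs:///data/right").map(l => (l.split("\t")(0), l))
    // MEMORY_AND_DISK spills partitions that don't fit in RAM to local
    // disk, instead of dropping them and recomputing (MEMORY_ONLY).
    val joined = left.join(right).persist(StorageLevel.MEMORY_AND_DISK)
    println(joined.count())
  }
}
```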