Re: heterogeneous cluster setup

2014-12-04 Thread Victor Tso-Guillen
To reiterate, it's very important for Spark's workers to have the same amount of memory available. Think of Spark as uniformly chopping up your data and distributing the work to the nodes; the scheduler is not designed to account for one worker having less memory available than another.

Re: heterogeneous cluster setup

2014-12-03 Thread Victor Tso-Guillen
I don't have a great answer for you. For us, we found a common divisor (not necessarily a whole gigabyte) of the available memory of the different hardware, used that as the amount of memory per worker, and scaled the number of cores accordingly so that every core in the system has the same amount of memory.
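The divisor approach above can be sketched as a small calculation. This is a minimal illustration with hypothetical node names and memory sizes (not from the original thread): take the greatest common divisor of each node's available memory as the per-worker memory, then run as many workers per node as that memory allows, so every worker (and hence every core) sees the same amount.

```python
from math import gcd
from functools import reduce

# Hypothetical available memory per node, in MB (illustrative values only).
nodes = {"node-a": 24576, "node-b": 16384, "node-c": 8192}

# A common divisor of all node memories becomes the memory per worker.
worker_mem = reduce(gcd, nodes.values())

# Scale the number of workers on each node to use its full memory,
# so every worker in the cluster is sized identically.
workers_per_node = {name: mem // worker_mem for name, mem in nodes.items()}

print(worker_mem)        # per-worker memory in MB
print(workers_per_node)  # worker count per node
```

In Spark standalone mode, the resulting worker count per node would map to `SPARK_WORKER_INSTANCES`, with `SPARK_WORKER_MEMORY` set to the common divisor.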

Re: heterogeneous cluster setup

2014-12-03 Thread Victor Tso-Guillen
You'll have to decide which resource is more expensive in your heterogeneous environment and optimize for the utilization of that. For example, you may decide that memory is the only cost factor and discount the number of cores; then you could give each worker 8GB with four cores.
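For Spark standalone mode, the uniform 8GB/four-core worker described above would correspond to a `spark-env.sh` fragment like the following on each node (the specific values are illustrative, taken from the example in the message):

```shell
# conf/spark-env.sh (Spark standalone mode) -- same on every node
SPARK_WORKER_MEMORY=8g   # identical memory per worker across the cluster
SPARK_WORKER_CORES=4     # cores offered by each worker
```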