I've got a stack of Dell commodity servers -- 8 to 32 GB of RAM and one or two quad-core processors per machine. I think I will have them loaded with CentOS. Eventually, I may want to add GPUs to some of the nodes to handle linear algebra operations...
My idea has been:

1) Find a way to configure Spark to allocate different resources per machine, per job -- at least define a "standard executor" and allow different machines to run different numbers of executors (see the configuration sketch below).

2) Using vanilla Spark, add a pre-run optimization phase that benchmarks the throughput of each node (according to its hardware) and repartitions the dataset to use the hardware more efficiently, rather than relying on Spark speculation, which has always seemed a suboptimal way to balance load across machines of differing capability (a benchmarking sketch also follows).
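
For point 1, here is a rough sketch of how I imagine the configuration side, assuming Spark standalone mode; the application name and the numbers are placeholders, not recommendations. Per-machine capacity is declared in conf/spark-env.sh on each worker (SPARK_WORKER_CORES, SPARK_WORKER_MEMORY), so heterogeneous boxes can advertise different limits, and the job then requests uniformly sized "standard" executors, letting bigger workers simply host more of them:

// On each worker, conf/spark-env.sh advertises that box's capacity, e.g.:
//   SPARK_WORKER_CORES=8
//   SPARK_WORKER_MEMORY=24g
// The job side then asks for fixed-size executors and a total core cap:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("heterogeneous-demo")      // placeholder app name
  .set("spark.executor.memory", "4g")    // size of one "standard executor"
  .set("spark.executor.cores", "2")      // cores per executor
  .set("spark.cores.max", "24")          // cap on total cores for this job

val sc = new SparkContext(conf)

A fast 32 GB node could then end up hosting several of these standard executors while a small 8 GB node hosts one, without any per-job special-casing.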
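For point 2, this is a minimal sketch of the pre-run benchmarking phase, assuming every worker runs at least one executor and that a small CPU-bound probe is a reasonable proxy for relative node speed (names and constants are mine, purely illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark releases

object NodeBenchmark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("node-benchmark"))

    // Fan many small tasks out across the cluster so every executor gets probes.
    val probes = sc.parallelize(1 to 1000, 1000)

    // Each probe runs a fixed CPU-bound loop and reports (hostname, elapsed ms).
    val timings = probes.map { _ =>
      val host  = java.net.InetAddress.getLocalHost.getHostName
      val start = System.nanoTime()
      var acc = 0.0
      var i = 0
      while (i < 5000000) { acc += math.sqrt(i); i += 1 }
      (host, (System.nanoTime() - start) / 1e6)
    }

    // Average probe time per host; lower means a faster node, and the ratios
    // could then drive how many partitions (or executors) each node is given.
    val perHost = timings
      .mapValues(ms => (ms, 1L))
      .reduceByKey { case ((a, n), (b, m)) => (a + b, n + m) }
      .mapValues { case (total, n) => total / n }
      .collect()
      .sortBy(_._2)

    perHost.foreach { case (host, ms) => println(f"$host%-30s avg $ms%.1f ms") }
    sc.stop()
  }
}

The collected per-host averages would feed the repartitioning step, e.g. by choosing partition counts roughly inversely proportional to each node's average probe time, instead of leaving the balancing to speculation.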