Hi,

We are using Spark for a long-running job; in fact, it is a REST server that does some joins with tables in Cassandra and returns the result. Now we need to run multiple applications in the same Spark cluster, and from what I understand this is not possible, or at least somewhat complicated:
1. A Spark application takes all the resources/nodes in the cluster (we have 4 nodes, one per Cassandra node); see the sketch at the end of this mail for the kind of per-application cap I mean.
2. A Spark application only returns its resources when it is done (it exits, or its context is closed/stopped).
3. Sharing resources via Mesos only allows scaling down and then scaling back up by a step-by-step policy, i.e. 2 nodes, 3 nodes, 4 nodes, ... increasing as the need increases.

But if this is true, I cannot have several applications running in parallel, is that right?

And if I use Mesos, the whole idea of one Spark worker per Cassandra node fails: a worker talks directly to its local Cassandra node, and that is what makes it so efficient. In that case I need all 4 nodes, not 3 out of 4.

Any mistakes in my thinking? Any ideas on how to solve this? It should be a common problem, I think.

-Tobias
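PS: To make point 1 concrete, here is roughly the kind of per-application cap I am asking about. This is only a minimal sketch assuming standalone mode; the app name, master URL, and numbers are placeholders, not our actual setup:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: cap this application's share of a standalone cluster so that
// other applications can be scheduled alongside it.
val conf = new SparkConf()
  .setAppName("rest-join-server")       // hypothetical app name
  .setMaster("spark://master:7077")     // placeholder master URL
  // Without this, a standalone-mode application grabs every core in
  // the cluster, and nothing else runs until it exits.
  .set("spark.cores.max", "8")          // e.g. 2 of 4 cores on each node
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

As far as I know, the standalone master spreads those cores across all workers by default (spark.deploy.spreadOut is true), so each Cassandra node would still host an executor and locality would be preserved; happy to be corrected on that.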