Hi
 We are using Spark for a long-running job; in fact it is a REST server that 
does some joins with tables in Cassandra and returns the result.
Now we need to run multiple applications in the same Spark cluster, and from 
what I understand this is not possible, or at least somewhat complicated:

  1.  A Spark application takes all the resources / nodes in the cluster (we 
have 4 nodes, one for each Cassandra node)
  2.  A Spark application returns its resources only when it is done (exits or 
its context is closed/stopped)
  3.  Sharing resources via Mesos only allows scaling down and then scaling 
back up step by step, i.e. 2 nodes, 3 nodes, 4 nodes, ... increasing as demand 
increases
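(For point 1, one thing I found is that in standalone mode an application grabs all available cores by default, but it can apparently be capped per application with `spark.cores.max`, so two apps could coexist. A sketch of what I mean; the master URL, class, and jar names are just placeholders for our setup:

```shell
# Cap this application at 8 cores instead of the whole cluster,
# so a second application can be scheduled alongside it.
# spark://master:7077, com.example.RestServer and rest-server.jar
# are placeholders, not our real names.
spark-submit \
  --master spark://master:7077 \
  --conf spark.cores.max=8 \
  --class com.example.RestServer \
  rest-server.jar
```

But capping cores still leaves the data-locality question below open.)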

But if this is true, I cannot have several applications running in parallel, 
can I?
And if I use Mesos, the whole idea of one Spark Worker per Cassandra node 
fails, since each worker talks directly to its local node, and that locality is 
what makes it so efficient.
In this case I need all 4 nodes, not 3 out of 4.

Any mistakes in my thinking?
Any ideas on how to solve this? It should be a common problem, I think.

-Tobias
