Andy, I think there are some ideas about implementing a pool of Spark contexts, but for now it is only an idea:
https://github.com/spark-jobserver/spark-jobserver/issues/365
It is possible to share a Spark context between apps, but I have not had to use that feature myself, sorry about that.
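For what it is worth, here is a minimal sketch of the shared-context alternative (this is illustrative only, not code from the jobserver issue above; the app name and pool name are made up). Spark 1.6 allows only one active SparkContext per JVM, so a true in-process pool is not really possible; the usual workaround is one long-lived context with the FAIR scheduler enabled, so jobs submitted from different REST handler threads run concurrently instead of queueing FIFO:

    // Sketch: one shared SparkContext, FAIR scheduling so concurrent
    // REST requests interleave. App/pool names are illustrative only.
    import org.apache.spark.{SparkConf, SparkContext}

    object SharedContext {
      private val conf = new SparkConf()
        .setAppName("shared-rest-context")
        .set("spark.scheduler.mode", "FAIR") // jobs from different threads share cores

      lazy val sc: SparkContext = new SparkContext(conf)

      // Tag all jobs started by the calling thread with a scheduler pool,
      // then clear the tag so the thread can be reused safely.
      def withPool[T](pool: String)(body: SparkContext => T): T = {
        sc.setLocalProperty("spark.scheduler.pool", pool)
        try body(sc)
        finally sc.setLocalProperty("spark.scheduler.pool", null)
      }
    }

Each request handler would then call SharedContext.withPool("requests") { sc => ... } instead of creating its own context.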
Regards,
Alonso

Alonso Isidoro Roman
My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming must be the process of putting them in." - Edsger Dijkstra
"If you pay peanuts you get monkeys"

2016-05-04 11:08 GMT+02:00 Tobias Eriksson <tobias.eriks...@qvantel.com>:

> Hi Andy,
> We have a very simple approach, I think; we do it like this:
>
> 1. Submit our Spark application to the Spark master (version 1.6.1).
> 2. Our application creates a Spark context that we use throughout.
> 3. We use a Spray REST server.
> 4. Every request that comes in we serve by querying Cassandra, doing some
> joins and some processing, and we return JSON as the result on the
> REST API.
> 5. We take advantage of co-locating the Spark workers with the Cassandra
> nodes to “boost” performance (in our test lab we have a 4-node cluster).
>
> Performance-wise we have had some challenges, but that had to do with how
> the data was arranged in Cassandra; after changing to the time-series
> design pattern we improved our performance dramatically, 750 times in our
> test lab.
>
> But now the problem is that we have more Spark applications running
> concurrently/in parallel, and we are then forced to scale down the number
> of cores that OUR application can use, to ensure that we give way for
> other applications to come in and “play” too. This is not optimal, because
> if there are free resources I would like to use them.
>
> When it comes to load balancing the REST requests, in my case I will not
> have that many clients. I think I could scale by adding multiple instances
> of my Spark application, but I would obviously suffer from having to share
> the resources (say, cores) between the different Spark workers. Or I would
> have to use dynamic resourcing. But, as I said when I started this thread,
> this is where I struggle: I need to get the resource sharing right.
> This is a challenge since I HAVE TO co-locate the Spark workers and the
> Cassandra nodes, meaning that I cannot use only 3 out of 4 nodes; if I
> did, the Cassandra access would not be efficient, since I use
> repartitionByCassandraReplica().
>
> Satisfying 250 ms requests: well, that depends very much on your use
> case, I would say. A boring answer, sorry :-(
>
> Regards
> Tobias
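To make the co-location point above concrete, here is a hypothetical sketch using the DataStax spark-cassandra-connector (the keyspace, table, and key names are invented, and spark.cassandra.connection.host is assumed to be set on the conf). repartitionByCassandraReplica() shuffles the lookup keys so that each partition lands on a Spark worker that is a Cassandra replica for those rows, which is exactly why a worker is needed on every Cassandra node:

    // Hypothetical sketch of the co-location pattern described above.
    import com.datastax.spark.connector._
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Mirrors the partition key of the (invented) my_keyspace.customers table.
    case class CustomerKey(id: String)

    def readColocated(sc: SparkContext, ids: Seq[String]): RDD[(CustomerKey, CassandraRow)] =
      sc.parallelize(ids.map(CustomerKey(_)))
        // Move each key to a partition on a Spark worker that is a Cassandra
        // replica for that row; only effective if workers run on every node.
        .repartitionByCassandraReplica("my_keyspace", "customers", partitionsPerHost = 10)
        // Node-local join against the table, avoiding cross-node reads.
        .joinWithCassandraTable("my_keyspace", "customers")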
> From: Andy Davidson <a...@santacruzintegration.com>
> Date: Tuesday 3 May 2016 at 17:26
> To: Tobias Eriksson <tobias.eriks...@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Multiple Spark Applications that use Cassandra, how to share
> resources/nodes
>
> Hi Tobias
>
> I am very interested in implementing a REST-based API on top of Spark. My
> REST-based system would make predictions from data provided in the
> request, using models trained in batch. My SLA is 250 ms.
>
> Would you mind sharing how you implemented your REST server?
>
> I am using spark-1.6.1. I have several unit tests that create a Spark
> context with master set to ‘local[4]’. I do not think the unit test
> framework is going to scale. Can each REST server have a pool of Spark
> contexts?
>
> The system we would like to replace is set up as follows:
>
> Layer of dumb load balancers: l1, l2, l3
> Layer of proxy servers: p1, p2, p3, p4, p5, ... pn
> Layer of containers: c1, c2, c3, ... cn
>
> where cn is much larger than pn.
>
> Kind regards
>
> Andy
>
> P.S. There is a talk on 5/5 about Spark 2.0. Hoping there is something in
> the near future.
>
> https://www.brighttalk.com/webcast/12891/202021?utm_campaign=google-calendar&utm_content=&utm_source=brighttalk-portal&utm_medium=calendar&utm_term=
>
> From: Tobias Eriksson <tobias.eriks...@qvantel.com>
> Date: Tuesday, May 3, 2016 at 7:34 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: Multiple Spark Applications that use Cassandra, how to share
> resources/nodes
>
> Hi
> We are using Spark for a long-running job; in fact it is a REST server
> that does some joins with some tables in Cassandra and returns the result.
> Now we need to have multiple applications running in the same Spark
> cluster, and from what I understand this is not possible, or should I say
> somewhat complicated:
>
> 1. A Spark application takes all the resources/nodes in the cluster
> (we have 4 nodes, one for each Cassandra node).
> 2. A Spark application returns its resources when it is done (exits,
> or the context is closed/returned).
> 3. Sharing resources using Mesos only allows scaling down and then
> scaling up by a step-by-step policy, i.e. 2 nodes, 3 nodes, 4 nodes, ...
> increasing as the need increases.
>
> But if this is true, I cannot have several applications running in
> parallel, is that true?
> If I use Mesos, then the whole idea of one Spark worker per Cassandra
> node fails, as each worker talks directly to its node, and that is why it
> is so efficient. In this case I need all 4 nodes, not 3 out of 4.
>
> Any mistakes in my thinking?
> Any ideas on how to solve this? It should be a common problem, I think.
>
> -Tobias
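On the resource-sharing question, here is a sketch of two standalone-cluster options that can be used separately or together (all values are placeholders, not from this thread): a hard cap with spark.cores.max so one application never takes the whole cluster, or dynamic allocation so an idle application gives executors back and reclaims them under load. Note that dynamic allocation also requires the external shuffle service to be enabled on each worker.

    // Sketch: letting several Spark applications coexist on one standalone
    // cluster. Values are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("rest-on-spark")
      // Option 1: hard cap, so this app never grabs every core.
      .set("spark.cores.max", "8")
      // Option 2: release executors when idle, reacquire them under load.
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation

    val sc = new SparkContext(conf)

Neither option removes the co-location constraint, but dynamic allocation at least lets an idle application shrink without a restart.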