Andy, I think there are some ideas about implementing a pool of Spark contexts, but for now it is only an idea:
https://github.com/spark-jobserver/spark-jobserver/issues/365
It is possible to share a Spark context between apps, but I have not had to use that feature myself, sorry about that.
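For what it is worth, here is a minimal sketch of the shared-context alternative (this is illustrative only, not code from the jobserver issue above; the app name and pool name are made up). Spark 1.6 allows only one active SparkContext per JVM, so a true in-process pool is not really possible; the usual workaround is one long-lived context with the FAIR scheduler enabled, so jobs submitted from different REST handler threads run concurrently instead of queueing FIFO:

    // Sketch: one shared SparkContext, FAIR scheduling so concurrent
    // REST requests interleave. App/pool names are illustrative only.
    import org.apache.spark.{SparkConf, SparkContext}

    object SharedContext {
      private val conf = new SparkConf()
        .setAppName("shared-rest-context")
        .set("spark.scheduler.mode", "FAIR") // jobs from different threads share cores

      lazy val sc: SparkContext = new SparkContext(conf)

      // Tag all jobs started by the calling thread with a scheduler pool,
      // then clear the tag so the thread can be reused safely.
      def withPool[T](pool: String)(body: SparkContext => T): T = {
        sc.setLocalProperty("spark.scheduler.pool", pool)
        try body(sc)
        finally sc.setLocalProperty("spark.scheduler.pool", null)
      }
    }

Each request handler would then call SharedContext.withPool("requests") { sc => ... } instead of creating its own context.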
Regards,
Alonso

Alonso Isidoro Roman
My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming must be the process of putting them in." - Edsger Dijkstra
"If you pay peanuts you get monkeys"

2016-05-04 11:08 GMT+02:00 Tobias Eriksson <tobias.eriks...@qvantel.com>:

> Hi Andy,
> We have a very simple approach, I think; we do it like this:
>
> 1. Submit our Spark application to the Spark master (version 1.6.1).
> 2. Our application creates a Spark context that we use throughout.
> 3. We use a Spray REST server.
> 4. Every request that comes in we serve by querying Cassandra, doing some
> joins and some processing, and we return JSON as the result on the
> REST API.
> 5. We take advantage of co-locating the Spark workers with the Cassandra
> nodes to “boost” performance (in our test lab we have a 4-node cluster).
>
> Performance-wise we have had some challenges, but that had to do with how
> the data was arranged in Cassandra; after changing to the time-series
> design pattern we improved our performance dramatically, 750 times in our
> test lab.
>
> But now the problem is that we have more Spark applications running
> concurrently/in parallel, and we are then forced to scale down the number
> of cores that OUR application can use, to ensure that we give way for
> other applications to come in and “play” too. This is not optimal, because
> if there are free resources I would like to use them.
>
> When it comes to load balancing the REST requests, in my case I will not
> have that many clients. I think I could scale by adding multiple instances
> of my Spark application, but I would obviously suffer from having to share
> the resources (say, cores) between the different Spark workers. Or I would
> have to use dynamic resourcing. But, as I said when I started this thread,
> this is where I struggle: I need to get the resource sharing right.
> This is a challenge since I HAVE TO co-locate the Spark workers and the
> Cassandra nodes, meaning that I cannot use only 3 out of 4 nodes; if I
> did, the Cassandra access would not be efficient, since I use
> repartitionByCassandraReplica().
>
> Satisfying 250 ms requests: well, that depends very much on your use
> case, I would say. A boring answer, sorry :-(
>
> Regards
> Tobias
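To make the co-location point above concrete, here is a hypothetical sketch using the DataStax spark-cassandra-connector (the keyspace, table, and key names are invented, and spark.cassandra.connection.host is assumed to be set on the conf). repartitionByCassandraReplica() shuffles the lookup keys so that each partition lands on a Spark worker that is a Cassandra replica for those rows, which is exactly why a worker is needed on every Cassandra node:

    // Hypothetical sketch of the co-location pattern described above.
    import com.datastax.spark.connector._
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Mirrors the partition key of the (invented) my_keyspace.customers table.
    case class CustomerKey(id: String)

    def readColocated(sc: SparkContext, ids: Seq[String]): RDD[(CustomerKey, CassandraRow)] =
      sc.parallelize(ids.map(CustomerKey(_)))
        // Move each key to a partition on a Spark worker that is a Cassandra
        // replica for that row; only effective if workers run on every node.
        .repartitionByCassandraReplica("my_keyspace", "customers", partitionsPerHost = 10)
        // Node-local join against the table, avoiding cross-node reads.
        .joinWithCassandraTable("my_keyspace", "customers")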
> From: Andy Davidson <a...@santacruzintegration.com>
> Date: Tuesday 3 May 2016 at 17:26
> To: Tobias Eriksson <tobias.eriks...@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Multiple Spark Applications that use Cassandra, how to share
> resources/nodes
>
> Hi Tobias
>
> I am very interested in implementing a REST-based API on top of Spark. My
> REST-based system would make predictions from data provided in the
> request, using models trained in batch. My SLA is 250 ms.
>
> Would you mind sharing how you implemented your REST server?
>
> I am using spark-1.6.1. I have several unit tests that create a Spark
> context with master set to ‘local[4]’. I do not think the unit test
> framework is going to scale. Can each REST server have a pool of Spark
> contexts?
>
> The system we would like to replace is set up as follows:
>
> Layer of dumb load balancers: l1, l2, l3
> Layer of proxy servers: p1, p2, p3, p4, p5, ... pn
> Layer of containers: c1, c2, c3, ... cn
>
> where cn is much larger than pn.
>
> Kind regards
>
> Andy
>
> P.S. There is a talk on 5/5 about Spark 2.0. Hoping there is something in
> the near future.
>
> https://www.brighttalk.com/webcast/12891/202021?utm_campaign=google-calendar&utm_content=&utm_source=brighttalk-portal&utm_medium=calendar&utm_term=
>
> From: Tobias Eriksson <tobias.eriks...@qvantel.com>
> Date: Tuesday, May 3, 2016 at 7:34 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: Multiple Spark Applications that use Cassandra, how to share
> resources/nodes
>
> Hi
> We are using Spark for a long-running job; in fact it is a REST server
> that does some joins with some tables in Cassandra and returns the result.
> Now we need to have multiple applications running in the same Spark
> cluster, and from what I understand this is not possible, or should I say
> somewhat complicated:
>
> 1. A Spark application takes all the resources/nodes in the cluster
> (we have 4 nodes, one for each Cassandra node).
> 2. A Spark application returns its resources when it is done (exits,
> or the context is closed/returned).
> 3. Sharing resources using Mesos only allows scaling down and then
> scaling up by a step-by-step policy, i.e. 2 nodes, 3 nodes, 4 nodes, ...
> increasing as the need increases.
>
> But if this is true, I cannot have several applications running in
> parallel, is that true?
> If I use Mesos, then the whole idea of one Spark worker per Cassandra
> node fails, as each worker talks directly to its node, and that is why it
> is so efficient. In this case I need all 4 nodes, not 3 out of 4.
>
> Any mistakes in my thinking?
> Any ideas on how to solve this? It should be a common problem, I think.
>
> -Tobias
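On the resource-sharing question, here is a sketch of two standalone-cluster options that can be used separately or together (all values are placeholders, not from this thread): a hard cap with spark.cores.max so one application never takes the whole cluster, or dynamic allocation so an idle application gives executors back and reclaims them under load. Note that dynamic allocation also requires the external shuffle service to be enabled on each worker.

    // Sketch: letting several Spark applications coexist on one standalone
    // cluster. Values are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("rest-on-spark")
      // Option 1: hard cap, so this app never grabs every core.
      .set("spark.cores.max", "8")
      // Option 2: release executors when idle, reacquire them under load.
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation

    val sc = new SparkContext(conf)

Neither option removes the co-location constraint, but dynamic allocation at least lets an idle application shrink without a restart.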