Why couldn’t you use the Spark Thrift Server?
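
For reference, a minimal sketch of what querying the Thrift Server from client code could look like, assuming the server has been started (sbin/start-thriftserver.sh), the Hive JDBC driver (hive-jdbc) is on the classpath, and the host/port below are placeholders:

import java.sql.DriverManager

object ThriftServerQuery {
  def main(args: Array[String]): Unit = {
    // Requires the Hive JDBC driver (hive-jdbc) on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Host and port are placeholders; 10000 is the default Thrift port.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      val rs = conn.createStatement().executeQuery("SHOW TABLES")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}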

On Feb 7, 2017, at 1:28 PM, Cosmin Posteuca <cosmin.poste...@gmail.com> wrote:

Answer for Gourav Sengupta:

I want to use the same Spark application because I want the jobs to run under a FIFO
scheduler. My problem is that I have many jobs (none of them very big), and if I run a
separate application for every job, my cluster splits its resources like a FAIR
scheduler (that is what I observe, though maybe I'm wrong), which can create a
bottleneck. Start-up time isn't a problem for me, because this isn't a real-time
application.
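
For illustration, a minimal sketch of the FIFO behaviour inside a single application (the app name and job sizes are made up): actions submitted one after another from the same thread already run in FIFO order, and spark.scheduler.mode only matters once several threads submit jobs concurrently.

import org.apache.spark.{SparkConf, SparkContext}

object FifoJobs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fifo-jobs")
      .set("spark.scheduler.mode", "FIFO") // the default, stated explicitly
    val sc = new SparkContext(conf)

    // Two "jobs" (actions) issued one after the other share the same
    // executors and run strictly in submission order.
    val first  = sc.parallelize(1 to 1000000).sum()
    val second = sc.parallelize(1 to 1000000).map(_ * 2).sum()
    println(s"first=$first second=$second")

    sc.stop()
  }
}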

I need a business-grade solution; that's the reason I can't use code from GitHub.

Thanks!

2017-02-07 19:55 GMT+02:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
Hi,

May I ask the reason for using the same Spark application? Is it because of the
time it takes to start a Spark context?

On another note, you may want to look at the number of contributors in a GitHub
repo before choosing a solution.


Regards,
Gourav

On Tue, Feb 7, 2017 at 5:26 PM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
Spark Jobserver and Livy are the best options for a purely technical API.
If you want to publish a business API you will probably have to build your own app,
like the one I wrote a year ago: https://github.com/elppc/akka-spark-experiments
It combines Akka actors and a shared Spark context to serve concurrent
sub-second jobs.
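
For illustration, a minimal sketch of the same idea using plain Scala Futures instead of Akka actors (the app name and job sizes are made up): one long-lived driver holds a single SparkContext, each incoming request becomes a Spark job on its own thread, and the context's scheduler (FIFO by default, FAIR if configured) arbitrates between them.

import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SharedContextServer {
  def main(args: Array[String]): Unit = {
    // One long-lived application: every request reuses this context,
    // so no new executors are allocated per job.
    val sc = new SparkContext(new SparkConf().setAppName("shared-context"))

    // Each incoming "request" becomes a Spark job on its own thread.
    def handleRequest(n: Int): Future[Double] = Future {
      sc.parallelize(1 to n).map(_.toDouble).sum()
    }

    val results = Future.sequence((1 to 3).map(i => handleRequest(i * 100000)))
    println(Await.result(results, 10.minutes))

    sc.stop()
  }
}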


2017-02-07 15:28 GMT+01:00 ayan guha <guha.a...@gmail.com>:
I think you are looking for Livy or Spark Jobserver.
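
For reference, a rough sketch of how Livy's interactive sessions could cover this, assuming a Livy server is reachable at the hypothetical host/port below: one session corresponds to one long-lived Spark application, and each statement posted to it runs as a job inside that same context (in practice the session id should be parsed from the creation response rather than hard-coded).

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object LivySessionSketch {
  // Minimal JSON-over-HTTP helper; no extra libraries needed.
  def post(urlStr: String, json: String): String = {
    val conn = new URL(urlStr).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(json.getBytes(StandardCharsets.UTF_8))
    scala.io.Source.fromInputStream(conn.getInputStream).mkString
  }

  def main(args: Array[String]): Unit = {
    val livy = "http://livy-host:8998" // hypothetical endpoint; 8998 is Livy's default port

    // One session = one Spark application (and one SparkContext) kept alive by Livy.
    println(post(s"$livy/sessions", """{"kind": "spark"}"""))

    // Each statement runs as a job inside that same context.
    // Session id 0 is assumed here for brevity.
    println(post(s"$livy/sessions/0/statements",
      """{"code": "sc.parallelize(1 to 100).count()"}"""))
  }
}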

On Wed, 8 Feb 2017 at 12:37 am, Cosmin Posteuca <cosmin.poste...@gmail.com> wrote:

I want to run different jobs on demand with the same Spark context, but I don't
know exactly how I can do this.

I tried to get the current context, but it seems to create a new Spark context (with new
executors).

I call spark-submit to add new jobs.

I run the code on Amazon EMR (3 instances, 4 cores & 16 GB RAM per instance), with YARN
as the resource manager.

My code:

val sparkContext = SparkContext.getOrCreate()   // expected to reuse an existing context
val content = 1 to 40000
val result = sparkContext.parallelize(content, 5)
result.map(value => value.toString).foreach(loop)

// Busy loop that simulates some work for each element.
def loop(x: String): Unit = {
  for (a <- 1 to 30000000) {
  }
}


spark-submit:

spark-submit --executor-cores 1 \
             --executor-memory 1g \
             --driver-memory 1g \
             --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=true \
             --conf spark.shuffle.service.enabled=true \
             --conf spark.dynamicAllocation.minExecutors=1 \
             --conf spark.dynamicAllocation.maxExecutors=3 \
             --conf spark.dynamicAllocation.initialExecutors=3 \
             --conf spark.executor.instances=3 \


If I run spark-submit twice, it creates 6 executors, but I want to run all of these
jobs in the same Spark application.

How can I achieve adding jobs to an existing Spark application?

I don't understand why SparkContext.getOrCreate() doesn't get the existing Spark
context.
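
One likely explanation (a sketch, not a definitive answer): SparkContext.getOrCreate() only reuses a context that already exists inside the same driver JVM, and every spark-submit starts a new driver process, hence a new application with its own executors. Inside a single driver the reuse does work, as this made-up example shows (local master used only for the demo):

import org.apache.spark.{SparkConf, SparkContext}

object GetOrCreateDemo {
  def main(args: Array[String]): Unit = {
    // getOrCreate() only looks inside the current JVM: the first call
    // creates the context, later calls in the same driver return it.
    val first  = SparkContext.getOrCreate(
      new SparkConf().setAppName("demo").setMaster("local[*]"))
    val second = SparkContext.getOrCreate()
    println(first eq second) // true: same context, same application

    // A separate spark-submit is a separate driver JVM, so it can never
    // "get" this context; it always creates a new application instead.
    first.stop()
  }
}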


Thanks,

Cosmin P.

--
Best Regards,
Ayan Guha



