t oo created SPARK-24618:
----------------------------

             Summary: Allow ability to consume driver memory on worker hosts 
not master (option for clustermode to wait for returncode?)
                 Key: SPARK-24618
                 URL: https://issues.apache.org/jira/browse/SPARK-24618
             Project: Spark
          Issue Type: New Feature
          Components: Scheduler, Spark Core
    Affects Versions: 2.3.1
            Reporter: t oo


My scenario is this:
EC2 master (488 GB of RAM and 64 cores)
Autoscaling group of up to 8 EC2 workers that get registered with the master

I send hundreds of parallel spark-submits to the EC2 master, but I seem to be 
artificially limited to approximately 240 in parallel (if the driver of each 
spark-submit takes 2 GB of memory). I would like to know the return code of 
each spark-submit, so the deploy mode is client. I understand that using 
cluster deploy mode would not wait for the return code.
Spark-submits are not submitted directly to worker nodes because EC2 instances 
are ephemeral beasts that pop up and down regularly, while the master can 
simply redirect tasks to another worker whenever a worker is lost.
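
The limit of roughly 240 is consistent with simple arithmetic on the master's memory. The following is a rough sketch only: the function name and the `reserved_gb` parameter are hypothetical, and the 8 GB reserve in the second call is an illustrative guess, not a measured value.

```python
def max_client_drivers(host_memory_gb, driver_memory_gb, reserved_gb=0.0):
    """Estimate how many client-mode drivers fit on one host.

    reserved_gb is a hypothetical allowance for the OS and the master
    daemon; it is an assumption, not a figure from the report.
    """
    usable_gb = host_memory_gb - reserved_gb
    if usable_gb <= 0 or driver_memory_gb <= 0:
        return 0
    # Each driver needs its full allocation, so round down.
    return int(usable_gb // driver_memory_gb)


if __name__ == "__main__":
    # 488 GB host, 2 GB per driver, nothing reserved.
    print(max_client_drivers(488, 2))
    # Same host with an assumed 8 GB held back for everything else.
    print(max_client_drivers(488, 2, reserved_gb=8))
```

With nothing reserved the estimate is 244 drivers, and with the assumed 8 GB reserve it is 240, which is in line with the ceiling observed above.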

This new feature would allow as many spark-submits in parallel as there is 
total memory in the pool of 8 worker nodes (i.e. not limited by the memory of 
the master) AND make each spark-submit wait for the return code.
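
For context, the current client-mode workflow can be sketched as below. It is a minimal illustration, not the actual launcher: `build_submit_command` and `run_all` are hypothetical helper names, the master URL and application path are placeholders, and it assumes the exit status of a client-mode spark-submit reflects the outcome of the application. The demo at the bottom uses stand-in commands so that it runs without a Spark installation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_submit_command(master_url, app_path, driver_memory="2g"):
    """Build a client-mode spark-submit command line (paths are placeholders)."""
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "client",
        "--driver-memory", driver_memory,
        app_path,
    ]


def run_all(commands, max_parallel=8):
    """Run commands in parallel and return their exit codes in input order."""
    def run(cmd):
        # Blocks until the child process exits, then reports its status.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Stand-in commands: one exits with 0, the other with 3.
    demo = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    print(run_all(demo))
```

Every driver started this way lives on the host that launched it, which is why the master's memory becomes the ceiling in the scenario above.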



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

