t oo created SPARK-24618:
----------------------------

             Summary: Allow ability to consume driver memory on worker hosts not master (option for clustermode to wait for returncode?)
                 Key: SPARK-24618
                 URL: https://issues.apache.org/jira/browse/SPARK-24618
             Project: Spark
          Issue Type: New Feature
          Components: Scheduler, Spark Core
    Affects Versions: 2.3.1
            Reporter: t oo

My scenario is this:

EC2 master (488 GB of RAM and 64 cores)
Autoscaling group of up to 8 EC2 workers that register with the master

I send hundreds of parallel spark-submits to the EC2 master, but I seem to be artificially limited to approximately 240 in parallel (if the driver of each spark-submit takes 2 GB of memory), since in client deploy mode each driver runs on the host where spark-submit is launched, which here is the master.

I would like to know the return code of each spark-submit, so the deploy mode is client. I understand that using cluster deploy mode would not wait for the return code.

The spark-submits are not sent directly to worker nodes because the EC2 workers are ephemeral beasts that pop up and down regularly, whereas the master can simply redirect tasks to another worker whenever a worker is lost.

This new feature would allow as many spark-submits in parallel as the total memory in the pool of 8 worker nodes permits (i.e. not limited by the memory of the master) AND make each spark-submit wait for the return code.
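
To make the arithmetic behind the limit concrete, here is a rough sketch in Python. It assumes the cap is purely memory-bound and ignores OS and daemon overhead, which may be why the observed figure is about 240 and not exactly 244. The function names and the 128 GB per-worker figure are hypothetical, since the worker instance size is not stated above.

```python
def max_parallel_drivers(host_memory_gb: int, driver_memory_gb: int) -> int:
    """Upper bound on drivers that fit on one host, ignoring OS/daemon overhead."""
    return host_memory_gb // driver_memory_gb


def pool_parallel_drivers(worker_memory_gb: int, workers: int, driver_memory_gb: int) -> int:
    """Upper bound if drivers were placed on workers instead of the master.

    A driver cannot span hosts, so the per-worker cap is floored first
    and then multiplied by the number of workers.
    """
    return workers * max_parallel_drivers(worker_memory_gb, driver_memory_gb)


# Figures from the scenario: 488 GB master, 2 GB per driver.
print(max_parallel_drivers(488, 2))  # 244

# Hypothetical worker size (128 GB is an assumption, not from the report).
print(pool_parallel_drivers(128, 8, 2))  # 512
```

With drivers placed on the workers, the cap would grow with the size of the autoscaling group and would no longer be fixed by the master's memory.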