[ 
https://issues.apache.org/jira/browse/SPARK-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20054:
---------------------------------
    Labels: bulk-closed  (was: )

> [Mesos] Detectability for resource starvation
> ---------------------------------------------
>
>                 Key: SPARK-20054
>                 URL: https://issues.apache.org/jira/browse/SPARK-20054
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos, Scheduler
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Kamal Gurala
>            Priority: Minor
>              Labels: bulk-closed
>
> We currently run Mesos 1.1.0 for our Spark cluster in coarse-grained mode. We 
> recently had a production issue in which our Spark frameworks accepted 
> resources from the Mesos master, so executors were started and the Spark 
> driver was aware of them, but the driver did not schedule any tasks and 
> nothing happened for a long time because the minimum registered resources 
> threshold was not met. The cluster is usually under-provisioned because not 
> all jobs need to run at the same time. The held resources were never offered 
> back to the master for re-allocation, which brought the entire cluster to a 
> halt until we intervened manually. 
> We use DRF for Mesos and FIFO for Spark. At any point in time there could be 
> 10-15 Spark frameworks running on Mesos on the under-provisioned cluster. 
> The ask is for better recoverability or detectability in a scenario where 
> individual Spark frameworks hold onto resources but never launch any tasks, 
> or for these frameworks to release the resources after a fixed amount of 
> time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
