[ https://issues.apache.org/jira/browse/SPARK-11049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-11049.
-------------------------------
    Resolution: Not A Problem

Pending more info

> If a single executor fails to allocate memory, entire job fails
> ---------------------------------------------------------------
>
>                 Key: SPARK-11049
>                 URL: https://issues.apache.org/jira/browse/SPARK-11049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Brian
>
> To reproduce:
> * Create a Spark cluster using start-master.sh and start-slave.sh (I believe
>   this is the "standalone cluster manager"?).
> * Leave a process running on some nodes that takes up a significant amount
>   of RAM.
> * Leave some nodes with plenty of RAM to run Spark.
> * Run a job against this cluster with spark.executor.memory asking for all
>   or most of the memory available on each node.
> On the node with insufficient memory, there will of course be an error like:
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Could not create the Java virtual machine.
> On the driver node, and in the Spark master UI, I see that _all_ executors
> exit or are killed, and the entire job fails. It would be better if there
> were an indication of which individual node is actually at fault. It would
> also be better if the cluster manager could fail over to nodes that are
> still operating properly and have sufficient RAM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
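The reproduction steps quoted above can be sketched as a shell session. This is an assumption-laden sketch, not taken from the report: the $SPARK_HOME layout, the master host name, the 14g-of-16g memory figure, and the SparkPi example jar are all placeholders chosen for illustration.

```shell
# Hypothetical reproduction sketch, assuming a Spark 1.4 standalone
# install under $SPARK_HOME on every node; host names are placeholders.

# On the master node:
$SPARK_HOME/sbin/start-master.sh            # master at spark://master-host:7077

# On each worker node, including at least one already low on free RAM:
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

# Submit a job asking for most of each node's memory (e.g. 14g of 16g).
# On the memory-starved worker the executor JVM fails to start
# ("Could not reserve enough space for object heap"); per the report,
# the whole job then fails instead of just that one executor.
$SPARK_HOME/bin/spark-submit \
  --master spark://master-host:7077 \
  --conf spark.executor.memory=14g \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 100
```

Since the failure depends on the actual free RAM of each worker, the sketch only reproduces the symptom when one node genuinely cannot reserve the requested heap.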