That's true: if the scheduler waits until the control task is RUNNING before doing anything else, this problem goes away. There's also then no need to rely on the order in which tasks are launched on the executor.
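For what it's worth, a rough sketch of that gating in the scheduler (written against the Python bindings; the "control-"/"worker-" task naming and the flex-down bookkeeping are just illustrative, not the actual Hadoop framework code) might look something like:

    # Sketch only: defer the flex-down kill until the control task is RUNNING.
    from mesos.interface import Scheduler, mesos_pb2

    class TaskTrackerScheduler(Scheduler):
        def __init__(self):
            self.control_running = set()    # trackers whose control task is RUNNING
            self.pending_flex_down = set()  # trackers we want to shrink once safe

        def statusUpdate(self, driver, update):
            task_id = update.task_id.value
            if task_id.startswith("control-") and update.state == mesos_pb2.TASK_RUNNING:
                tracker = task_id[len("control-"):]
                self.control_running.add(tracker)
                # Only now is it safe to kill the worker task for this tracker.
                if tracker in self.pending_flex_down:
                    self._kill_worker(driver, tracker)

        def flex_down(self, driver, tracker):
            # Never race with out-of-order launches on the slave: if the control
            # task isn't RUNNING yet, remember the request and act on it later.
            if tracker in self.control_running:
                self._kill_worker(driver, tracker)
            else:
                self.pending_flex_down.add(tracker)

        def _kill_worker(self, driver, tracker):
            task_id = mesos_pb2.TaskID()
            task_id.value = "worker-" + tracker
            driver.killTask(task_id)
            self.pending_flex_down.discard(tracker)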
Thanks everyone!

On Tue, Sep 30, 2014 at 5:51 PM, Benjamin Mahler <bmah...@twitter.com.invalid> wrote:
> Why can't the executor just commit suicide if all running tasks are killed?
> If you're simultaneously launching two tasks for each executor, you'll only
> see this race if you kill very quickly after launching. Your scheduler is
> informed when both tasks are running as well, so that could gate the flexing
> down.
> Anything I'm missing?
>
> Sent from my iPhone
>
>> On Sep 30, 2014, at 12:42 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>
>> Thanks Vinod. I missed that issue when searching!
>>
>> I did consider sending a shutdown task, though my worry was that there may
>> be cases where the task might not launch, perhaps due to resource starvation
>> and/or no offers being received. Presumably it would not be correct to store
>> the original OfferId and launch a new task from that offer, as it *could* be
>> days old.
>>
>>> On Tue, Sep 30, 2014 at 2:10 AM, Vinod Kone <vinodk...@gmail.com> wrote:
>>>
>>> Adding a shutdownExecutor() driver call has been discussed before:
>>> https://issues.apache.org/jira/browse/MESOS-330
>>>
>>> As a workaround, have you considered sending a special "kill" task as a
>>> signal to the executor to commit suicide?
>>>
>>>> On Mon, Sep 29, 2014 at 5:27 PM, Tom Arnfeld <t...@duedil.com> wrote:
>>>> Hi,
>>>>
>>>> I've been making some modifications to the Hadoop framework recently and
>>>> have come up against a brick wall. I'm wondering if the concept of killing
>>>> an executor from a framework has been discussed before?
>>>>
>>>> Currently we launch two tasks for each Hadoop TaskTracker: one with a bit
>>>> of CPU and all of the memory, and another with the rest of the CPU. In
>>>> total this equals the amount of resources we want to give each
>>>> TaskTracker. This is *kind of* how Spark works.
>>>>
>>>> The reason we do this is to be able to free up CPU resources and remove
>>>> slots from a TaskTracker (killing it half dead) while keeping the executor
>>>> alive. At some undefined point in the future we then want to kill the
>>>> executor, which happens by killing the other "control" task.
>>>>
>>>> This approach doesn't work very well in practice as a result of
>>>> https://issues.apache.org/jira/browse/MESOS-1812, which means tasks are not
>>>> launched in order on the slave, so there is no way to guarantee the control
>>>> task comes up first, which leads to all sorts of interesting races.
>>>>
>>>> Is this a bad road to go down? I can't use framework messages as I don't
>>>> believe they are a reliable way of sending signals, so I'm not sure where
>>>> else to turn.
>>>>
>>>> Cheers,
>>>>
>>>> Tom.
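For the archives, the executor-side "commit suicide once the last task is killed" idea from Ben's reply could be sketched roughly like this (hypothetical code against the Python executor bindings, not our actual TaskTracker executor):

    # Sketch only: track live tasks and stop the driver when the last one dies.
    import threading

    from mesos.interface import Executor, mesos_pb2

    class TaskTrackerExecutor(Executor):
        def __init__(self):
            self.live_tasks = set()
            self.lock = threading.Lock()

        def launchTask(self, driver, task):
            with self.lock:
                self.live_tasks.add(task.task_id.value)
            # ... start the TaskTracker / adjust slots here ...
            update = mesos_pb2.TaskStatus()
            update.task_id.value = task.task_id.value
            update.state = mesos_pb2.TASK_RUNNING
            driver.sendStatusUpdate(update)

        def killTask(self, driver, taskId):
            update = mesos_pb2.TaskStatus()
            update.task_id.value = taskId.value
            update.state = mesos_pb2.TASK_KILLED
            driver.sendStatusUpdate(update)
            with self.lock:
                self.live_tasks.discard(taskId.value)
                if not self.live_tasks:
                    # All tasks are gone: shut ourselves down rather than
                    # waiting for the slave to reap the executor.
                    driver.stop()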