> On Nov. 3, 2014, 10:13 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 2952 > > <https://reviews.apache.org/r/26382/diff/11/?file=746449#file746449line2952> > > > > So, after thinking a bit more and looking at MESOS-343, what we really > > want here is to set the reason for failed as oom or memory limit > > (REASON_MEMORY_LIMIT). Slave generates TASK_FAILED here not because the > > executor terminated but becaused it oomed (and in the future due to disk > > limit etc). Note that the slave also generates TASK_FAILED when command > > executor unexpectedly terminates. This is because, under normal conditions, > > command executor is not expected to terminate until the task finishes. For > > this case I think REASON_INVALID_COMMAND or REASON_BAD_COMMAND or > > REASON_COMMAND_FAILED is a better reason than REASON_EXECUTOR_TERMINATED > > because users have no idea about executors when using command tasks. > > > > The right way to do this is to plumb the reason via the Termination > > protobuf. That way any isolator can include the right reason if its > > corresponding limit is reached (memory, disk etc). If you want to punt on > > the plumbing, please add a TODO and tracking ticket. > > Dominic Hamon wrote: > if termination is set to 'killed' then should it be TASK_KILLED instead > of TASK_FAILED? > > I think this needs a TODO as it changes the plumbing quite a bit (and > tests, etc). For now, is REASON_EXECUTOR_TERMINATED accurate enough? I could > add something for command vs containerizer vs executor, but that starts to > require more plumbing pretty quickly. > > Vinod Kone wrote: > ``` > if termination is set to 'killed' then should it be TASK_KILLED instead > of TASK_FAILED? > ``` > > No. It should be TASK_FAILED. Those are the semantics. > > ``` > I think this needs a TODO as it changes the plumbing quite a bit (and > tests, etc) > ``` > > Which one? Plumbing via the termination protobuf or setting > REASON_FAILED_COMMAND and REASON_MEMORY_LIMIT? The latter could be done > without any plumbing; all changes will be in executorTerminated(). > > Dominic Hamon wrote: > this assumes that the termination reason is memory limit though, right? > > and we still don't have a reason for TASK_LOST.
``` this assumes that the termination reason is memory limit though, right? ``` thats correct, which is true as of today. ``` and we still don't have a reason for TASK_LOST. ``` TASK_LOST should be REASON_EXECUTOR_TERMINATED as you already had. That should only apply for non-terminal tasks belonging to a terminated (non-oomed and non-command) executor. - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26382/#review59654 ----------------------------------------------------------- On Oct. 31, 2014, 10:09 p.m., Dominic Hamon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/26382/ > ----------------------------------------------------------- > > (Updated Oct. 31, 2014, 10:09 p.m.) > > > Review request for mesos, Vinod Kone and Bill Farner. > > > Bugs: MESOS-1830 and MESOS-343 > https://issues.apache.org/jira/browse/MESOS-1830 > https://issues.apache.org/jira/browse/MESOS-343 > > > Repository: mesos-git > > > Description > ------- > > Added source and reason, enabled TASK_ERROR, and made the changes necessary > throughout the codebase. > > > Diffs > ----- > > include/mesos/mesos.proto 168a7a8c35ed1bf3f5bd6d7431b1e511bae7b789 > src/common/protobuf_utils.hpp 212d5124b9a4cc58e61719fa7f07a61cd166e834 > src/common/protobuf_utils.cpp a9b65e328c4c62bff7fbf5633dda25d742d79019 > src/examples/balloon_framework.cpp b05d5679fe2915142907af0b2dc00c6cd76eb9c1 > src/examples/java/TestFramework.java > bc593d0abfacb00690b1492b2b82c970f4e4de6d > src/examples/low_level_scheduler_libprocess.cpp > 7ef5ea78ade4ed856b97009fdfe31281f0a55c17 > src/examples/low_level_scheduler_pthread.cpp > 6e233a10117a1c7aa669806b5b430e746e227ee5 > src/examples/no_executor_framework.cpp > f98a0735b9f287e7f1bf98af6c2e9a47ca6a77b2 > src/examples/test_framework.cpp 187a611ebfe35cb13ee48aa5eca934cf55f34dea > src/master/master.cpp 762d2ff6c168ac212f70b43275692a77496a7fcd > src/sched/sched.cpp 0fb8c7bda75545389f8024489b3c76ae115111f4 > src/slave/slave.cpp 96fb5f7385b0762d46d8129f7e43207bd6311644 > src/tests/fault_tolerance_tests.cpp > a18a41a3e34ff112e04e693447d757403e5013bd > src/tests/master_authorization_tests.cpp > 652e80d0d4567b225c6ffb326725ddfde06f7fd3 > src/tests/master_tests.cpp 2e525749247626c05efb2f54a707599facb114b6 > src/tests/resource_offers_tests.cpp > fe66432b1bf75ee25feb73c4bb353e4d4e5b503f > > Diff: https://reviews.apache.org/r/26382/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Dominic Hamon > >