> On Nov. 3, 2014, 10:13 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 2952
> > <https://reviews.apache.org/r/26382/diff/11/?file=746449#file746449line2952>
> >
> >     So, after thinking a bit more and looking at MESOS-343, what we really 
> > want here is to set the reason for failed as oom or memory limit 
> > (REASON_MEMORY_LIMIT). Slave generates TASK_FAILED here not because the 
> > executor terminated but becaused it oomed (and in the future due to disk 
> > limit etc). Note that the slave also generates TASK_FAILED when command 
> > executor unexpectedly terminates. This is because, under normal conditions, 
> > command executor is not expected to terminate until the task finishes. For 
> > this case I think REASON_INVALID_COMMAND or REASON_BAD_COMMAND or 
> > REASON_COMMAND_FAILED is a better reason than REASON_EXECUTOR_TERMINATED 
> > because users have no idea about executors when using command tasks. 
> >     
> >     The right way to do this is to plumb the reason via the Termination 
> > protobuf. That way any isolator can include the right reason if its 
> > corresponding limit is reached (memory, disk etc). If you want to punt on 
> > the plumbing, please add a TODO and tracking ticket.
> 
> Dominic Hamon wrote:
>     if termination is set to 'killed' then should it be TASK_KILLED instead 
> of TASK_FAILED?
>     
>     I think this needs a TODO as it changes the plumbing quite a bit (and 
> tests, etc). For now, is REASON_EXECUTOR_TERMINATED accurate enough? I could 
> add something for command vs containerizer vs executor, but that starts to 
> require more plumbing pretty quickly.
> 
> Vinod Kone wrote:
>     ```
>     if termination is set to 'killed' then should it be TASK_KILLED instead 
> of TASK_FAILED?
>     ```
>     
>     No. It should be TASK_FAILED. Those are the semantics.
>     
>     ```
>     I think this needs a TODO as it changes the plumbing quite a bit (and 
> tests, etc)
>     ```
>     
>     Which one? Plumbing via the termination protobuf or setting 
> REASON_FAILED_COMMAND and REASON_MEMORY_LIMIT? The latter could be done 
> without any plumbing; all changes will be in executorTerminated().
> 
> Dominic Hamon wrote:
>     this assumes that the termination reason is memory limit though, right?
>     
>     and we still don't have a reason for TASK_LOST.

```
this assumes that the termination reason is memory limit though, right?
```
thats correct, which is true as of today.

```
and we still don't have a reason for TASK_LOST.
```

TASK_LOST should be REASON_EXECUTOR_TERMINATED as you already had. That should 
only apply for non-terminal tasks belonging to a terminated (non-oomed and 
non-command) executor.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26382/#review59654
-----------------------------------------------------------


On Oct. 31, 2014, 10:09 p.m., Dominic Hamon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26382/
> -----------------------------------------------------------
> 
> (Updated Oct. 31, 2014, 10:09 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Bill Farner.
> 
> 
> Bugs: MESOS-1830 and MESOS-343
>     https://issues.apache.org/jira/browse/MESOS-1830
>     https://issues.apache.org/jira/browse/MESOS-343
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added source and reason, enabled TASK_ERROR, and made the changes necessary 
> throughout the codebase.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 168a7a8c35ed1bf3f5bd6d7431b1e511bae7b789 
>   src/common/protobuf_utils.hpp 212d5124b9a4cc58e61719fa7f07a61cd166e834 
>   src/common/protobuf_utils.cpp a9b65e328c4c62bff7fbf5633dda25d742d79019 
>   src/examples/balloon_framework.cpp b05d5679fe2915142907af0b2dc00c6cd76eb9c1 
>   src/examples/java/TestFramework.java 
> bc593d0abfacb00690b1492b2b82c970f4e4de6d 
>   src/examples/low_level_scheduler_libprocess.cpp 
> 7ef5ea78ade4ed856b97009fdfe31281f0a55c17 
>   src/examples/low_level_scheduler_pthread.cpp 
> 6e233a10117a1c7aa669806b5b430e746e227ee5 
>   src/examples/no_executor_framework.cpp 
> f98a0735b9f287e7f1bf98af6c2e9a47ca6a77b2 
>   src/examples/test_framework.cpp 187a611ebfe35cb13ee48aa5eca934cf55f34dea 
>   src/master/master.cpp 762d2ff6c168ac212f70b43275692a77496a7fcd 
>   src/sched/sched.cpp 0fb8c7bda75545389f8024489b3c76ae115111f4 
>   src/slave/slave.cpp 96fb5f7385b0762d46d8129f7e43207bd6311644 
>   src/tests/fault_tolerance_tests.cpp 
> a18a41a3e34ff112e04e693447d757403e5013bd 
>   src/tests/master_authorization_tests.cpp 
> 652e80d0d4567b225c6ffb326725ddfde06f7fd3 
>   src/tests/master_tests.cpp 2e525749247626c05efb2f54a707599facb114b6 
>   src/tests/resource_offers_tests.cpp 
> fe66432b1bf75ee25feb73c4bb353e4d4e5b503f 
> 
> Diff: https://reviews.apache.org/r/26382/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Dominic Hamon
> 
>

Reply via email to