+1

On Mon, Oct 9, 2017 at 10:56 AM, James Peach <jor...@gmail.com> wrote:
> Hi all,
>
> In https://reviews.apache.org/r/62644/, I am proposing to add an optional
> Resources field to the TaskStatus message named `limited_resources`.
>
> In the case that a task is killed because it violated a resource
> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> this field may be populated with the resource that triggered the
> limitation. This is intended to give better information to schedulers about
> task resource failures, in the expectation that it will help them bubble
> useful information up to the user or a monitoring system.
>
> diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
> index d742adbbf..559d09e37 100644
> --- a/include/mesos/v1/mesos.proto
> +++ b/include/mesos/v1/mesos.proto
> @@ -2252,6 +2252,13 @@ message TaskStatus {
>    // status updates for tasks running on agents that are unreachable
>    // (e.g., partitioned away from the master).
>    optional TimeInfo unreachable_time = 14;
> +
> +  // If the reason field indicates a container resource limitation,
> +  // this field contains the resource whose limits were violated.
> +  //
> +  // NOTE: 'Resources' is used here because the resource may span
> +  // multiple roles (e.g. `"mem(*):1;mem(role):2"`).
> +  repeated Resource limited_resources = 16;
>  }
>
> cheers,
> James
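
For illustration, here is roughly how a scheduler might consume the proposed field when surfacing a failure to a user. This is only a sketch: the `Resource` and `TaskStatus` classes below are hypothetical stand-ins, not the real Mesos bindings, and `describe_limitation` is a made-up helper. Only the reason names and the `limited_resources` field name come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Reason names as listed in the proposal.
LIMITATION_REASONS = {
    "REASON_CONTAINER_LIMITATION",
    "REASON_CONTAINER_LIMITATION_DISK",
    "REASON_CONTAINER_LIMITATION_MEMORY",
}


@dataclass
class Resource:
    # Hypothetical, simplified stand-in for the Mesos Resource message.
    name: str
    role: str
    value: float


@dataclass
class TaskStatus:
    # Hypothetical, simplified stand-in for the Mesos TaskStatus message.
    task_id: str
    reason: Optional[str] = None
    limited_resources: List[Resource] = field(default_factory=list)


def describe_limitation(status: TaskStatus) -> Optional[str]:
    """Return a user-facing message for a resource-limit kill, else None."""
    if status.reason not in LIMITATION_REASONS:
        return None
    if not status.limited_resources:
        # The field may be left unset, so fall back to the reason alone.
        return f"task {status.task_id} killed: {status.reason}"
    # Sum per resource name, since one resource may span several roles.
    totals: Dict[str, float] = {}
    for res in status.limited_resources:
        totals[res.name] = totals.get(res.name, 0.0) + res.value
    detail = ", ".join(f"{name}={value:g}" for name, value in sorted(totals.items()))
    return f"task {status.task_id} killed: {status.reason} (limit: {detail})"
```

With the `mem(*):1;mem(role):2` example from the proto comment, the two entries would be folded into a single `mem=3` in the message, which is the kind of detail a scheduler could pass on to a user or a monitoring system.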