> On Oct 9, 2017, at 7:15 PM, Wil Yegelwel <wyegel...@gmail.com> wrote:
> 
> Is it correct to say that the limited resource field is *only* meant to 
> provide machine-readable information about which resource limits were 
> exceeded?

Yes.

> If so, does it make sense to provide richer reporting fields for all failure 
> reasons? I imagine other failure reasons could benefit from being able to 
> report details of the failure that are machine readable.

Some other reasons already have their own structured information, e.g. the 
TASK_UNREACHABLE state populates the `unreachable_time` field. I'm not planning 
to add structured information to any other failure reasons, but I'd support 
doing it if you have a specific suggestion.

> On Mon, Oct 9, 2017, 3:50 PM James Peach <jor...@gmail.com> wrote:
> 
> > On Oct 9, 2017, at 1:27 PM, Vinod Kone <vinodk...@apache.org> wrote:
> >
> >> In the case that a task is killed because it violated a resource
> >> constraint (i.e. the reason field is REASON_CONTAINER_LIMITATION,
> >> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> >> this field may be populated with the resource that triggered the
> >> limitation. This is intended to give better information to schedulers about
> >> task resource failures, in the expectation that it will help them bubble
> >> useful information up to the user or a monitoring system.
> >>
> >
> > Can you elaborate on what schedulers are expected to do with this
> > information? Looking for some concrete use cases if you can.
> 
> There's no concrete use case here; it's just a matter of propagating 
> information we know in a structured way.
> 
> If we assume that the scheduler knows about some sort of monitoring system or 
> has a UI, we can present this to the user or a system that can take action on 
> it. The status quo is that the raw message string is dumped to logs and has 
> to be interpreted manually.
> 
> Additionally, this can pave the way to getting rid of 
> REASON_CONTAINER_LIMITATION_DISK and REASON_CONTAINER_LIMITATION_MEMORY. All 
> you really need is REASON_CONTAINER_LIMITATION plus the resource information.
> 
> J
> 
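
To make the last point concrete, here is a minimal sketch of how a scheduler might consume the resource information instead of parsing the raw message string. It models TaskStatus, the limitation field, and the reason values with hypothetical, simplified Python types, not the real Mesos protobuf API. The names `Limitation`, `Resource`, `limited_resources`, `describe`, and the legacy mapping table are assumptions for illustration only. Structured resource data is preferred, and the resource-specific reasons are used only as a fallback. This reflects the idea that REASON_CONTAINER_LIMITATION plus the resource information is enough.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Reason(Enum):
    # Hypothetical subset of the TaskStatus reasons discussed in this thread.
    REASON_CONTAINER_LIMITATION = auto()
    REASON_CONTAINER_LIMITATION_DISK = auto()
    REASON_CONTAINER_LIMITATION_MEMORY = auto()


@dataclass
class Resource:
    name: str      # e.g. "mem" or "disk"
    scalar: float  # amount of the resource (illustrative only)


@dataclass
class Limitation:
    # Hypothetical stand-in for the structured "limitation" field.
    resources: List[Resource] = field(default_factory=list)


@dataclass
class TaskStatus:
    task_id: str
    reason: Reason
    message: str = ""
    limitation: Optional[Limitation] = None


# Assumed mapping from the older, resource-specific reasons to resource names.
LEGACY_REASON_RESOURCE = {
    Reason.REASON_CONTAINER_LIMITATION_DISK: "disk",
    Reason.REASON_CONTAINER_LIMITATION_MEMORY: "mem",
}


def limited_resources(status: TaskStatus) -> List[str]:
    """Return the sorted names of resources whose limits the task exceeded."""
    if status.limitation is not None and status.limitation.resources:
        # Structured information wins: no need to interpret the message string.
        return sorted({r.name for r in status.limitation.resources})
    # Fall back to the resource implied by a resource-specific reason, if any.
    legacy = LEGACY_REASON_RESOURCE.get(status.reason)
    return [legacy] if legacy else []


def describe(status: TaskStatus) -> str:
    """Build a user-facing summary, e.g. for a UI or monitoring system."""
    names = limited_resources(status)
    if not names:
        return (f"task {status.task_id} hit a container limitation: "
                f"{status.message or 'unknown'}")
    return f"task {status.task_id} exceeded its {', '.join(names)} limit"


generic = TaskStatus(
    "t1",
    Reason.REASON_CONTAINER_LIMITATION,
    limitation=Limitation([Resource("mem", 128.0)]),
)
print(describe(generic))  # task t1 exceeded its mem limit
```

With structured resources available, the generic reason carries the same information as the resource-specific ones. The legacy table only matters for statuses that lack the field.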
