[ 
https://issues.apache.org/jira/browse/MESOS-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Whitlock updated MESOS-38:
------------------------------

    Description: 
Implement reporting of executor resource usage and log it to a local log 
file (for now). Eventually these statistics will be reported to the Mesos 
master to build a timeline for the webui, a top-like command-line interface, 
or both. This improvement ticket covers only the local monitoring and log 
file reporting. A reporting system (to the master node) will be a later 
improvement ticket.

With the current version of Mesos, it is not possible to monitor individual 
tasks. Therefore the best this sort of system can do is monitor an individual 
executor and aggregate the resource usage over the executor's tasks and 
resource allocations. If frameworks have a 1-to-1 relationship of a job to an 
executor, then the aggregate statistics will be more meaningful.

Reporting will be available for both lxc isolation and process-based isolation. 
For lxc isolation the task is easier because lxc already groups an executor's 
processes together. Process-based isolation is more difficult because processes 
can be re-parented away from the executor's process tree (e.g. by a double 
fork). The session ID and the process group ID will likely still match those 
of the executor, except in the uncommon case of a process resetting both.
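
As a rough illustration of the process-based case, the sketch below parses 
lines in the /proc/<pid>/stat layout documented in proc(5), selects processes 
that share the executor's session ID or process group ID, and sums their CPU 
ticks and resident pages. The names ProcStat, parse_stat_line, 
executor_processes and aggregate_usage are hypothetical and are not part of 
Mesos; this is only one way the matching could be done.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ProcStat:
    pid: int
    ppid: int
    pgrp: int
    session: int
    utime_ticks: int
    stime_ticks: int
    rss_pages: int


def parse_stat_line(line: str) -> ProcStat:
    """Parse one line in the /proc/<pid>/stat format (see proc(5))."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid = int(head.split("(", 1)[0])
    rest = tail.split()
    # rest[0] is field 3 (state), so field n of proc(5) is rest[n - 3].
    return ProcStat(
        pid=pid,
        ppid=int(rest[1]),
        pgrp=int(rest[2]),
        session=int(rest[3]),
        utime_ticks=int(rest[11]),
        stime_ticks=int(rest[12]),
        rss_pages=int(rest[21]),
    )


def executor_processes(stats: Iterable[ProcStat],
                       executor_pid: int) -> List[ProcStat]:
    """Return processes sharing the executor's session ID or process group ID.

    Matching on either ID (an assumption of this sketch) still finds a
    double-forked process that was re-parented, as long as it has not
    reset both IDs.
    """
    stats = list(stats)
    executor = next((s for s in stats if s.pid == executor_pid), None)
    if executor is None:
        return []
    return [s for s in stats
            if s.session == executor.session or s.pgrp == executor.pgrp]


def aggregate_usage(procs: Iterable[ProcStat]) -> Tuple[int, int]:
    """Sum (user + system CPU ticks, resident pages) over the processes."""
    procs = list(procs)
    cpu_ticks = sum(p.utime_ticks + p.stime_ticks for p in procs)
    rss_pages = sum(p.rss_pages for p in procs)
    return cpu_ticks, rss_pages
```

On a Linux host the input lines would come from reading /proc/<pid>/stat for 
each numeric directory under /proc. A process that resets both its session ID 
and its process group ID would be missed by this approach.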

Initial reporting will be to a local log file. Entries will be written in a 
periodic 'heartbeat' style akin to pidstat output (from the sysstat package). 
On its own this may not be very useful, but local monitoring of resource usage 
is separate from the reporting and timeline building mentioned above.
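
A heartbeat entry could look roughly like the sketch below, which turns two 
samples of cumulative CPU ticks into a CPU percentage for the interval and 
appends one line per interval to a log file. The line format, the function 
names, and the default of 100 clock ticks per second are assumptions made for 
illustration; the ticket does not specify any of them.

```python
def cpu_percent(prev_ticks: int, cur_ticks: int, interval_secs: float,
                ticks_per_sec: int = 100) -> float:
    """CPU usage over one heartbeat interval, as a percentage of one core.

    ticks_per_sec defaults to 100 as an assumption; on a real system it
    could be read with os.sysconf("SC_CLK_TCK").
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return 100.0 * (cur_ticks - prev_ticks) / (ticks_per_sec * interval_secs)


def format_heartbeat(timestamp: float, executor_id: str,
                     cpu_pct: float, rss_kb: int) -> str:
    """Format one heartbeat line (hypothetical layout, one sample per line)."""
    return "%.0f executor=%s cpu%%=%.2f rss_kb=%d" % (
        timestamp, executor_id, cpu_pct, rss_kb)


def append_heartbeat(log_path: str, line: str) -> None:
    """Append one heartbeat line to the local log file."""
    with open(log_path, "a") as log:
        log.write(line + "\n")
```

A monitor would call these once per interval for each executor, keeping the 
previous tick count so that the next sample can be turned into a rate.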

When usage statistics are eventually reported to the Mesos master, it may be 
possible to use them to oversubscribe slave nodes.


    
> Executor resource monitoring and local reporting of usage stats
> ---------------------------------------------------------------
>
>                 Key: MESOS-38
>                 URL: https://issues.apache.org/jira/browse/MESOS-38
>             Project: Mesos
>          Issue Type: New Feature
>         Environment: Initial executor monitoring for linux only. Dummy 
> monitoring capability (no-op) for OSX, with functionality to be filled in 
> later.
>            Reporter: Sam Whitlock
>              Labels: monitoring


        
