[ https://issues.apache.org/jira/browse/MESOS-38?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222992#comment-13222992 ]
[email protected] commented on MESOS-38:
----------------------------------------------------
bq. On 2012-03-05 23:04:17, Charles Reiss wrote:
bq. > src/slave/resource_monitor.cpp, line 67
bq. > <https://reviews.apache.org/r/4167/diff/2/?file=88033#file88033line67>
bq. >
bq. > I assume that you've discussed calling this 'mem_usage' and
'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the
reasoning briefly?
I made this choice unilaterally, based on its similarity to the naming used for
cgroups info.
I don't feel strongly about the name and would be happy to change it if that
would make it more consistent with the rest of Mesos.
bq. On 2012-03-05 23:04:17, Charles Reiss wrote:
bq. > src/slave/slave.cpp, line 1446
bq. > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1446>
bq. >
bq. > How is this deleted? Does this actually need to be on the heap?
No. Before switching to collect, I was allocating things on the heap, but now I
copy instead. Thanks for catching this memory leak!
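For readers following the heap-versus-copy point, here is a minimal C++ sketch. It assumes the value in question is a per-executor map of usage records. `ExecutorUsage` and `snapshotUsages` are hypothetical names, not code from the patch.

```cpp
#include <map>
#include <string>

// Hypothetical per-executor usage record; the real UsageMessage is a
// protobuf and is not reproduced here.
struct ExecutorUsage {
  double cpu_usage = 0.0;
  double mem_usage = 0.0;
};

// Leak-prone pattern the review flagged: a heap-allocated container that
// nobody deletes.
//
//   auto* usages = new std::map<std::string, ExecutorUsage>();
//
// Value-based alternative: take the latest usages by const reference and
// return an independent copy. The copy is destroyed automatically, so
// there is nothing to delete.
std::map<std::string, ExecutorUsage> snapshotUsages(
    const std::map<std::string, ExecutorUsage>& latest) {
  return latest;  // copy-constructs the returned map
}
```

Later changes to `latest` do not affect a snapshot that was already taken.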
bq. On 2012-03-05 23:04:17, Charles Reiss wrote:
bq. > src/slave/slave.cpp, line 1448
bq. > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1448>
bq. >
bq. > Suggest foreachpair
done
bq. On 2012-03-05 23:04:17, Charles Reiss wrote:
bq. > src/slave/slave.cpp, line 1469
bq. > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1469>
bq. >
bq. > const UsageMessage&
done
bq. On 2012-03-05 23:04:17, Charles Reiss wrote:
bq. > src/slave/http.cpp, line 76
bq. > <https://reviews.apache.org/r/4167/diff/2/?file=88024#file88024line76>
bq. >
bq. > Why would we bother adding this before it does anything useful?
This was the start of displaying the usage through the web UI.
I bundled it in because this is a work-in-progress review (not for
committing). It will not be part of a future, more committable review.
Only the code in src/monitoring and the non-web-UI changes in src/slave
will be part of the eventual committable review.
- Sam
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------
On 2012-03-06 04:54:32, Sam Whitlock wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4167/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-06 04:54:32)
bq.
bq.
bq. Review request for mesos, Benjamin Hindman and Charles Reiss.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This mega-patch represents the partial completion of the slave monitoring
functionality. It is not intended to be committed. Changes based on comments in
this review will be reflected in future reviews that are smaller and more
modular.
bq.
bq. Proc utils is included in this patch, but is already under review here:
https://reviews.apache.org/r/3050/
bq.
bq. The relevant design doc can be found here:
https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
bq.
bq. The following items are ones where specific feedback is requested:
bq.
bq. * A better mechanism is needed to control the rate at which the slave asks
each executor for its UsageMessage. The rate is currently hard-coded to 1-second
intervals, but it could instead be read from a command-line option or a config
file. Is there a better or different way to pass in this value?
bq. * Currently, UsageMessages are passed from a ResourceMonitor to the Slave
using the Future construct and serve as containers holding a snapshot of the
latest usage. This avoids unnecessary marshalling and extra data structures,
since these messages will eventually be sent from the slave to the master in the
standard dispatch style. Is it fine to use Protobuf messages in this way?
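One way to make the 1-second interval configurable is sketched below. The flag name `--usage_interval` and the function `usageInterval` are hypothetical, and a config-file lookup could feed the same function. The sketch parses the value from the command-line arguments. It keeps the current 1-second default when the flag is missing or invalid.

```cpp
#include <chrono>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flag name, used only for illustration.
const std::string kIntervalFlag = "--usage_interval=";

// Returns how often the slave should ask each executor for its usage.
// Falls back to the current hard-coded 1 second when the flag is absent,
// unparsable, or not positive.
std::chrono::duration<double> usageInterval(
    const std::vector<std::string>& args) {
  std::chrono::duration<double> interval(1.0);
  for (const std::string& arg : args) {
    if (arg.compare(0, kIntervalFlag.size(), kIntervalFlag) != 0) {
      continue;  // not the interval flag
    }
    try {
      double seconds = std::stod(arg.substr(kIntervalFlag.size()));
      if (seconds > 0) {
        interval = std::chrono::duration<double>(seconds);
      }
    } catch (const std::exception&) {
      // std::stod throws std::invalid_argument or std::out_of_range;
      // keep the previous value.
    }
  }
  return interval;
}
```

The slave's polling loop could then wait `usageInterval(args)` between requests, for example with `std::this_thread::sleep_for`.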
bq.
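The snapshot-in-a-future pattern described in the review request can be sketched in plain C++. This sketch uses `std::promise`/`std::future` as stand-ins for the libprocess Future and dispatch, and a simple struct in place of the UsageMessage protobuf. `UsageSnapshot`, `collectAll`, and this `ResourceMonitor` are hypothetical. They do not reproduce the classes in the patch.

```cpp
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UsageMessage protobuf.
struct UsageSnapshot {
  std::string executor_id;
  double cpu_usage;
  double mem_usage;
};

// Hypothetical monitor holding the latest usage per executor.
class ResourceMonitor {
 public:
  explicit ResourceMonitor(std::map<std::string, UsageSnapshot> latest)
    : latest_(std::move(latest)) {}

  // Returns a future that already holds a copy of the latest snapshot, so
  // the caller never shares state with the monitor.
  std::future<UsageSnapshot> usage(const std::string& executorId) const {
    std::promise<UsageSnapshot> promise;
    promise.set_value(latest_.at(executorId));
    return promise.get_future();
  }

 private:
  std::map<std::string, UsageSnapshot> latest_;
};

// Slave side: request usage for every executor, then wait on all of the
// futures, in the spirit of collecting a list of futures.
std::vector<UsageSnapshot> collectAll(const ResourceMonitor& monitor,
                                      const std::vector<std::string>& ids) {
  std::vector<std::future<UsageSnapshot>> futures;
  for (const std::string& id : ids) {
    futures.push_back(monitor.usage(id));
  }
  std::vector<UsageSnapshot> snapshots;
  for (std::future<UsageSnapshot>& future : futures) {
    snapshots.push_back(future.get());
  }
  return snapshots;
}
```

Because each future carries its own copy, the slave can keep or forward a snapshot without extra locking. The cost is one copy per request.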
bq. There are several changes that are not yet implemented in this patch.
These changes are as follows:
bq.
bq. * Sufficient test cases have not yet been written for any component
(resource monitor, lxc collector, and process collector).
bq. * Code has not been cleaned up to adhere to all style recommendations.
bq. * Process collector code needs to be updated to prevent CPU usage spikes
when monitored sub-processes die.
bq. * Code to send UsageMessages from the slave to the master has not been
written.
bq.
bq.
bq. This addresses bug MESOS-38.
bq. https://issues.apache.org/jira/browse/MESOS-38
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/Makefile.am 1137a3e
bq. src/master/allocator.hpp 1ac435b
bq. src/master/http.cpp 591433a
bq. src/master/master.hpp 53551b0
bq. src/master/master.cpp 1d3961e
bq. src/messages/messages.proto 11a2c41
bq. src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION
bq. src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION
bq. src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION
bq. src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION
bq. src/monitoring/linux/proc_utils.hpp PRE-CREATION
bq. src/monitoring/linux/proc_utils.cpp PRE-CREATION
bq. src/monitoring/process_resource_collector.hpp PRE-CREATION
bq. src/monitoring/process_resource_collector.cpp PRE-CREATION
bq. src/monitoring/process_stats.hpp PRE-CREATION
bq. src/monitoring/resource_collector.hpp PRE-CREATION
bq. src/slave/http.cpp f03815d
bq. src/slave/isolation_module.hpp c896908
bq. src/slave/isolation_module.cpp 5b7b4a2
bq. src/slave/lxc_isolation_module.hpp b7beefe
bq. src/slave/lxc_isolation_module.cpp d544625
bq. src/slave/main.cpp ac780c4
bq. src/slave/process_based_isolation_module.hpp f6f9554
bq. src/slave/process_based_isolation_module.cpp 100b1e3
bq. src/slave/resource_monitor.hpp PRE-CREATION
bq. src/slave/resource_monitor.cpp PRE-CREATION
bq. src/slave/slave.hpp b1a07e9
bq. src/slave/slave.cpp ce8fda5
bq. src/tests/Makefile.in 6f51be4
bq. src/tests/proc_utils_tests.cpp PRE-CREATION
bq. src/tests/process_resource_collector_tests.cpp PRE-CREATION
bq. src/tests/resource_monitor_tests.cpp PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4167/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Test cases:
bq. * A test case exercising the basic monitoring code with a mocked-out
collector.
bq. * The first of several tests for the process resource monitor, with the
proc-based collecting mocked out.
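The mocked-out collector approach can be sketched as follows, assuming a collector interface with a single `sample()` method. `Sample`, `ResourceCollector`, `FakeCollector`, and `cpuUsage` are hypothetical names. They are not the interfaces in src/monitoring/resource_collector.hpp, which may differ. The fake replays canned samples, so the monitor's arithmetic can be checked without reading /proc.

```cpp
// Hypothetical per-sample statistics.
struct Sample {
  double cpu_seconds;   // cumulative CPU time consumed so far
  double wall_seconds;  // timestamp of the sample
  double mem_bytes;     // resident memory at sample time
};

// Hypothetical collector interface.
class ResourceCollector {
 public:
  virtual ~ResourceCollector() = default;
  virtual Sample sample() = 0;
};

// Fake collector for tests: alternates between two canned samples instead
// of reading real process statistics.
class FakeCollector : public ResourceCollector {
 public:
  FakeCollector(Sample first, Sample second) : samples_{first, second} {}
  Sample sample() override { return samples_[next_++ % 2]; }

 private:
  Sample samples_[2];
  int next_ = 0;
};

// Monitor logic under test: CPU usage is the CPU time consumed between two
// samples divided by the wall-clock time that elapsed between them.
double cpuUsage(ResourceCollector& collector) {
  Sample before = collector.sample();
  Sample after = collector.sample();
  double elapsed = after.wall_seconds - before.wall_seconds;
  if (elapsed <= 0) {
    return 0.0;  // avoid dividing by zero on identical timestamps
  }
  return (after.cpu_seconds - before.cpu_seconds) / elapsed;
}
```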
bq.
bq. Some ad-hoc testing with log statements was done to ensure that monitoring
works end-to-end with both the container-based and process-based isolation
modules.
bq.
bq.
bq. Thanks,
bq.
bq. Sam
bq.
bq.
> Executor resource monitoring and local reporting of usage stats
> ---------------------------------------------------------------
>
> Key: MESOS-38
> URL: https://issues.apache.org/jira/browse/MESOS-38
> Project: Mesos
> Issue Type: New Feature
> Components: isolation, slave
> Environment: Initial executor monitoring for Linux only. Dummy
> monitoring capability (no-op) for OS X, with functionality to be filled in
> later.
> Reporter: Sam Whitlock
> Assignee: Sam Whitlock
> Labels: monitoring
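The following sketch shows one way the Linux-only collector and the no-op fallback for OS X might be selected at compile time, assuming a common collector interface. All names are hypothetical. The Linux class is a stub where the /proc- or lxc-based collection would go.

```cpp
#include <memory>

// Hypothetical usage record; `available` is false when the platform has
// no real collector.
struct PlatformUsage {
  bool available;
  double cpu_usage;
  double mem_usage;
};

class UsageCollector {
 public:
  virtual ~UsageCollector() = default;
  virtual PlatformUsage collect() = 0;
};

#ifdef __linux__
// Stub only: a real Linux collector would read /proc or lxc/cgroups data.
class LinuxCollector : public UsageCollector {
 public:
  PlatformUsage collect() override { return {true, 0.0, 0.0}; }
};
#endif

// No-op collector for platforms without monitoring support (e.g. OS X).
class NoopCollector : public UsageCollector {
 public:
  PlatformUsage collect() override { return {false, 0.0, 0.0}; }
};

// Picks the collector for the platform this code is compiled on.
std::unique_ptr<UsageCollector> makeCollector() {
#ifdef __linux__
  return std::unique_ptr<UsageCollector>(new LinuxCollector());
#else
  return std::unique_ptr<UsageCollector>(new NoopCollector());
#endif
}
```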
>
> Implement reporting of executor resource usage and log it to a local
> log file (for now). The eventual goal is to report these statistics to the
> Mesos master in order to build a timeline for the web UI, a top-like
> command-line interface, or both. This improvement ticket covers only local
> monitoring and log-file reporting. A reporting system (to the master node)
> will be a later improvement ticket.
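Below is a minimal sketch of the local log-file reporting this ticket describes, assuming one text line per sample. The line layout, `formatUsageLine`, and `appendUsage` are hypothetical. The slave's existing logging facility could be used instead.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Formats one usage record as a single log line. The field layout
// (timestamp, executor id, cpu_usage, mem_usage) is illustrative only.
std::string formatUsageLine(long timestamp, const std::string& executorId,
                            double cpuUsage, double memUsageMb) {
  std::ostringstream line;
  line << timestamp << ' ' << executorId
       << " cpu_usage=" << cpuUsage
       << " mem_usage=" << memUsageMb << "MB";
  return line.str();
}

// Appends the record to a local log file; returns false if the file
// could not be opened or written.
bool appendUsage(const std::string& path, const std::string& line) {
  std::ofstream out(path, std::ios::app);
  if (!out) {
    return false;
  }
  out << line << '\n';
  return static_cast<bool>(out);
}
```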
> With the current version of Mesos, it is not possible to monitor individual
> tasks. Therefore, the best this sort of system can do is monitor the usage of
> an individual executor, aggregating resource usage over the executor's tasks
> and resource allocations. If frameworks have a 1-to-1 relationship between a
> job and an executor, the aggregate statistics will be more meaningful.
> Reporting will be available for both lxc isolation and process-based
> isolation. For lxc isolation the task is easier because of lxc's isolation
> facilities. Process-based isolation is more difficult because processes can
> become re-parented out of the executor's process tree (e.g., via a double
> fork). However, the session ID and process group ID will likely still match
> those of the executor, except in the uncommon case where a process resets
> both.
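On Linux, one way to find an executor's processes without relying on the process tree is to scan /proc and match on session ID or process group ID, as described above. This sketch assumes the executor runs as its own session and process group leader. `ProcStat`, `parseStat`, `belongsToExecutor`, and `executorCpuTicks` are hypothetical names, not the proc_utils API from the patch. Field positions follow proc(5).

```cpp
#include <dirent.h>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Fields of interest from /proc/<pid>/stat (see proc(5)).
struct ProcStat {
  int pid = 0;
  int ppid = 0;
  int pgrp = 0;
  int session = 0;
  unsigned long utime = 0;  // user time, in clock ticks (field 14)
  unsigned long stime = 0;  // system time, in clock ticks (field 15)
};

// Parses one line of /proc/<pid>/stat. The command name (field 2) is in
// parentheses and may contain spaces, so parsing resumes after the last ')'.
bool parseStat(const std::string& line, ProcStat* ps) {
  size_t close = line.rfind(')');
  if (close == std::string::npos) return false;
  ps->pid = std::atoi(line.c_str());
  std::istringstream rest(line.substr(close + 1));
  std::string state, skip;
  rest >> state >> ps->ppid >> ps->pgrp >> ps->session;
  // Skip fields 7-13 (tty_nr, tpgid, flags, minflt, cminflt, majflt, cmajflt).
  for (int i = 0; i < 7; ++i) rest >> skip;
  rest >> ps->utime >> ps->stime;
  return static_cast<bool>(rest);
}

// A re-parented process still counts unless it reset both its session and
// its process group (e.g. via setsid()).
bool belongsToExecutor(const ProcStat& ps, int session, int pgrp) {
  return ps.session == session || ps.pgrp == pgrp;
}

// Sums utime + stime over every process in /proc that belongs to the
// executor identified by `session` and `pgrp`.
unsigned long executorCpuTicks(int session, int pgrp) {
  unsigned long total = 0;
  DIR* dir = opendir("/proc");
  if (dir == nullptr) return 0;
  while (struct dirent* entry = readdir(dir)) {
    const char* name = entry->d_name;
    if (name[0] < '0' || name[0] > '9') continue;  // not a pid directory
    std::ifstream file(std::string("/proc/") + name + "/stat");
    std::string line;
    ProcStat ps;
    // Processes that exit mid-scan simply fail to read and are skipped.
    if (std::getline(file, line) && parseStat(line, &ps) &&
        belongsToExecutor(ps, session, pgrp)) {
      total += ps.utime + ps.stime;
    }
  }
  closedir(dir);
  return total;
}
```

The total is in clock ticks; dividing by `sysconf(_SC_CLK_TCK)` converts it to seconds.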
> When usage statistics are eventually reported to the Mesos master, it may be
> possible to use them to oversubscribe slave nodes.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira