Another way would be to just use cAdvisor.
> On 22 Mar 2019, at 08:35, Jorge Machado <jom...@me.com.INVALID> wrote:
>
> Hi Mesos devs,
>
> In our Mesos use case we need to get GPU resource usage per task and build
> Grafana dashboards for it. To get the metrics into Grafana we will send them
> to Prometheus; the main problem is how to collect the metrics in a reliable
> way.
> I am proposing the following:
>
> Change mesos.proto and the mesos.proto under v1, adding the following to the
> ResourceStatistics message:
>
> // GPU statistics for each container
> optional int32 gpu_idx = 50;
> optional string gpu_uuid = 51;
> optional string device_name = 52;
> optional uint64 gpu_memory_used_mb = 53;
> optional uint64 gpu_memory_total_mb = 54;
> optional double gpu_usage = 55;
> optional int32 gpu_temperature = 56;
> optional int32 gpu_frequency_MHz = 57;
> optional int32 gpu_power_used_W = 58;
>
> For starters I would like to change NvidiaGpuIsolatorProcess in isolator.cpp
> and make the NVML calls from its usage method. As I’m new to this, I would
> appreciate some guidelines.
>
> My questions:
>
> 1. Does the NvidiaGpuIsolatorProcess already run inside the container, or
>    just outside in the agent? (I’m assuming outside.)
> 2. From what I saw, the CPU metrics are gathered inside the container. For
>    the GPU we could do it in the NvidiaGpuIsolatorProcess and get the
>    metrics via the host.
> 3. Is there anything else that I should check?
>
> Thanks a lot
>
> Jorge Machado
> www.jmachado.me
>
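For reference, here is a rough sketch of how the proposed ResourceStatistics
fields could be filled from NVML readings. It is only an illustration, not how
the isolator would do it: it assumes the pynvml Python bindings rather than the
C API, the function names to_gpu_statistics and read_gpu are hypothetical, and
reporting gpu_usage as a fraction between 0 and 1 is my assumption, since the
proposal does not state a unit.

```python
BYTES_PER_MB = 1024 * 1024


def _as_text(value):
    # Older pynvml releases return bytes for strings, newer ones return str.
    return value.decode() if isinstance(value, bytes) else value


def to_gpu_statistics(idx, uuid, name, mem_used_bytes, mem_total_bytes,
                      util_percent, temperature_c, clock_mhz, power_mw):
    """Map raw NVML-style readings onto the proposed field names."""
    return {
        "gpu_idx": idx,
        "gpu_uuid": _as_text(uuid),
        "device_name": _as_text(name),
        # NVML reports memory in bytes; the proposal uses megabytes.
        "gpu_memory_used_mb": mem_used_bytes // BYTES_PER_MB,
        "gpu_memory_total_mb": mem_total_bytes // BYTES_PER_MB,
        # Assumption: usage is stored as a fraction, NVML gives a percentage.
        "gpu_usage": util_percent / 100.0,
        "gpu_temperature": temperature_c,
        "gpu_frequency_MHz": clock_mhz,
        # NVML reports power in milliwatts; the proposal uses whole watts.
        "gpu_power_used_W": power_mw // 1000,
    }


def read_gpu(idx):
    """Read one GPU on the host through NVML (needs a GPU and pynvml)."""
    import pynvml  # imported lazily so the mapping above works without it

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return to_gpu_statistics(
            idx,
            pynvml.nvmlDeviceGetUUID(handle),
            pynvml.nvmlDeviceGetName(handle),
            mem.used,
            mem.total,
            util.gpu,
            pynvml.nvmlDeviceGetTemperature(
                handle, pynvml.NVML_TEMPERATURE_GPU),
            pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS),
            pynvml.nvmlDeviceGetPowerUsage(handle),
        )
    finally:
        pynvml.nvmlShutdown()
```

Note that these readings are per device, not per task, so attributing them to a
container would still depend on knowing which GPUs were allocated to it.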