Re: [MESOS-8248] - Expose information about GPU assigned to a task

Jorge Machado Fri, 22 Mar 2019 03:59:10 -0700

Another way would be to just use cAdvisor.

> On 22 Mar 2019, at 08:35, Jorge Machado <jom...@me.com.INVALID> wrote:
> 
> Hi Mesos devs, 
> 
> In our Mesos use case we need to get GPU resource usage per task and 
> build Grafana dashboards for it. To get the metrics into Grafana we will 
> send them to Prometheus; the main problem is how to collect the metrics in 
> a reliable way. 
> I am proposing the following: 
> 
> Change mesos.proto and the mesos.proto under v1, adding the following to 
> the ResourceStatistics message: 
> 
> //GPU statistics for each container
> optional int32 gpu_idx = 50;
> optional string gpu_uuid = 51;
> optional string device_name = 52;
> optional uint64 gpu_memory_used_mb = 53;
> optional uint64 gpu_memory_total_mb = 54;
> optional double gpu_usage = 55;
> optional int32 gpu_temperature = 56;
> optional int32 gpu_frequency_MHz = 57;
> optional int32 gpu_power_used_W = 58;
> 
> For starters I would like to change NvidiaGpuIsolatorProcess in isolator.cpp 
> and make the NVML calls from its usage method. As I’m new to this, I would 
> appreciate some guidance. 
> 
> My questions:  
> 
> Does NvidiaGpuIsolatorProcess already run inside the container, or only 
> outside, in the agent? (I’m assuming outside.)
> From what I saw, the CPU metrics are gathered inside the container; for 
> the GPU we could do it in NvidiaGpuIsolatorProcess and get the metrics 
> via the host. 
> Is there anything else I should check? 
> 
> Thanks a lot
> 
> Jorge Machado
> www.jmachado.me
>
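
For the Prometheus side of the quoted proposal, here is a minimal C++ sketch of how the proposed per-container GPU fields could be rendered in the Prometheus text exposition format. It is not Mesos code: the GpuStatistics struct only mirrors the field names from the proposal, and the metric names (mesos_gpu_...), the label set, and the helpers escapeLabel, labels and toPrometheus are all hypothetical. The units are assumptions as well (gpu_usage as a percentage, temperature in degrees Celsius), since the proposal does not state them.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical mirror of the GPU fields proposed for ResourceStatistics.
// Units are assumptions: the proposal does not spell them out.
struct GpuStatistics {
  std::int32_t gpu_idx;
  std::string gpu_uuid;
  std::string device_name;
  std::uint64_t gpu_memory_used_mb;
  std::uint64_t gpu_memory_total_mb;
  double gpu_usage;              // assumed: utilization in percent
  std::int32_t gpu_temperature;  // assumed: degrees Celsius
  std::int32_t gpu_frequency_MHz;
  std::int32_t gpu_power_used_W;
};

// Escape a label value for the Prometheus text format:
// backslash, double quote and newline are backslash-escaped.
std::string escapeLabel(const std::string& value) {
  std::string out;
  for (char c : value) {
    if (c == '\\') out += "\\\\";
    else if (c == '"') out += "\\\"";
    else if (c == '\n') out += "\\n";
    else out += c;
  }
  return out;
}

// Build the label set shared by every sample of one GPU in one container.
std::string labels(const std::string& containerId, const GpuStatistics& s) {
  std::ostringstream out;
  out << "{container_id=\"" << escapeLabel(containerId)
      << "\",gpu_idx=\"" << s.gpu_idx
      << "\",gpu_uuid=\"" << escapeLabel(s.gpu_uuid)
      << "\",device_name=\"" << escapeLabel(s.device_name) << "\"}";
  return out.str();
}

// Render one sample line per numeric field (metric names are hypothetical).
std::string toPrometheus(const std::string& containerId,
                         const GpuStatistics& s) {
  const std::string l = labels(containerId, s);
  std::ostringstream out;
  out << "mesos_gpu_memory_used_mb" << l << " " << s.gpu_memory_used_mb << "\n"
      << "mesos_gpu_memory_total_mb" << l << " " << s.gpu_memory_total_mb << "\n"
      << "mesos_gpu_usage" << l << " " << s.gpu_usage << "\n"
      << "mesos_gpu_temperature" << l << " " << s.gpu_temperature << "\n"
      << "mesos_gpu_frequency_mhz" << l << " " << s.gpu_frequency_MHz << "\n"
      << "mesos_gpu_power_used_w" << l << " " << s.gpu_power_used_W << "\n";
  return out.str();
}
```

Calling toPrometheus with a container ID and a filled-in GpuStatistics yields one sample line per numeric field, with the container ID, GPU index, UUID and device name attached as labels, so a Grafana dashboard could group by task or by device. Keeping the string fields as labels rather than as separate metrics is a design choice of this sketch, not something the proposal prescribes.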
