Hi Bryan! We are using GPUs here, but in a different way, customized for our
environment and using CloudStack's features as far as possible. In the
documentation we can see support for some GPU models that are a bit old
today.

We are using PCI passthrough. All hosts with GPUs are configured to boot
with the IOMMU enabled and the GPUs bound to vfio-pci, so the vendor kernel
modules for each GPU are never loaded.
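
For reference, the host-side setup looks roughly like the sketch below; the
vendor:device ID and file names are only illustrative, not our exact values:

    # Illustrative host configuration for PCI passthrough (IDs are examples only).
    # 1) Enable the IOMMU at boot, e.g. on the kernel command line
    #    (AMD hosts use amd_iommu instead of intel_iommu):
    #      intel_iommu=on iommu=pt
    # 2) Bind the GPUs to vfio-pci by vendor:device ID so the vendor driver never attaches:
    echo "options vfio-pci ids=10de:20b0" > /etc/modprobe.d/vfio-pci.conf
    echo "blacklist nouveau" > /etc/modprobe.d/blacklist-gpu.conf
    update-initramfs -u    # or dracut -f, depending on the distribution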

Then we create a service offering to describe the VMs that will have GPUs. In
this service offering we use the serviceofferingdetails[1].value field to
insert a block of configuration related to the GPU. It is something like
"<device> ... <hostdev> ... address type=pci", describing the PCI address of
each GPU. We also use host tags to force this compute offering to run only
on hosts with GPUs.
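
As a rough sketch of what that looks like with CloudMonkey (the offering
sizes, the host tag, the detail key shown as extraconfig-1 and the PCI
address 0000:3b:00.0 are all placeholders; the exact key name and whether
the fragment needs a wrapping element depend on the CloudStack version):

    cmk create serviceoffering name="gpu-passthrough" displaytext="VM with a dedicated GPU" \
        cpunumber=16 cpuspeed=2000 memory=131072 hosttags=gpu \
        'serviceofferingdetails[0].key=extraconfig-1' \
        'serviceofferingdetails[0].value=<hostdev mode="subsystem" type="pci" managed="yes"><source><address domain="0x0000" bus="0x3b" slot="0x00" function="0x0"/></source></hostdev>'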

We created a CloudStack cluster with many hosts equipped with GPUs. When a
user needs a VM with a GPU, he/she uses that compute offering. The VM is
instantiated on some host of the cluster and the GPUs are passed through to
the VM.
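
Deployment itself is then just the ordinary call, pointing at that offering
(the IDs below are placeholders):

    cmk deploy virtualmachine serviceofferingid=<gpu-offering-uuid> \
        templateid=<template-uuid> zoneid=<zone-uuid> name=gpu-node-01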

There is no control enforced by CloudStack; for example, it may try to
instantiate a VM on a host whose GPU is already in use (which will fail).
Our approach is that the ROOT admin always controls that creation. We
launch VMs using all GPUs in the infrastructure, and then we use a queue
manager to run jobs on those GPU VMs. When a user needs a dedicated VM to
develop something, we can shut down a VM that is already running (as a
processing node of the queue manager) and then create that dedicated VM,
which uses the freed GPUs exclusively.

There are some further possibilities when using GPUs. For example, some
models support virtualization, where a single GPU can be divided. In that
case, CloudStack would need to support it: it would manage the driver,
creating the virtual GPUs based on information given by the user, such as
memory size, and then configure the hypervisor to pass the virtual GPU
through to the VM.
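
On NVIDIA models with vGPU support, the host side of that division is done
today through mediated devices; a minimal sketch (the PCI address and the
profile name nvidia-63 are examples, and it requires the vendor's vGPU host
driver) would be:

    # List the vGPU profiles the card exposes (each profile fixes a framebuffer size):
    ls /sys/class/mdev_bus/0000:3b:00.0/mdev_supported_types/
    # Create one virtual GPU of a given profile; the resulting UUID is what the
    # hypervisor attaches to the VM as a mediated <hostdev> instead of the full card:
    echo "$(uuidgen)" > /sys/class/mdev_bus/0000:3b:00.0/mdev_supported_types/nvidia-63/create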

Another possibility that would help us in our scenario is some control over
the PCI buses of the hosts. For example, it would be great if CloudStack
could check whether a PCI device is already in use on a host and then use
this information during VM scheduling: CloudStack would launch the VM on a
host that still has a free PCI address. This would be useful not only for
GPUs, but for any PCI device.
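
For what it is worth, that information is already visible on the hosts; a
crude check (the address is an example) would be something like:

    # Which driver owns the device right now (vfio-pci once it is reserved for passthrough):
    lspci -nnk -s 3b:00.0
    # Which running guests already claim a passed-through PCI address:
    for d in $(virsh list --name); do virsh dumpxml "$d" | grep -A4 "<hostdev"; done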

I hope this helps in some way, to think about new scenarios, etc.

Thank you!

On Thu, Feb 22, 2024 at 07:56, Bryan Tiang <bryantian...@hotmail.com>
wrote:

> Hi Guys,
>
> Anyone running Cloudstack with GPU Support in Production? Say NVIDIA H100
> or AMD M1300X?
>
> Just want to know if there is any support for this still on going, or
> anyone who is running a cloud business with GPUs.
>
> Regards,
> Bryan
>
