Hi Bryan! We are using it here, but in a different way, customized for our environment and making use of CloudStack's features where possible. In the documentation we can see support for some GPU models that are a little old today.
We are using PCI passthrough. All hosts with GPUs are configured to boot with IOMMU enabled and with the GPUs bound to vfio-pci, so the host does not load the vendor kernel modules for each GPU. We then create a service offering describing the VMs that will have a GPU. In this service offering we use the serviceofferingdetails[1].value field to insert a block of configuration related to the GPU. It is something like "<device> ... <hostdev> ... address type=pci", which describes the PCI bus address of each GPU. We then use host tags to force this compute offering to run only on hosts with GPUs.

We created a CloudStack cluster with many hosts equipped with GPUs. When a user needs a VM with a GPU, he/she uses the compute offering we created; the VM is instantiated on some host of the cluster and the GPUs are passed through to it. There is no control executed by CloudStack: for example, it may try to instantiate a VM on a host whose GPU is already in use (which will fail). Our way of managing this is that the ROOT admin always controls that creation. We launch VMs using all GPUs in the infrastructure, and then use a queue manager to run jobs on those GPU VMs. When a user needs a dedicated VM to develop something, we can shut down a VM that is currently running as a processing node of the queue manager and then create the dedicated VM, which uses the GPUs in isolation.

There are other possibilities when using GPUs. For example, some models support virtualization, where a single GPU can be divided. In that case, CloudStack would need to support that: it would manage the driver, creating the virtual GPUs based on information input from the user (such as memory size), and then manage the hypervisor to pass the virtual GPU through to the VM. Another possibility that would help in our scenario is some control over the PCI buses in the hosts. For example, if CloudStack could check whether a PCI address is in use on a host and then use this information in VM scheduling, that would be great: CloudStack could launch VMs on a host that has a free PCI address.
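For illustration, the configuration block we inject is essentially the libvirt <hostdev> PCI-passthrough definition for a fixed PCI address. A minimal sketch of generating that snippet — the PCI address (0000:3b:00.0) and the helper name are illustrative examples, not our exact production values:

```python
# Sketch: build the libvirt <hostdev> PCI-passthrough snippet of the
# kind we paste into serviceofferingdetails for a GPU's PCI address.
# The address used below is an invented example.
from xml.etree import ElementTree as ET

def hostdev_xml(domain: str, bus: str, slot: str, function: str) -> str:
    """Return a libvirt <hostdev> element for PCI passthrough of one device."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address",
                  domain=domain, bus=bus, slot=slot, function=function)
    return ET.tostring(hostdev, encoding="unicode")

snippet = hostdev_xml("0x0000", "0x3b", "0x00", "0x0")
print(snippet)
```

One such block per physical GPU, each with its own bus/slot/function, is what ties a VM to specific devices on the host.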
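The PCI-tracking control described above could look roughly like this: keep an inventory of PCI addresses per host and which VM (if any) holds each one, and have the scheduler pick only hosts with a free device. A toy sketch — the data model and function names are ours, not CloudStack's, since CloudStack has no such bookkeeping today:

```python
# Toy sketch of PCI-aware host selection: track which PCI addresses are
# claimed on each host and only place a GPU VM where one is free.
# The inventory below is invented for illustration.

# host -> {pci_address: vm_id or None if free}
inventory = {
    "host-01": {"0000:3b:00.0": "vm-queue-7", "0000:af:00.0": None},
    "host-02": {"0000:3b:00.0": "vm-queue-9"},
}

def free_gpus(host):
    """PCI addresses on this host not claimed by any VM."""
    return [addr for addr, vm in inventory[host].items() if vm is None]

def pick_host():
    """Return (host, pci_address) with a free GPU, or None if all are busy."""
    for host in inventory:
        free = free_gpus(host)
        if free:
            return host, free[0]
    return None

def claim(host, addr, vm_id):
    """Record that vm_id now owns this PCI address; refuse double allocation."""
    assert inventory[host][addr] is None, f"{addr} on {host} already in use"
    inventory[host][addr] = vm_id

choice = pick_host()
print(choice)  # -> ('host-01', '0000:af:00.0')
```

With this kind of check in the scheduler, the failure case we manage manually today (deploying onto a host whose GPU is already taken) would simply never be attempted.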
This would be used not only for GPUs, but for any PCI device. I hope this can help in some way, to think of new scenarios etc. Thank you!

On Thu, Feb 22, 2024 at 07:56, Bryan Tiang <bryantian...@hotmail.com> wrote:
> Hi Guys,
>
> Anyone running Cloudstack with GPU Support in Production? Say NVIDIA H100
> or AMD M1300X?
>
> Just want to know if there is any support for this still on going, or
> anyone who is running a cloud business with GPUs.
>
> Regards,
> Bryan