Boris-Michel Deschenes wrote:
> John,
>
> Sorry for my late response...
>
> It would be great to collaborate. Like I said, I prefer to keep the libvirt 
> layer, as it works great with OpenStack and many other tools (collectd, 
> virt-manager, etc.); the virsh tool is also very useful for us.
>
> You say:
> -----------
> We have GPU passthrough working with NVIDIA GPUs in Xen 4.1.2, if I recall 
> correctly.  We don't yet have a stable Xen + Libvirt installation working, 
> but we're looking at it.  Perhaps it would be worth collaborating since it 
> sounds like this could be a win for both of us.
> -----------
> I have Jim Fehlig in CC since this could be of interest to him.
>
> We managed to get GPU passthrough of NVIDIA cards working using Xen 4.1.2, but 
> ONLY with the xenapi (actually the whole XCP toolstack). With libvirt/Xen 
> 4.1.2, and even libvirt/Xen 4.1.3, I only manage to pass through Radeon GPUs. 
> The reason could be:
>
> 1. The inability to pass the gfx_passthru parameter through libvirt (IIRC 
> this parameter passes the PCI device as the primary VGA card rather than as a 
> secondary one).
> 2. Poor FLR (Function Level Reset) support, or some other low-level PCI 
> function, on the NVIDIA boards
>   

I've noticed this issue with some Broadcom multifunction NICs.  With no FLR,
the kernel falls back to a secondary bus reset, which is problematic if
another function is being used by a different VM.
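
As an aside, you can check whether a function advertises FLR with plain lspci;
"FLReset+" in the DevCap line means the device supports it (the PCI address
here is just an example):

  # Does this function support Function Level Reset (FLR)?
  # "FLReset+" => supported, "FLReset-" => not supported.
  lspci -vv -s 02:00.0 | grep -i flreset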

> 3. something else entirely.
>
> Anyway, like I said, this GPU passthrough of NVIDIA cards worked well with XCP 
> using xenapi, but not with libvirt/Xen.
>   

Hmm, would be nice to get that fixed.  To date, I haven't tried GPU
passthrough with Xen so I'm not familiar with the issues.
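
For context, the gfx_passthru flag Boris mentions maps to a line in an xl/xm
domain config that libvirt had no way to express at the time; a sketch (the
device address is just an example):

  # Pass the GPU through as the guest's primary VGA device
  # instead of a secondary adapter.
  gfx_passthru=1
  pci = [ '02:00.0' ]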

> Now, as for the libvirt/Xen setup we have, I don't know if I would call it 
> stable, but it does the job as a POC cloud and is actually used by real people 
> with real GPU needs (for example, developing on OpenCL 1.2). The main thing is 
> that it integrates seamlessly with OpenStack (because of libvirt); with 
> instance_type_extra_specs, you can add a couple of these "special" nodes to an 
> existing plain KVM cloud and they will receive the instances requesting GPUs 
> without any problem.
>
> the setup:
> (this only refers to compute nodes, as controller nodes are unmodified)
>
> 1. Install CentOS 6.2 and make your own "project Zeus" (transforming a CentOS 
> system into a Xen host): 
> http://www.howtoforge.com/virtualization-with-xen-on-centos-6.2-x86_64-paravirtualization-and-hardware-virtualization
>  (first page only, and skip the bridge setup, as openstack-nova-compute does 
> this at startup).  You end up with a Xen hypervisor with libvirt; the libvirt 
> patch is actually a single-line config change IIRC.  Pretty straightforward.
>
> 2. Install openstack-nova from EPEL (so all this refers only to Essex, 
> OpenStack 2012.1)
>
> 3. Configure the compute node accordingly (libvirt_type=xen)
>
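
For step 3, a minimal sketch of the relevant nova.conf setting on the compute
node (Essex-era flag name, matching the text above):

  # /etc/nova/nova.conf on the compute node
  libvirt_type=xen
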
> That's the first part. At this point, you can spawn a VM and attach a GPU 
> manually with:
>
> virsh nodedev-dettach pci_0000_02_00_01
> (edit the VM's nova libvirt.xml to add a PCI device definition like the one 
> described at 
> http://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/chap-Virtualization-PCI_passthrough.html
>  and sketched below)
> virsh define libvirt.xml
> virsh start instance-0000000x
>
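
The PCI definition referenced above looks roughly like this (a sketch; the
domain/bus/slot/function values must match the device detached with virsh
nodedev-dettach):

  <hostdev mode='subsystem' type='pci'>
    <source>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
    </source>
  </hostdev>
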
> Now, this is all manual, and we wish to automate it in OpenStack. This is what 
> I've done: I can currently launch VMs in my cloud and the passthrough occurs 
> without any intervention.
>
> These files were modified from an original Essex installation to make this 
> possible:
>
> (on the controller)
> create a g1.small instance_type with {'free_gpus': '1'} as 
> instance_type_extra_specs
> select the compute_filter filter to enforce extra_specs in scheduling (the 
> host_passes function of the filter is also slightly modified so that it reads 
> key >= value instead of key = value; free_gpus >= 1 passes, it does not need 
> to be strictly equal to 1)
>   

I think this has already been done for you in Folsom via the
ComputeCapabilitiesFilter and Jinwoo Suh's addition of
instance_type_extra_specs operators.  See commit 90f77d71.
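
For reference, with the Folsom operator support the operator lives in the
extra_specs value itself, so the >= behavior no longer needs a modified
filter; a sketch, reusing the g1.small spec from above:

  # Essex-style exact match:
  #   {'free_gpus': '1'}
  # Folsom-style, matching any host that reports 1 or more free GPUs:
  #   {'free_gpus': '>= 1'}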

> (on the compute node)
> nova/virt/libvirt/gpu.py
>       a new file that contains functions like detach_all_gpus, get_free_gpus, 
> simple stuff 

Have you considered pushing this upstream?

> using virsh and lspci
> nova/virt/libvirt/connection.py
>       calls gpu.detach_all_gpus on startup (virsh nodedev-dettach)
>       builds the VM's libvirt.xml as normal, but also adds the PCI nodedev 
> definition
>       advertises the free_gpus capability so that the scheduler gets it 
> through host_state calls
>
> That's about it. With that we get:
>
> 1. compute nodes that detach all GPUs on startup
> 2. compute nodes that advertise the number of free GPUs to the scheduler
> 3. compute nodes that are able to build the VM's libvirt.xml with a valid, 
> free GPU definition when a VM is launched
> 4. a controller that runs a scheduler that knows where to send VMs (free_gpus 
> >= 1)
>
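
Since gpu.py itself isn't posted in this thread, here is a minimal sketch of
what such helpers might look like, based purely on the description above (the
function names match the description; the static device list and the
assigned_addresses argument are assumptions):

  import subprocess

  # PCI functions this node can hand out to guests (assumed static).
  GPU_ADDRESSES = ['0000:02:00.0', '0000:02:00.1']

  def _nodedev_name(pci_address):
      # '0000:02:00.1' -> libvirt nodedev name 'pci_0000_02_00_1'
      return 'pci_' + pci_address.replace(':', '_').replace('.', '_')

  def detach_all_gpus():
      # Detach every GPU from the host at startup, i.e. the
      # 'virsh nodedev-dettach' step described above.
      for addr in GPU_ADDRESSES:
          subprocess.check_call(
              ['virsh', 'nodedev-dettach', _nodedev_name(addr)])

  def get_free_gpus(assigned_addresses):
      # Return the GPUs not attached to a running instance; the count is
      # what gets advertised to the scheduler as 'free_gpus'.
      return [a for a in GPU_ADDRESSES if a not in assigned_addresses]
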
> It does the trick for now. With a Radeon 6950 I get 100% success: I spawn a VM 
> and in 20 seconds I get a Windows 7 guest with a real GPU, available through 
> RDC.
>
> I'll try to figure out what the problem is regarding NVIDIA passthrough. If I 
> do, I'll be sure to inform Jim Fehlig so that we can work this into libvirt.
>   

Yes, please do.

> All this is in OpenStack Essex (2012.1), so I will probably never send the 
> code upstream, as most of this has changed in Folsom (for example, the 
> extra_specs handling is already different in Folsom), but if you want to have 
> a look, let me know.
>   

As mentioned above, I think that one has already been done for you.  It seems
you just need to work on getting your nova/virt/libvirt/gpu.py addition
upstream.

Regards,
Jim

