Document updated to talk about network-aware scheduling (https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit# - the section just before the use case list).
Based on yesterday's meeting, rkukura would also like to see network-aware scheduling work for non-PCI cases, where servers are not necessarily connected to every physical segment and machines therefore need to be placed based on where they can reach the networks they need. I think this is an exact parallel to the PCI case, except that in the PCI case we're also constrained by a count of resources (you can, of course, connect an effectively unlimited number of VMs to a software bridge). We should implement the scheduling changes as a separate batch of work that solves both problems if we can - and this fits the two-step approach, because step 1 brings us up to Neutron parity and step 2 adds network-aware scheduling for both PCI and non-PCI cases.
--
Ian.

On 20 January 2014 13:38, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
> On 20 January 2014 09:28, Irena Berezovsky <ire...@mellanox.com> wrote:
>
>> Hi,
>> Having had a post-PCI-meeting discussion with Ian based on his proposal
>> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#,
>> I am not sure that the case most relevant for SR-IOV based networking is covered well by this proposal. The understanding I got is that a VM can land on a host that lacks a suitable PCI resource.
>
> The issue we have is if we have multiple underlying networks in the system and only some Neutron networks are trunked on the network that the PCI device is attached to. This can specifically happen in the case of provider versus trunk networks, though it's very dependent on the setup of your system.
>
> The issue is that, in the design we have, Neutron at present has no input into scheduling, and also that all devices in a flavor are treated as precisely equivalent. So if I say 'I want a 10G card attached to network X', I will get one of the cards in the 10G flavor with no regard to whether it can actually attach to network X.
>
> I can see two options here:
>
> 1. What I'd do right now is make it so that a VM given an unsuitable network card fails to run in nova-compute when Neutron discovers it can't attach the PCI device to the network. This gets us a lot of use cases and a Neutron driver without solving the problem elegantly. You'd need to choose e.g. a provider or tenant network flavor, mindful of the network you're connecting to, so that Neutron can actually succeed - which is more visibility into the workings of Neutron than the user really ought to need.
>
> 2. When Nova checks that all the networks exist - which, conveniently, happens in nova-api - it also gets attributes from the networks that the scheduler can use to choose a device. So the scheduler chooses from a flavor *and*, within that flavor, from the subset of devices with appropriate connectivity. If we do this then the Neutron connection code doesn't change - it should still fail if the connection can't be made - but that becomes an internal error, since it's now an issue of consistency of setup.
>
> To do this, I think we would tell Neutron 'PCI extra-info X should be set to Y for this provider network and Z for tenant networks' - the precise implementation would be somewhat up to the driver - and then add the additional check in the scheduler. The list of scheduling attributes would have to include that attribute.
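>
> As a very rough sketch of what that scheduler-side check might look like - assuming, purely for illustration, that the device pool, the flavor spec and the network-derived requirements are all simple attribute dictionaries, and that devices are tagged with 'e.physical_network' as described above; none of these names are existing Nova code:
>
>     # Illustrative only: pick the PCI devices on a host that satisfy both
>     # the requested flavor spec and any extra requirements derived from
>     # the Neutron networks the instance is attaching to.
>     def select_pci_devices(host_pci_pool, flavor_spec, network_requirements):
>         def matches(device, spec):
>             # A device matches a spec if every key/value in the spec is on it.
>             return all(device.get(k) == v for k, v in spec.items())
>
>         in_flavor = [d for d in host_pci_pool if matches(d, flavor_spec)]
>         # Narrow to devices that can actually reach the requested networks,
>         # e.g. network_requirements = {'e.physical_network': 'phy1'}.
>         return [d for d in in_flavor if matches(d, network_requirements)]
>
> The point being that the connectivity match is the same kind of attribute comparison as the flavor match - the scheduler just has one more spec to satisfy before a host is acceptable.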
>
>> Can you please provide an example of the required cloud-admin PCI-related configuration on the nova-compute and controller nodes with regard to the following simplified scenario:
>> -- There are 2 provider networks (phy1, phy2), each with an associated range of VLAN IDs.
>> -- Each compute node has 2 vendor adapters with the SR-IOV feature enabled, exposing xx Virtual Functions.
>> -- Every VM vNIC on a virtual network on provider network phy1 or phy2 should be a PCI passthrough vNIC.
>
> So, we would configure Neutron to check the 'e.physical_network' attribute on connection and to return it as a requirement on networks. Any PCI device on provider network 'phy1' would be tagged e.physical_network => 'phy1'. When returning the network, an extra attribute would be supplied (perhaps something like 'pci_requirements => { e.physical_network => 'phy1' }'). nova-api would know that, in the case of macvtap and PCI directmap, it needs to pass this additional information on to the scheduler, which would use it when finding a device, over and above the flavor requirements.
>
> Neutron, when mapping a PCI port, would similarly work out from the Neutron network the trunk it needs to connect to, and would reject any mapping that didn't conform. If the mapping did conform, it would work out how to encapsulate the traffic from the PCI device and set that up on the PF of the port.
>
> I'm not saying this is the only or best solution, but it does have the advantage that it keeps all of the networking behaviour in Neutron - hopefully Nova remains almost completely ignorant of what the network setup is, since the only thing we have to do is pass on PCI requirements, and we already have a convenient call flow we can use that's there for the network existence check.
> --
> Ian.
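
To make the Neutron side of that concrete, here is a minimal sketch of the kind of check the driver would make at port-binding time under the scheme above. It is illustrative only: 'provider:physical_network' and 'provider:segmentation_id' are the usual provider-network attributes, but the 'e.physical_network' device tag and the function itself are assumptions drawn from the discussion, not existing code.

    # Illustrative sketch only: the check a Neutron driver might make when
    # asked to bind a PCI passthrough port.
    def bind_pci_port(network, pci_device):
        required_physnet = network.get('provider:physical_network')
        device_physnet = pci_device.get('e.physical_network')

        if required_physnet and device_physnet != required_physnet:
            # The scheduler should never have picked this device, so a
            # mismatch is an internal consistency error, not a user error.
            raise RuntimeError('PCI device %s cannot reach physical network %s'
                               % (pci_device.get('address'), required_physnet))

        # Otherwise work out the encapsulation for this segment and program
        # it on the PF that owns the VF being passed through.
        return {'vlan': network.get('provider:segmentation_id')}

In the phy1/phy2 scenario above, a device tagged e.physical_network => 'phy1' would pass this check for any network created on provider:physical_network 'phy1' and fail it for one on 'phy2', which is exactly the consistency the scheduler change is meant to guarantee in advance.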