2018-03-19 0:34 GMT+08:00 Nadathur, Sundar <sundar.nadat...@intel.com>:
> Sorry for the delayed response. I broadly agree with previous replies.
>
> For the concerns about the impact of the Cyborg weigher on scheduling performance, there are some options (apart from filtering candidates as much as possible in Placement):
> * Handle hosts in bulk by extending BaseWeigher <https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and overriding weigh_objects() <https://github.com/openstack/nova/blob/master/nova/weights.py#L92>, instead of handling one host at a time.

It is still an external REST call, and I guess people still don't like that.

> * If we have to handle one host at a time for whatever reason, since the weigher is maintained by Cyborg, it could directly query the Cyborg DB rather than go through the Cyborg REST API. This would not be unlike other weighers.

That means whenever the Cyborg DB schema changes, we have to restart nova-scheduler to update the weigher as well. That couples the upgrades of the two services together.

> Given these and other possible optimizations, it may be too soon to worry about the performance impact.

Yeah, maybe. What about the preferred traits?

> I am working on a spec that will capture the flow discussed at the PTG. I will try to address these aspects as well.
>
> Thanks & Regards,
> Sundar
>
> On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
>
> @jay I'm also against a weigher in nova/placement. This should be an optional step that depends on the vendor implementation, not a default one.
>
> @Alex I think we should explore the idea of preferred traits.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs and pre-programmed ones. Therefore, as in your description, the programming operation should be a call from Nova to Cyborg, and Cyborg will complete the operation while Nova waits. The only problem is that the weigher step should be an optional one.
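To make the bulk option concrete, here is a minimal standalone sketch, assuming a hypothetical Cyborg client with a single batched lookup (FakeCyborgClient and get_free_function_counts are invented names, not a real Cyborg API). It mimics the shape of nova's weigh_objects(weighed_obj_list, weight_properties) without importing nova, and simplifies the second argument to just the requested function name. The only point is that one round trip covers every candidate host; in a real deployment that round trip is still a REST call.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class WeighedHost:
    """Stand-in for nova's WeighedObject: here just the candidate host name."""
    host: str


class FakeCyborgClient:
    """Hypothetical Cyborg client; the method name is invented for this sketch."""

    def __init__(self, free_functions: Dict[str, Dict[str, int]]):
        # host name -> {function name -> free, already-programmed instances}
        self._free = free_functions
        self.calls = 0  # counts round trips so the batching is visible

    def get_free_function_counts(self, hosts: Iterable[str], function: str) -> Dict[str, int]:
        # One request that answers for every candidate host at once.
        self.calls += 1
        return {h: self._free.get(h, {}).get(function, 0) for h in hosts}


class BulkFunctionWeigher:
    """Weighs all candidate hosts from a single Cyborg lookup."""

    def __init__(self, client: FakeCyborgClient):
        self.client = client
        self.minval = None
        self.maxval = None

    def weigh_objects(self, weighed_obj_list: List[WeighedHost], function: str) -> List[float]:
        hosts = [w.host for w in weighed_obj_list]
        counts = self.client.get_free_function_counts(hosts, function)
        weights = [float(counts[h]) for h in hosts]
        # Record the range of raw weights so a caller could normalize them,
        # similar in spirit to the minval/maxval attributes on nova's BaseWeigher.
        if weights:
            self.minval, self.maxval = min(weights), max(weights)
        return weights


client = FakeCyborgClient({"z1": {"A": 2}, "z2": {"B": 4}, "z3": {"A": 1, "B": 1}})
weigher = BulkFunctionWeigher(client)
candidates = [WeighedHost("z1"), WeighedHost("z2"), WeighedHost("z3")]
print(weigher.weigh_objects(candidates, "A"))  # [2.0, 0.0, 1.0]
print(client.calls)  # 1
```

A real weigher would subclass BaseHostWeigher in nova/scheduler/weights.py and read the requested function from the request spec; both are left out here so the sketch runs on its own.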
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> On 03/06/2018 09:36 PM, Alex Xu wrote:
>>
>>> 2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
>>>
>>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>>>
>>> From: Matthew Booth [mailto:mbo...@redhat.com]
>>> Sent: Saturday, March 3, 2018 4:15 PM
>>> To: OpenStack Development Mailing List (not for usage questions) <openstack-dev@lists.openstack.org>
>>> Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>> During the Cyborg discussion at Rocky PTG, we proposed a flow for FPGAs wherein the request spec asks for a device type as a resource class, and optionally a function (such as encryption) in the extra specs. This does not seem to work well for the usage model that I’ll describe below.
>>>
>>> An FPGA device may implement more than one function. For example, it may implement both compression and encryption. Say a cluster has 10 devices of device type X, and each of them is programmed to offer 2 instances of function A and 4 instances of function B. More specifically, the device may implement 6 PCI functions, with 2 of them tied to function A, and the other 4 tied to function B.
>>> So, we could have 6 separate instances accessing functions on the same device.
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> [Mooney, Sean K] Cyborg is intended to support fixed-function accelerators also, so it will not always be able to program the accelerator. In the case where an FPGA is preprogrammed with a multi-function bitstream that is statically provisioned, Cyborg will not be able to reprogram the slot if any of the functions from that slot are already allocated to an instance. In this case it will have to treat it like a fixed-function device and simply allocate an unused VF of the correct type, if available.
>>>
>>> In the current flow, the device type X is modeled as a resource class, so Placement will count how many of them are in use. A flavor for ‘RC device-type-X + function A’ will consume one instance of the RC device-type-X. But this is not right because this precludes other functions on the same device instance from getting used.
>>>
>>> One way to solve this is to declare functions A and B as resource classes themselves and have the flavor request the function RC. Placement will then correctly count the function instances. However, there is still a problem: if the requested function A is not available, Placement will return an empty list of RPs, but we need some way to reprogram some device to create an instance of function A.
>>>
>>> Clearly, nova is not going to be reprogramming devices with an instance of a particular function.
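The counting problem in Sundar's example can be shown with plain arithmetic. This is a hypothetical model, not the Placement API, and the resource class names are made up for illustration (Placement does require custom resource classes to carry the CUSTOM_ prefix). It uses the numbers from the example: 10 devices, each programmed with 2 instances of function A and 4 of function B.

```python
from collections import Counter
from typing import Dict, Optional

DEVICES = 10       # devices of type X in the cluster
A_PER_DEVICE = 2   # instances of function A per device
B_PER_DEVICE = 4   # instances of function B per device


def max_instances(inventory: Dict[str, int], request: Dict[str, int],
                  used: Optional[Dict[str, int]] = None) -> int:
    """How many more identical requests fit, given total inventory and current usage."""
    usage = Counter(used or {})
    return min((inventory.get(rc, 0) - usage[rc]) // amount for rc, amount in request.items())


# Scheme 1: the device type itself is the resource class. A 'device-type-X +
# function A' flavor consumes a whole device, so only 10 instances fit in total,
# although the devices expose 10 * (2 + 4) = 60 function instances.
by_device = {"CUSTOM_FPGA_TYPE_X": DEVICES}
print(max_instances(by_device, {"CUSTOM_FPGA_TYPE_X": 1}))   # 10

# Scheme 2: each function is its own resource class, so the counts are right...
by_function = {"CUSTOM_FPGA_FUNCTION_A": DEVICES * A_PER_DEVICE,
               "CUSTOM_FPGA_FUNCTION_B": DEVICES * B_PER_DEVICE}
print(max_instances(by_function, {"CUSTOM_FPGA_FUNCTION_A": 1}))   # 20
print(max_instances(by_function, {"CUSTOM_FPGA_FUNCTION_B": 1}))   # 40

# ...but once all 20 A instances are in use nothing fits, even though a device
# could in principle be reprogrammed to offer more A.
print(max_instances(by_function, {"CUSTOM_FPGA_FUNCTION_A": 1},
                    used={"CUSTOM_FPGA_FUNCTION_A": 20}))   # 0
```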
>>>
>>> Cyborg might need to have a separate agent that listens to the nova notifications queue and, upon seeing an event that indicates a failed build due to lack of resources, Cyborg can try to reprogram a device and then try rebuilding the original request.
>>>
>>> It was my understanding from that discussion that we intend to insert Cyborg into the spawn workflow for device configuration in the same way that we currently insert resources provided by Cinder and Neutron. So while Nova won't be reprogramming a device, it will be calling out to Cyborg to reprogram a device, and waiting while that happens.
>>>
>>> My understanding is (and I concede some areas are a little hazy):
>>>
>>> * The flavor says device type X with function Y
>>> * Placement tells us everywhere with device type X
>>> * A weigher orders these by devices which already have an available function Y (where is this metadata stored?)
>>> * Nova schedules to host Z
>>> * Nova host Z asks Cyborg for a local function Y and blocks
>>> * Cyborg hopefully returns function Y which is already available
>>> * If not, Cyborg reprograms a function Y, then returns it
>>>
>>> Can anybody correct me/fill in the gaps?
>>>
>>> [Mooney, Sean K] That correlates closely with my recollection also. As for the metadata, I think the weigher may need to call Cyborg to retrieve it, as it will not be available in the host state object.
>>>
>>> Is it the nova scheduler weigher, or do we want to support weighing in Placement? I think a function is a trait, so can we have preferred_traits? I remember we talked about that parameter in the past, but we didn't have a good use case at that time. This is a good use case.
>>>
>>> If we call Cyborg from the nova scheduler weigher, that will also slow down scheduling a lot.
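Matthew's flow can be sketched as follows, under loudly stated assumptions: Host, Device, placement_candidates(), weigh(), cyborg_get_function() and spawn() are invented for illustration and are not Nova or Cyborg APIs, and a reprogrammed device is assumed to expose a single instance of the requested function. The sketch also folds in Sean's rule that a slot with any allocated function cannot be reprogrammed. The weigh() step is the one that could instead be expressed as a preferred trait, if Placement grew such a parameter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    device_type: str
    free: Dict[str, int]          # function -> free, already-programmed instances
    allocated: int = 0            # functions on this device already given to instances
    reprogrammable: bool = True   # False for fixed-function or statically provisioned parts


@dataclass
class Host:
    name: str
    devices: List[Device] = field(default_factory=list)


def placement_candidates(hosts: List[Host], device_type: str) -> List[Host]:
    # "Placement tells us everywhere with device type X"
    return [h for h in hosts if any(d.device_type == device_type for d in h.devices)]


def weigh(hosts: List[Host], device_type: str, function: str) -> List[Host]:
    # "A weigher orders these by devices which already have an available function Y".
    # sorted() is stable, so hosts that tie keep their Placement order.
    def has_free(h: Host) -> bool:
        return any(d.device_type == device_type and d.free.get(function, 0) > 0
                   for d in h.devices)
    return sorted(hosts, key=has_free, reverse=True)


def cyborg_get_function(host: Host, device_type: str, function: str) -> Optional[str]:
    # "Cyborg hopefully returns function Y which is already available"
    for d in host.devices:
        if d.device_type == device_type and d.free.get(function, 0) > 0:
            d.free[function] -= 1
            d.allocated += 1
            return f"{host.name}: existing {function}"
    # "If not, Cyborg reprograms a function Y", but only on a slot with nothing allocated.
    for d in host.devices:
        if d.device_type == device_type and d.reprogrammable and d.allocated == 0:
            d.free = {function: 0}    # assumed bitstream with one instance of Y, now taken
            d.allocated = 1
            return f"{host.name}: reprogrammed {function}"
    return None


def spawn(hosts: List[Host], device_type: str, function: str) -> Optional[str]:
    ranked = weigh(placement_candidates(hosts, device_type), device_type, function)
    if not ranked:
        return None                   # Placement found no host with device type X
    host_z = ranked[0]                # "Nova schedules to host Z"
    # "Nova host Z asks cyborg for a local function Y and blocks"
    return cyborg_get_function(host_z, device_type, function)


hosts = [
    Host("h1", [Device("X", {"B": 4}, allocated=2)]),   # busy slot, no free A
    Host("h2", [Device("X", {"A": 1, "B": 4})]),        # a free A is already programmed
    Host("h3", [Device("Y", {"A": 2})]),                # wrong device type
]
print(spawn(hosts, "X", "A"))                                   # h2: existing A
print(spawn([Host("h4", [Device("X", {"B": 4})])], "X", "A"))   # h4: reprogrammed A
print(spawn([Host("h5", [Device("X", {"B": 4}, reprogrammable=False)])], "X", "A"))  # None
```

Note that this weigher only prefers hosts with a free function Y; it does not prefer hosts with an idle reprogrammable device, so the top-ranked host can still fail at the Cyborg step.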
>>
>> Right, which is why I don't want to do any weighing in Placement at all. If folks want to sort by things that require long-running code/callbacks or silly temporal things like metrics, they can do that in a custom weigher in the nova-scheduler and take the performance hit there.
>>
>> Best,
>> -jay
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> --
> Zhipeng (Howard) Huang
>
> Standard Engineer
> IT Standard & Patent/IT Product Line
> Huawei Technologies Co., Ltd
> Email: huangzhip...@huawei.com
> Office: Huawei Industrial Base, Longgang, Shenzhen
>
> (Previous)
> Research Assistant
> Mobile Ad-Hoc Network Lab, Calit2
> University of California, Irvine
> Email: zhipe...@uci.edu
> Office: Calit2 Building Room 2402
>
> OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado