Thanks Vish, that makes it clearer.  I guess the validation can be handled by 
whichever manager picks up the call rather than having to be validated on the 
manager of a specific host (assuming multi-host of course), which should mean 
it's still reasonably responsive.

Just looking through the code, it looks to me that there are a few things that 
might still need clearing up to make this separation work.  For example:

_add_floating_ip calls to compute.api (makes sense now) - which could do 
whatever validation makes sense at the instance level and then pass on to 
network.api.  But _remove_floating_ip calls directly to network_api, so even if 
the compute layer wanted to do some instance-level validation it can't.   
Shouldn't both pass through compute.api in this new model?
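
To make sure I've understood the intent, this is roughly what I'd expect both 
paths to look like (just a rough sketch, method names simplified rather than 
the actual Essex signatures):

class ComputeAPI(object):
    """Illustrative only: both floating IP operations routed via compute."""

    def __init__(self, network_api):
        self.network_api = network_api

    def associate_floating_ip(self, context, instance, address):
        # instance-level validation first...
        self._check_instance(instance)
        # ...then hand off to the network layer
        self.network_api.associate_floating_ip(context, instance, address)

    def disassociate_floating_ip(self, context, instance, address):
        # same pattern, instead of the extension calling network_api directly
        self._check_instance(instance)
        self.network_api.disassociate_floating_ip(context, instance, address)

    def _check_instance(self, instance):
        # e.g. refuse the request outright if the instance is gone
        if not instance or instance.get('vm_state') == 'deleted':
            raise ValueError('instance is not in a usable state')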

There are also a few other casts left in the Network API layer:
release_floating_ip
deallocate_for_instance
add_fixed_ip_to_instance
remove_fixed_ip_from_instance
add_network_to_project

If the network manager is now the only thing that can perform validation, 
shouldn't all of these be turned into calls as well?
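
For reference, the difference I'm thinking of is roughly this (simplified 
sketch in the style of the rpc helpers, not the exact Essex code):

from nova import flags
from nova import rpc

FLAGS = flags.FLAGS

def release_floating_ip_cast(context, address):
    # cast: fire-and-forget, so any validation error raised by the network
    # manager never makes it back to the caller
    rpc.cast(context, FLAGS.network_topic,
             {'method': 'release_floating_ip',
              'args': {'address': address}})

def release_floating_ip_call(context, address):
    # call: blocks until the manager answers (or times out), so a
    # manager-side validation failure surfaces as an error to the user
    return rpc.call(context, FLAGS.network_topic,
                    {'method': 'release_floating_ip',
                     'args': {'address': address}})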

Cheers,
Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 28 March 2012 23:26
To: Day, Phil
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Validation of floating IP operations in Essex codebase?


On Mar 28, 2012, at 10:04 AM, Day, Phil wrote:


Hi Folks,

At the risk of looking lazy in my first question by following up with a second:

So I tracked this down in the code and can see that the validation has moved 
into network/manager.py, and what was a validation/cast in network/api.py has 
been replaced with a call - but that seems to make the system more tightly 
coupled across components (i.e. if there is a problem getting the message to 
the Network Manager then even an invalid request will be blocked until the call 
returns or times out).

This is a side effect of trying to decouple compute and network, see the 
explanation below.



It also looks as if the validation for disassociate_floating_ip has been moved 
to the manager, but this is still a cast from the api layer - so those error 
messages never get back to the user.

Good point.  This probably needs to be a call with the current model.



Coming from Diablo it all feels kind of odd to me - I thought we were trying to 
validate as much of a request as we could in the API server, return immediate 
errors at that stage, and then cast into the system (so that only internal 
errors can stop something from working after that point).     Was there a 
deliberate design policy around this at some stage?

There are a few things going on here.

First, we have spent a lot of time decoupling network and compute.  Ultimately 
network will be an external service, so we can't depend on having access to the 
network database on the compute api side. We can do some checks in compute_api 
to make sure that the address isn't attached to another instance that we know 
about, but ultimately the network service has to be responsible for saying 
what can happen with the ip address.

So the second part is about why it is happening in network_manager vs 
network_api.  This is a side-effect of the decision to plug in 
quantum/melange/etc. at the manager layer instead of the api layer.  The api 
layer is therefore being very dumb, just passing requests on to the manager.
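
To illustrate (conceptual sketch only, not the actual Essex loader): the 
network service imports whichever manager class the network_manager flag 
names, so the quantum/melange integration lives behind the manager interface 
and network/api.py stays a thin pass-through.

import importlib

def load_network_manager(class_path):
    # class_path comes from the network_manager flag, e.g. something like
    # "nova.network.quantum.manager.QuantumManager" (path assumed here)
    module_name, cls_name = class_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)()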

So that explains where we are.  Here is the plan (as I understand) for the 
future:

a) move the quantum plugin to the api layer
(At this point we could move validation into the api if necessary.)

b) define a more complete network api which includes all of the necessary 
features that are currently compute extensions

c) make a client to talk to the api

d) make compute talk through the client to the api instead of using rabbit 
messages
(this decouples network completely, allowing us to deploy and run network as a 
completely separate service if need be.  At this point the quantum-api-plugin 
could be part of quantum or a new shared NaaS project.  More to decide at the 
summit here)
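
As a purely hypothetical illustration of (d) - the endpoint, path, and client 
class here are all made up - the point is just that compute would talk HTTP to 
the network service instead of rabbit:

import json
import httplib2

class NetworkClient(object):
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.http = httplib2.Http()

    def disassociate_floating_ip(self, address):
        # a REST call either succeeds or fails fast with an HTTP error
        # that the compute api can surface to the user
        resp, body = self.http.request(
            self.endpoint + '/floating_ips/disassociate', 'POST',
            body=json.dumps({'address': address}),
            headers={'Content-Type': 'application/json'})
        if resp.status >= 400:
            raise Exception('network service rejected request: %s' % body)
        return json.loads(body) if body else None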

In general, we are hoping to switch to quantum as the default by Folsom, and 
not have to touch the legacy network code very much.  If there are serious 
performance issues we could make some optimizations by doing checks in 
network-api, but these will quickly become moot if we are moving towards using 
a client and talking through a rest interface.

So it looks like the following could be done in the meantime:

a) switch disassociate from a cast to a call -> I would consider this one a 
bug and would appreciate someone verifying that it fails and reporting it

b) add some validation in compute api -> I'm not sure what we can assert here.  
Perhaps we could use the network_info cache and check for duplicates etc. 
(rough sketch at the end of this mail)

c) if we have serious performance issues, we could add another layer of checks 
in the compute_api, but we may have to make sure it is ignored for quantum.
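
For (b), one possible shape, assuming we only trust what compute already has 
cached (helper and field names here are illustrative; the real network_info 
cache format is richer):

def check_not_already_associated(network_info, address):
    # walk the cached vifs for this instance and refuse an obviously
    # duplicate association without a round trip to the network service
    for vif in network_info:
        for ip in vif.get('ips', []):
            if address in ip.get('floating_ips', []):
                raise ValueError('%s is already associated with this '
                                 'instance' % address)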