Re: [openstack-dev] [neutron-dev] [neutron] Generalized issues in the unit testing of ML2 mechanism drivers

2017-12-18 Thread Mike Kolesnik
On Wed, Dec 13, 2017 at 2:30 PM, Michel Peterson  wrote:

> Through my work in networking-odl I've found what I believe is an issue
> present in a majority of ML2 drivers. An issue I think needs awareness so
> each project can decide a course of action.
>
> The issue stems from the adopted practice of importing
> `neutron.tests.unit.plugins.ml2.test_plugin` and creating classes with
> no-op bodies to "inherit" tests for free [1]. The idea behind it is nice:
> you inherit >600 tests that cover several scenarios.
>
> There are several issues with adopting this pattern, two of which are
> paramount:
>
> 1. If the mechanism driver is not loaded correctly [2], the tests don't
> exercise the mechanism driver but still succeed, so there is no
> indication that something is wrong with the code. In the case of
> networking-odl this wasn't discovered until last week, which means that
> for more than a year these tests were adding PASSed results uselessly.
>
> 2. It gives a false sense of reassurance. If you analyze the code of those
> tests, you can see that it is mostly centered on testing the REST
> endpoint of neutron rather than on testing that the mechanism driver
> succeeds at the operation it was supposed to perform. As a result, there
> is only marginal added value in having those tests. To be clear, the hooks
> for the respective operations are called on the mechanism driver, but the
> result of the operation is not asserted.
>
> I would love to hear more voices around this, so feel free to comment.
>

I talked to a few people from networking-ovn, who are now processing this
info so they can chime in, but from what I understood the issue wasn't
given much thought in networking-ovn (and, I suspect, in other mechanism
drivers).
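
For anyone who wants to check their own driver, here is a minimal sketch of
the failure mode in plain Python. Everything in it (FakeMechanismManager,
MyDriver, the alias names, create_port) is hypothetical and only stands in
for the real ML2 classes. It illustrates why an inherited API-level test can
pass without the driver ever being loaded, and what kind of assertions would
catch that.

```python
import unittest
from unittest import mock


class MyDriver:
    """Hypothetical mechanism driver; a stand-in, not a real ML2 class."""

    def create_port_postcommit(self, context):
        pass


class FakeMechanismManager:
    """Hypothetical manager that silently skips unknown driver aliases."""

    available = {"mydriver": MyDriver}

    def __init__(self, configured):
        self.drivers = [self.available[name]()
                        for name in configured if name in self.available]

    def create_port_postcommit(self, context):
        for driver in self.drivers:
            driver.create_port_postcommit(context)


def create_port(manager):
    """Stand-in for the API call that the inherited tests exercise."""
    manager.create_port_postcommit(context={"id": "port-1"})
    return 201


class InheritedStyleTest(unittest.TestCase):
    def test_passes_even_when_driver_is_missing(self):
        # Typo in the alias: nothing is loaded, yet the API-level check passes.
        manager = FakeMechanismManager(["mydriver_typo"])
        self.assertEqual(201, create_port(manager))

    def test_driver_is_loaded_and_called(self):
        manager = FakeMechanismManager(["mydriver"])
        # Guard 1: fail loudly if the driver was not loaded at all.
        self.assertEqual(1, len(manager.drivers))
        # Guard 2: assert that the driver hook was really invoked.
        with mock.patch.object(MyDriver, "create_port_postcommit") as hook:
            create_port(manager)
        hook.assert_called_once()
```

The first test is the kind that "passes uselessly"; the second one would fail
as soon as the driver stops being loaded.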
​

>
> Regarding networking-odl the solution I propose is the following:
>   **First**, discard completely the change mentioned in the footnote #2.
>   **Second**, create a patch that completely removes the tests that follow
> this pattern.
>   **Third**, incorporate the neutron tempest plugin into the CI and rely
> on that for assuring coverage of the different scenarios.
>

​This sounds like a good plan to me.
​

>
> I should also mention that when we discovered this issue in networking-odl
> we decided not to merge more patches until the patch set in footnote #2
> was addressed. I think we can now overrule that decision and proceed
> as usual.
>

​Agreed.
​

>
>
>
> [1]: http://codesearch.openstack.org/?q=class%20.*\(.*TestMl2
> 
> [2]: something that was happening in networking-odl and addressed by
> https://review.openstack.org/#/c/523934
>
> ___
> neutron-dev mailing list
> neutron-...@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/neutron-dev
>
>


-- 
Regards,
Mike
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][qos][ml2] extensions swallow exceptions

2015-08-04 Thread Mike Kolesnik
Don't know why subject wasn't set automatically..

On Tue, Aug 4, 2015 at 3:30 PM, Mike Kolesnik  wrote:

>
>
> On Tue, Aug 4, 2015 at 1:02 PM, Ihar Hrachyshka 
> wrote:
>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Hi all,
>>
>> in feature/qos, we use ml2 extension drivers to handle the additional
>> qos_policy_id field that can be provided through the API:
>>
>> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/extensions/qos.py?h=feature/qos
>>
>> What we do in the qos extension is create a database 'binding' object
>> between the updated port and the QoS policy that corresponds to
>> qos_policy_id, so we access the database. This means there may be some
>> complications there, e.g. the policy object is not available to the
>> tenant, or just does not exist. In that case, we raise an exception
>> from the extension, assuming that ml2 will propagate it to the user in
>> some form.
>>
>
> First of all, maybe we should be asking this on the upstream (u/s) mailing
> list to get a broader view?
>

Don't mind this, I must be drunk..​


​
>
>
>>
>> But it does not work. This is because _call_on_ext_drivers swallows
>> exceptions:
>>
>> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/managers.py#n766
>>
>> It makes me ask some questions:
>>
>> - first, do we use extensions as expected? Can we extend
>> extensions to cover our use case?
>>
>
> I think we are; they mostly fit the case, but like everything in Neutron
> they are immature.
> However, in my experience this was the most mature option available to us.
> ​
>
>
>>
>> - second, what would be the right way to go, assuming we want to
>> support the case? Should we just re-raise? Or maybe postpone until all
>> extension drivers are called, and then propagate an exception up
>> the stack? (Probably some extension-manager-specific exception?) Or
>> maybe we want extensions to declare whether they may raise, and handle
>> them accordingly?
>>
>
> I was thinking that, in order not to alter existing extension behaviours,
> we can define in the ML2 extension driver scope a special exception type
> (a sort of exception container), and if an exception of this type is
> raised then we should re-raise it.
> I'm not sure there's much value in aggregating the exceptions right off
> the bat; this can be done later on.
>
>
>
>>
>> - alternatively, if we abuse the API and should stop doing it, which
>> other options do we have to achieve similar behaviour without relying
>> on ml2 extensions AND without polluting the ml2 driver with qos-specific
>> code?
>>
>> Thanks for your answers,
>> Ihar
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v2
>>
>> iQEcBAEBCAAGBQJVwI29AAoJEC5aWaUY1u57yLYH/jhYmu4aR+ewZwSzDYXMcfdz
>> tD5BSYKD/YmDMIAYprmVCqOlk1jaioesFPMUOrsycpacZZWjg5tDSrpJ2Iz5/ZPw
>> BYLIPGaYF3Pu87LHrUKhIz4f2TfSWve/7GBCZ6AK6zVqCXky8A9MRfWrf774a8oF
>> kexP7qQVbyrOcXxZANDa1bJuLDsb4TiTcuuDizPtuUWlMfzmtZeauyieji/g1smq
>> HBO5h7zUFQ87YvBqq7ed2KhlRENxo26aSrpxTFkyyxJU9xH1J8q9W1gWO7Tw1uCV
>> psaijDmlxU/KySR97Ro8m5teu+7Pcb2cg/s57WaHWuAvPNW1CmfYc/XDn2I9KlI=
>> =Fo++
>> -END PGP SIGNATURE-
>>
>>
>
>
>
> --
> Regards,
> Mike
>



-- 
Regards,
Mike


[openstack-dev] (no subject)

2015-08-04 Thread Mike Kolesnik
On Tue, Aug 4, 2015 at 1:02 PM, Ihar Hrachyshka  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Hi all,
>
> in feature/qos, we use ml2 extension drivers to handle the additional
> qos_policy_id field that can be provided through the API:
>
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/extensions/qos.py?h=feature/qos
>
> What we do in the qos extension is create a database 'binding' object
> between the updated port and the QoS policy that corresponds to
> qos_policy_id, so we access the database. This means there may be some
> complications there, e.g. the policy object is not available to the
> tenant, or just does not exist. In that case, we raise an exception
> from the extension, assuming that ml2 will propagate it to the user in
> some form.
>

First of all, maybe we should be asking this on the upstream (u/s) mailing
list to get a broader view?
​


>
> But it does not work. This is because _call_on_ext_drivers swallows
> exceptions:
>
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/managers.py#n766
>
> It makes me ask some questions:
>
> - first, do we use extensions as expected? Can we extend
> extensions to cover our use case?
>

I think we are; they mostly fit the case, but like everything in Neutron
they are immature.
However, in my experience this was the most mature option available to us.
​


>
> - second, what would be the right way to go, assuming we want to
> support the case? Should we just re-raise? Or maybe postpone until all
> extension drivers are called, and then propagate an exception up
> the stack? (Probably some extension-manager-specific exception?) Or
> maybe we want extensions to declare whether they may raise, and handle
> them accordingly?
>

I was thinking that, in order not to alter existing extension behaviours, we
can define in the ML2 extension driver scope a special exception type (a sort
of exception container), and if an exception of this type is raised then
we should re-raise it.
I'm not sure there's much value to aggregating the exceptions right off the
bat and this can be done later on.
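
To make the idea concrete, here is a rough sketch in plain Python. It is not
the actual managers.py code: ExtensionDriverError, ExtensionManager,
call_on_ext_drivers and the two toy drivers are hypothetical names used only
for illustration. The point is that only the opt-in exception type is
re-raised, while every other exception keeps being logged and swallowed, so
existing extensions behave as before.

```python
import logging

LOG = logging.getLogger(__name__)


class ExtensionDriverError(Exception):
    """Hypothetical container exception that the manager must not swallow."""


class ExtensionManager:
    """Simplified stand-in for the ML2 extension manager."""

    def __init__(self, drivers):
        self.drivers = drivers

    def call_on_ext_drivers(self, method_name, *args):
        for driver in self.drivers:
            try:
                getattr(driver, method_name)(*args)
            except ExtensionDriverError:
                # Opt-in type: propagate so the API layer can report it.
                raise
            except Exception:
                # Existing behaviour: log and carry on with the next driver.
                LOG.exception("Extension driver %r failed in %s",
                              driver, method_name)


class LegacyDriver:
    """Toy driver whose failures should still be swallowed."""

    def process_update_port(self, data, result):
        raise RuntimeError("legacy failure, still swallowed")


class QosDriver:
    """Toy driver that opts in to propagating its errors."""

    def process_update_port(self, data, result):
        if data.get("qos_policy_id") == "missing":
            raise ExtensionDriverError("QoS policy not found")
        result["qos_policy_id"] = data.get("qos_policy_id")
```

Aggregating several such exceptions could be layered on top later by
collecting them in the loop instead of re-raising immediately.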



>
> - alternatively, if we abuse the API and should stop doing it, which
> other options do we have to achieve similar behaviour without relying
> on ml2 extensions AND without polluting the ml2 driver with qos-specific
> code?
>
> Thanks for your answers,
> Ihar
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2
>
> iQEcBAEBCAAGBQJVwI29AAoJEC5aWaUY1u57yLYH/jhYmu4aR+ewZwSzDYXMcfdz
> tD5BSYKD/YmDMIAYprmVCqOlk1jaioesFPMUOrsycpacZZWjg5tDSrpJ2Iz5/ZPw
> BYLIPGaYF3Pu87LHrUKhIz4f2TfSWve/7GBCZ6AK6zVqCXky8A9MRfWrf774a8oF
> kexP7qQVbyrOcXxZANDa1bJuLDsb4TiTcuuDizPtuUWlMfzmtZeauyieji/g1smq
> HBO5h7zUFQ87YvBqq7ed2KhlRENxo26aSrpxTFkyyxJU9xH1J8q9W1gWO7Tw1uCV
> psaijDmlxU/KySR97Ro8m5teu+7Pcb2cg/s57WaHWuAvPNW1CmfYc/XDn2I9KlI=
> =Fo++
> -END PGP SIGNATURE-
>
>



-- 
Regards,
Mike


Re: [openstack-dev] [neutron] Should we document the using of "device:owner" of the PORT ?

2015-07-15 Thread Mike Kolesnik
- Original Message -

> Yes please.

> This would be a good starting point.
> I also think that the ability to edit it, as well as the values it can be
> set to, should be constrained.

FYI the oVirt project uses this field to identify ports it creates and manages. 
So if you're going to constrain it to something, it should probably be 
configurable so that managers other than Nova can continue to use Neutron. 
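
As a rough sketch of what "configurable" could mean here: validate the value
against a list of allowed prefixes that the deployer can extend. The function
name, the default prefix list and the "ovirt:" prefix below are assumptions
made up for illustration; they are not existing Neutron options, and I am not
claiming this is the prefix oVirt actually uses.

```python
# Hypothetical defaults, derived from the prefixes of the constants listed
# in this thread ("network:", "compute:", "neutron:").
DEFAULT_ALLOWED_PREFIXES = ("network:", "compute:", "neutron:")


def is_allowed_device_owner(value, allowed_prefixes=DEFAULT_ALLOWED_PREFIXES):
    """Return True if value is empty or starts with an allowed prefix."""
    if not value:
        # An unset device_owner is always acceptable.
        return True
    return value.startswith(tuple(allowed_prefixes))
```

A deployment with another manager would then pass something like
DEFAULT_ALLOWED_PREFIXES + ("ovirt:",) as allowed_prefixes, ideally read from
configuration.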

> As you have surely noticed, there are several code paths that rely on an
> appropriate value being set in this attribute.
> This means a user can potentially trigger a malfunction by sending PUT
> requests to edit this attribute.

> In summary, I think that documenting its usage is a good starting point, but I
> believe we should address the way this attribute is exposed at the API layer
> as well.

> Salvatore

> On 13 July 2015 at 11:52, Wang, Yalei < yalei.w...@intel.com > wrote:

> > Hi all,
> 
> > The device:owner attribute of a port is defined as a 255-byte string and is
> > now widely used to indicate what the port is used for.
> 
> > It seems we can fill it freely, and a user can also update/set it from the
> > command line (port-update $PORT_ID --device_owner), and I can't find any
> > guideline for its use.
> 
> > What is its function? It indicates the use of the port, and it seems
> > Horizon also uses it to show the topology.
> 
> > Nova really needs it to be editable, so should we at least document all of
> > the possible values in some guide to make it clear? If yes, I can do it.
> 
> > I got these uses from the code (maybe not complete, please point out
> > anything missing):
> 
> > From constants.py,
> 
> > DEVICE_OWNER_ROUTER_HA_INTF = "network:router_ha_interface"
> > DEVICE_OWNER_ROUTER_INTF = "network:router_interface"
> > DEVICE_OWNER_ROUTER_GW = "network:router_gateway"
> > DEVICE_OWNER_FLOATINGIP = "network:floatingip"
> > DEVICE_OWNER_DHCP = "network:dhcp"
> > DEVICE_OWNER_DVR_INTERFACE = "network:router_interface_distributed"
> > DEVICE_OWNER_AGENT_GW = "network:floatingip_agent_gateway"
> > DEVICE_OWNER_ROUTER_SNAT = "network:router_centralized_snat"
> > DEVICE_OWNER_LOADBALANCER = "neutron:LOADBALANCER"
> 
> > And from debug_agent.py
> 
> > DEVICE_OWNER_NETWORK_PROBE = 'network:probe'
> > DEVICE_OWNER_COMPUTE_PROBE = 'compute:probe'
> 
> > And setting from nova/network/neutronv2/api.py,
> 
> > 'compute:%s' % instance.availability_zone
> 
> > Thanks all!
> 
> > /Yalei
> 



[openstack-dev] [neutron] Adding results to extension callbacks

2015-07-13 Thread Mike Kolesnik
Hi, 

I sent a simple patch to explore the possibility of adding results to callbacks:
https://review.openstack.org/#/c/201127/ 

This will allow us to decouple the callback logic from the ML2 plugin in the 
QoS scenario where we need to update the agents in case the profile_id on a 
port/network changes. 
It will also allow for a cleaner way to extend resource attributes as 
AFTER_READ callbacks can return a dict of fields to add to the original 
resource instead of mutating it directly. 
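
To illustrate the AFTER_READ case, here is a toy sketch in plain Python. It
is not the code from the patch and not the real neutron callbacks API:
CallbackRegistry, add_qos_profile, read_port and the "qos_profile_id" value
are all hypothetical. It only shows the shape of the idea, where notify()
collects the return values and the caller merges them into a copy of the
resource.

```python
from collections import defaultdict

AFTER_READ = "after_read"


class CallbackRegistry:
    """Toy registry; the real neutron callbacks API differs."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def subscribe(self, callback, resource, event):
        self._callbacks[(resource, event)].append(callback)

    def notify(self, resource, event, **kwargs):
        """Call every subscriber and collect the non-None return values."""
        results = []
        for callback in self._callbacks[(resource, event)]:
            result = callback(resource, event, **kwargs)
            if result is not None:
                results.append(result)
        return results


def add_qos_profile(resource, event, port=None, **kwargs):
    # Hypothetical lookup: return extra fields instead of mutating `port`.
    return {"qos_profile_id": "profile-for-" + port["id"]}


def read_port(registry, port):
    """Merge the fields returned by AFTER_READ callbacks into a copy."""
    extended = dict(port)
    for extra in registry.notify("port", AFTER_READ, port=port):
        extended.update(extra)
    return extended
```

The original port dict stays untouched, which is what makes this cleaner
than letting each callback mutate it.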

Please let me know what you think of this idea. 

Regards, 
Mike 



Re: [openstack-dev] Why need br-int and br-tun in openstack neutron

2015-05-24 Thread Mike Kolesnik
- Original Message -
> Comments in-line.
> 
> - Original Message -
> > On 23 May 2015 at 04:43, Assaf Muller < amul...@redhat.com > wrote:
> > 
> > 
> > 
> > There's no real reason as far as I'm aware, just an implementation
> > decision.
> > 
> > This is inaccurate. There is a reason(s), and this has been asked before:
> > 
> > http://lists.openstack.org/pipermail/openstack/2014-March/005950.html
> 
> This link is to a thread asking why we connect a Linux bridge between a
> tap device and br-int (for security groups).
> 
> > http://lists.openstack.org/pipermail/openstack/2014-April/006865.html
> 
> This link is to this thread itself.

No, it's from another author, but with (almost exactly) the same text,
i.e. https://www.diffchecker.com/xl98zm9a

Either it's the same poster, some freak coincidence, or just a copy-paste.

Also Vivek gave the correct answer on that thread:
http://lists.openstack.org/pipermail/openstack/2014-April/006868.html

In a nutshell, decoupling the overlay layer from the VM connectivity.
VMs are always connected to the br-int the same way, but the overlay
(vxlan/gre or vlans) is connected differently.

> 
> > 
> > In a nutshell, the design decision that led to the existing architecture is
> > due to the way OVS handles packets and interacts with netfilter.
> 
> I think you're talking about the bridge between a tap device and br-int and
> not about br-tun.
> 
> > 
> > The fact that we keep asking the same question clearly shows a lack of
> > documentation, both developer and user facing.
> > 
> > I'll get this fixed once and for all.
> 
> Thank you.
> 
> > 
> > Thanks,
> > Armando
> > 
> > 
> > On 21 May 2015, at 01:48, Na Zhu < na...@cn.ibm.com > wrote:
> > 
> > Dear,
> > 
> > 
> > When OVS plugin is used with GRE option in Neutron, I see that each compute
> > node has br-tun and br-int bridges created.
> > 
> > I'm trying to understand why we need the additional br-tun bridge here.
> > Can't we create tunneling ports in br-int bridge, and have br-int relay
> > traffic between VM ports and tunneling ports directly? Why do we have to
> > introduce another br-tun bridge?
> > 
> > 
> > Regards,
> > Juno Zhu
> > Staff Software Engineer, System Networking
> > China Systems and Technology Lab (CSTL), IBM Wuxi
> > Email: na...@cn.ibm.com
> > 
> > 
> > 


Re: [openstack-dev] Request for comments for a possible solution

2015-01-14 Thread Mike Kolesnik
Hi Mathieu, 

Please see comments inline. 

Regards, 
Mike 

- Original Message -

> Hi Mike,

> after reviewing your latest patch [1], I think that a possible solution could
> be to add a new entry in fdb RPC message.
> This entry would specify whether the port is multi-bound or not.
> The new fdb message would look like this :
> {net_id:
>     {port:
>         {agent_ip:
>             {mac, ip, multi-bound}
>         }
>     }
>     network_type:
>         vxlan,
>     segment_id:
>         id
> }

> When the multi-bound option is set, the ARP responder would be
> provisioned, but the underlying module (ovs or kernel vxlan) would be
> provisioned to flood the packet to every tunnel concerned by this overlay
> segment, and not only to the tunnel to the agent that is supposed to host
> the port.
> In the LB world, this means not adding an fdb entry for the MAC of the
> multi-bound port, whereas in the OVS world, it means not adding a flow that
> sends the traffic that matches the MAC of the multi-bound port to only one
> tunnel port, but to every tunnel port of this overlay segment.

So let me see if I understand what you suggest correctly.. 

You suggest that instead of not sending the FDB we do send it along with an 
optional third parameter? 

Mind you that FDBs are sent as a list so for example an l2pop message would 
look like: 
{
    '61a00edd-018e-4923-9524-df91b3f3083b': {
        'ports': {
            '30.0.0.2': [
                ['00:00:00:00:00:00', '0.0.0.0'],
                ['00:00:00:00:12:34', '10.0.0.1']
            ],
            '30.0.0.1': [
                ['00:00:00:00:00:00', '0.0.0.0'],
                ['00:00:00:00:56:78', '10.1.1.1']
            ]
        },
        'network_type': u'vxlan',
        'segment_id': 1
    }
}

So the parameter you suggest to add will be at index 2 of each FDB list? 

I'm not sure it will be optional then, otherwise it could be quite hard to 
decode these messages.. 

Also, you suggest that each agent will know what to do according to this 
parameter? 
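
To show what decoding would involve if the third element really were
optional, here is a small sketch in plain Python. PortInfo, decode_fdb_entry
and decode_ports are hypothetical helpers, not existing l2pop code, and the
multi_bound flag is the parameter proposed above, not something in the
current message format.

```python
from collections import namedtuple

# Hypothetical decoded form of one FDB entry.
PortInfo = namedtuple("PortInfo", ["mac", "ip", "multi_bound"])


def decode_fdb_entry(entry):
    """Accept [mac, ip] or [mac, ip, multi_bound]; default the flag to False."""
    if len(entry) == 2:
        mac, ip = entry
        return PortInfo(mac, ip, False)
    if len(entry) == 3:
        mac, ip, multi_bound = entry
        return PortInfo(mac, ip, bool(multi_bound))
    raise ValueError("unexpected FDB entry: %r" % (entry,))


def decode_ports(ports):
    """Map each agent IP to its list of decoded entries."""
    return {agent_ip: [decode_fdb_entry(e) for e in entries]
            for agent_ip, entries in ports.items()}
```

Every receiver would need a length check like this one, which is why I'm not
sure "optional" makes the format simpler.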

> This way, traffic to a multi-bound port will behave as unknown unicast traffic.
> The first packet will be flooded to every tunnel, and the local bridge will
> learn the correct tunnel for the following packets based on which tunnel
> received the answer.
> Once learning occurs with the first ingress packet, the following packets
> would be sent to the correct tunnel and not flooded anymore.

IIUC we would then still need to send all the nodes on which the HA port is
scheduled; this just adds on top of that and moves the decision regarding the
FDB to the agent level.

The FDB is then only needed for populating the ARP responder? 

> I've tested this with linuxbridge and it works fine. Based on code overview,
> this should work correctly with OVS too. I'll test it ASAP.

> I know that the DVR team already added such a flag in RPC messages, but they
> reverted it in later patches. I would be very interested in having their
> opinion on this proposal.
> It seems that DVR ports could also use this flag. This would result in having
> the ARP responder activated for DVR ports too.

> This shouldn't need a bump in RPC versioning since this flag would be
> optional, so there shouldn't be any issue with backward compatibility.

I'm not sure it's backwards compatible, since you're actually changing the
structure of the RPC message, so it's hard to predict how the old agents will
react.
It's not adding a new key-value, it's modifying each fdb's list.. 
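
As an illustration of the concern, assume an old receiver unpacks each entry
as exactly two values. That is an assumption for the sake of the example; I
have not checked how every agent actually consumes these lists, and
old_agent_handle is a made-up stand-in.

```python
def old_agent_handle(entries):
    """Stand-in for a receiver written against the two-element format."""
    # Strict unpacking: raises ValueError if an entry has a third element.
    return [(mac, ip) for mac, ip in entries]
```

With such code an entry like ['00:00:00:00:12:34', '10.0.0.1', True] raises
ValueError instead of being ignored, whereas an extra key in a dict would
typically just go unread.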

> Regards,

> Mathieu

> [1] https://review.openstack.org/#/c/141114/2

> On Sun, Dec 21, 2014 at 12:14 PM, Narasimhan, Vivekanandan <
> vivekanandan.narasim...@hp.com > wrote:

> > Hi Mike,
> 

> > Just one comment [Vivek]
> 

> > -Original Message-
> 
> > From: Mike Kolesnik [mailto: mkole...@redhat.com ]
> 
> > Sent: Sunday, December 21, 2014 11:17 AM
> 
> > To: OpenStack Development Mailing List (not for usage questions)
> 
> > Cc: Robert Kukura
> 
> > Subject: Re: [openstack-dev] [Neutron][L2Pop][HA Routers] Request for
> > comments for a possible solution
> 

> > Hi Mathieu,
> 

> > Comments inline
> 

> > Regards,
> 
> > Mike
> 

> > - Original Message -
> 
> > > Mike,
> > >
> > > I'm not even sure that your solution works without being able to bind
> > > a router HA port to several hosts.
> > > What's happening currently is that you:
> > >
> > > 1. create the router on two l3agents.
> > > 2. those l3agents trigger sync_routers() on the l3plugin.
> > > 3. l3plugin.sync_routers() will trigger
> > > l2plugin.update_port(host=l3agent)

Re: [openstack-dev] Request for comments for a possible solution

2014-12-20 Thread Mike Kolesnik
Hi Vivek,

Replies inline.

Regards,
Mike

- Original Message -
> Hi Mike,
> 
> Few clarifications inline [Vivek]
> 
> -Original Message-----
> From: Mike Kolesnik [mailto:mkole...@redhat.com]
> Sent: Thursday, December 18, 2014 10:58 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Neutron][L2Pop][HA Routers] Request for
> comments for a possible solution
> 
> Hi Mathieu,
> 
> Thanks for the quick reply, some comments inline..
> 
> Regards,
> Mike
> 
> - Original Message -
> > Hi mike,
> >
> > thanks for working on this bug :
> >
> > On Thu, Dec 18, 2014 at 1:47 PM, Gary Kotton  wrote:
> > >
> > >
> > > On 12/18/14, 2:06 PM, "Mike Kolesnik"  wrote:
> > >
> > >>Hi Neutron community members.
> > >>
> > >>I wanted to query the community about a proposal of how to fix HA
> > >>routers not working with L2Population (bug 1365476[1]).
> > >>This bug is important to fix especially if we want to have HA
> > >>routers and DVR routers working together.
> > >>
> > >>[1] https://bugs.launchpad.net/neutron/+bug/1365476
> > >>
> > >>What's happening now?
> > >>* HA routers use distributed ports, i.e. the port with the same IP &
> > >>  MAC details is applied on all nodes where an L3 agent is hosting
> > >>  this router.
> > >>* Currently, the port details have a binding pointing to an
> > >>  arbitrary node and this is not updated.
> > >>* L2pop takes this "potentially stale" information and uses it to create:
> > >>  1. A tunnel to the node.
> > >>  2. An FDB entry that directs traffic for that port to that node.
> > >>  3. If ARP responder is on, ARP requests will not traverse the network.
> > >>* Problem is, the master router wouldn't necessarily be running on
> > >>  the reported agent.
> > >>  This means that traffic would not reach the master node but some
> > >>  arbitrary node where the router master might be running, but might
> > >>  be in another state (standby, fail).
> > >>
> > >>What is proposed?
> > >>Basically the idea is not to do L2Pop for HA router ports that
> > >>reside on the tenant network.
> > >>Instead, we would create a tunnel to each node hosting the HA router
> > >>so that the normal learning switch functionality would take care of
> > >>switching the traffic to the master router.
> > >
> > > In Neutron we just ensure that the MAC address is unique per network.
> > > Could a duplicate MAC address cause problems here?
> >
> > gary, AFAIU, from a Neutron POV, there is only one port, which is the
> > router Port, which is plugged twice. One time per port.
> > I think that the capacity to bind a port to several hosts is also a
> > prerequisite for a clean solution here. This will be provided by
> > patches to this bug :
> > https://bugs.launchpad.net/neutron/+bug/1367391
> >
> >
> > >>This way no matter where the master router is currently running, the
> > >>data plane would know how to forward traffic to it.
> > >>This solution requires changes on the controller only.
> > >>
> > >>What's to gain?
> > >>* Data plane only solution, independent of the control plane.
> > >>* Lowest failover time (same as HA routers today).
> > >>* High backport potential:
> > >>  * No APIs changed/added.
> > >>  * No configuration changes.
> > >>  * No DB changes.
> > >>  * Changes localized to a single file and limited in scope.
> > >>
> > >>What's the alternative?
> > >>An alternative solution would be to have the controller update the
> > >>port binding on the single port so that the plain old L2Pop happens
> > >>and notifies about the location of the master router.
> > >>This basically negates all the benefits of the proposed solution,
> > >>but is wider.
> > >>This solution depends on the report-ha-router-master spec which is
> > >>currently in the implementation phase.
> > >>
> > >>It's important to note that these two solutions don't collide and
> > >>could be done independently. The one I'm proposing just makes more
> > >>sense f

Re: [openstack-dev] [Neutron][L2Pop][HA Routers] Request for comments for a possible solution

2014-12-20 Thread Mike Kolesnik
Hi Mathieu,

Comments inline

Regards,
Mike

- Original Message -
> Mike,
> 
> I'm not even sure that your solution works without being able to bind
> a router HA port to several hosts.
> What's happening currently is that you:
> 
> 1. create the router on two l3agents.
> 2. those l3agents trigger sync_routers() on the l3plugin.
> 3. l3plugin.sync_routers() will trigger l2plugin.update_port(host=l3agent).
> 4. ML2 will bind the port to the host mentioned in the last update_port().
> 
> From an l2pop perspective, this will result in creating only one tunnel,
> to the host specified last.
> I can't find any code that forces only the master router to bind
> its router port. So we don't even know whether the host which binds the
> router port is hosting the master router or the slave one, and so whether
> l2pop is creating the tunnel to the master or to the slave.
> 
> Can you confirm that the above sequence is correct? or am I missing
> something?

Are you referring to the alternative solution?

In that case it seems that you're correct, so there would need to be
awareness of the master router at some level there as well.
I can't say for sure, as I've been thinking about the proposed solution with
no FDBs, so there may be some issues with the alternative that need to
be ironed out.

> 
> Without the capacity to bind a port to several hosts, l2pop won't be
> able to create tunnels correctly; that's why I was saying
> that a prerequisite for a smart solution would be to first fix this bug:
> https://bugs.launchpad.net/neutron/+bug/1367391
> 
> DVR had the same issue. Their workaround was to create a new
> port_binding table that manages the capacity for one DVR port to be
> bound to several hosts.
> As mentioned in bug 1367391, this adds technical debt to ML2,
> which has to be tackled as a priority from my POV.

I agree that this would simplify work but even without this bug fixed we
can achieve either solution.

We have already knowledge of the agents hosting a router so this is
completely doable without waiting for fix for bug 1367391.
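
Roughly, the proposed behaviour could be sketched like this in plain Python.
The function names and the port fields ("is_ha_router_port", "device_id",
"binding_host") are hypothetical and only illustrate the decision; the real
change would live in the l2pop code and use the existing knowledge of which
agents host a router.

```python
def tunnel_hosts_for_port(port, hosts_by_router):
    """Return the hosts that tunnels should be created to for this port.

    For an HA router port, use every host running an L3 agent for the
    router, so that normal MAC learning can find the master. For any other
    port, fall back to the single host from the port binding.
    """
    if port.get("is_ha_router_port"):
        return sorted(hosts_by_router.get(port["device_id"], []))
    return [port["binding_host"]]


def needs_fdb_entry(port):
    """In the proposal, HA router ports get tunnels but no FDB entry."""
    return not port.get("is_ha_router_port")
```

Since hosts_by_router is built from what the controller already knows about
the agents hosting a router, nothing here depends on multi-host port binding.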

Also from my understanding the bug 1367391 is targeted at DVR only, not
at HA router ports.

> 
> 
> On Thu, Dec 18, 2014 at 6:28 PM, Mike Kolesnik  wrote:
> > Hi Mathieu,
> >
> > Thanks for the quick reply, some comments inline..
> >
> > Regards,
> > Mike
> >
> > - Original Message -
> >> Hi mike,
> >>
> >> thanks for working on this bug :
> >>
> >> On Thu, Dec 18, 2014 at 1:47 PM, Gary Kotton  wrote:
> >> >
> >> >
> >> > On 12/18/14, 2:06 PM, "Mike Kolesnik"  wrote:
> >> >
> >> >>Hi Neutron community members.
> >> >>
> >> >>I wanted to query the community about a proposal of how to fix HA
> >> >>routers not working with L2Population (bug 1365476[1]).
> >> >>This bug is important to fix especially if we want to have HA routers
> >> >>and DVR routers working together.
> >> >>
> >> >>[1] https://bugs.launchpad.net/neutron/+bug/1365476
> >> >>
> >> >>What's happening now?
> >> >>* HA routers use distributed ports, i.e. the port with the same IP & MAC
> >> >>  details is applied on all nodes where an L3 agent is hosting this
> >> >>  router.
> >> >>* Currently, the port details have a binding pointing to an arbitrary
> >> >>  node and this is not updated.
> >> >>* L2pop takes this "potentially stale" information and uses it to
> >> >>  create:
> >> >>  1. A tunnel to the node.
> >> >>  2. An FDB entry that directs traffic for that port to that node.
> >> >>  3. If ARP responder is on, ARP requests will not traverse the network.
> >> >>* Problem is, the master router wouldn't necessarily be running on the
> >> >>  reported agent.
> >> >>  This means that traffic would not reach the master node but some
> >> >>  arbitrary node where the router master might be running, but might be
> >> >>  in another state (standby, fail).
> >> >>
> >> >>What is proposed?
> >> >>Basically the idea is not to do L2Pop for HA router ports that reside on
> >> >>the
> >> >>tenant network.
> >> >>Instead, we would create a tunnel to each node hosting the HA router so

Re: [openstack-dev] [Neutron][L2Pop][HA Routers] Request for comments for a possible solution

2014-12-18 Thread Mike Kolesnik
Hi Mathieu,

Thanks for the quick reply, some comments inline..

Regards,
Mike

- Original Message -
> Hi mike,
> 
> thanks for working on this bug :
> 
> On Thu, Dec 18, 2014 at 1:47 PM, Gary Kotton  wrote:
> >
> >
> > On 12/18/14, 2:06 PM, "Mike Kolesnik"  wrote:
> >
> >>Hi Neutron community members.
> >>
> >>I wanted to query the community about a proposal of how to fix HA routers
> >>not working with L2Population (bug 1365476[1]).
> >>This bug is important to fix especially if we want to have HA routers and
> >>DVR routers working together.
> >>
> >>[1] https://bugs.launchpad.net/neutron/+bug/1365476
> >>
> >>What's happening now?
> >>* HA routers use distributed ports, i.e. the port with the same IP & MAC
> >>  details is applied on all nodes where an L3 agent is hosting this
> >>  router.
> >>* Currently, the port details have a binding pointing to an arbitrary node
> >>  and this is not updated.
> >>* L2pop takes this "potentially stale" information and uses it to create:
> >>  1. A tunnel to the node.
> >>  2. An FDB entry that directs traffic for that port to that node.
> >>  3. If ARP responder is on, ARP requests will not traverse the network.
> >>* Problem is, the master router wouldn't necessarily be running on the
> >>  reported agent.
> >>  This means that traffic would not reach the master node but some
> >>  arbitrary node where the router master might be running, but might be
> >>  in another state (standby, fail).
> >>
> >>What is proposed?
> >>Basically the idea is not to do L2Pop for HA router ports that reside on
> >>the
> >>tenant network.
> >>Instead, we would create a tunnel to each node hosting the HA router so
> >>that
> >>the normal learning switch functionality would take care of switching the
> >>traffic to the master router.
> >
> > In Neutron we just ensure that the MAC address is unique per network.
> > Could a duplicate MAC address cause problems here?
> 
> gary, AFAIU, from a Neutron POV, there is only one port, which is the
> router Port, which is plugged twice. One time per port.
> I think that the capacity to bind a port to several host is also a
> prerequisite for a clean solution here. This will be provided by
> patches to this bug :
> https://bugs.launchpad.net/neutron/+bug/1367391
> 
> 
> >>This way no matter where the master router is currently running, the data
> >>plane would know how to forward traffic to it.
> >>This solution requires changes on the controller only.
> >>
> >>What's to gain?
> >>* Data plane only solution, independent of the control plane.
> >>* Lowest failover time (same as HA routers today).
> >>* High backport potential:
> >>  * No APIs changed/added.
> >>  * No configuration changes.
> >>  * No DB changes.
> >>  * Changes localized to a single file and limited in scope.
> >>
> >>What's the alternative?
> >>An alternative solution would be to have the controller update the port
> >>binding
> >>on the single port so that the plain old L2Pop happens and notifies about
> >>the
> >>location of the master router.
> >>This basically negates all the benefits of the proposed solution, but is
> >>wider.
> >>This solution depends on the report-ha-router-master spec which is
> >>currently in
> >>the implementation phase.
> >>
> >>It's important to note that these two solutions don't collide and could
> >>be done
> >>independently. The one I'm proposing just makes more sense from an HA
> >>viewpoint
> >>because of its benefits which fit the HA methodology of being fast &
> >>having as
> >>little outside dependency as possible.
> >>It could be done as an initial solution which solves the bug for mechanism
> >>drivers that support normal learning switch (OVS), and later kept as an
> >>optimization to the more general, controller based, solution which will
> >>solve
> >>the issue for any mechanism driver working with L2Pop (Linux Bridge,
> >>possibly
> >>others).
> >>
> >>Would love to hear your thoughts on the subject.
> 
> You will have to clearly update the doc to mention that deployment
> with Linuxbridge+l2pop are not compatible with HA.

Yes this should be added and this is already the situati

[openstack-dev] [Neutron][L2Pop][HA Routers] Request for comments for a possible solution

2014-12-18 Thread Mike Kolesnik
Hi Neutron community members.

I would like to ask the community about a proposal for fixing HA routers not
working with L2Population (bug 1365476[1]).
This bug is important to fix especially if we want to have HA routers and DVR
routers working together.

[1] https://bugs.launchpad.net/neutron/+bug/1365476

What's happening now?
* HA routers use distributed ports, i.e. the port with the same IP & MAC
  details is applied on all nodes where an L3 agent is hosting this router.
* Currently, the port details have a binding pointing to an arbitrary node
  and this is not updated.
* L2pop takes this "potentially stale" information and uses it to create: 
  1. A tunnel to the node.
  2. An FDB entry that directs traffic for that port to that node.
  3. If ARP responder is on, ARP requests will not traverse the network.
* Problem is, the master router wouldn't necessarily be running on the
  reported agent.
  This means that traffic might not reach the master node but instead some
  arbitrary node where the router might be master, but might equally be in
  another state (standby, fail).

What is proposed?
Basically the idea is not to do L2Pop for HA router ports that reside on the
tenant network.
Instead, we would create a tunnel to each node hosting the HA router so that
the normal learning switch functionality would take care of switching the
traffic to the master router.
This way no matter where the master router is currently running, the data
plane would know how to forward traffic to it.
This solution requires changes on the controller only.
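
A minimal sketch of the decision described above, in Python. This is not the
actual L2Pop code: the Port class, the ha_router flag and plan_l2pop() are
hypothetical names used only for illustration, and the device_owner value is
assumed to be the one used for router interface ports on a tenant network.

```python
from dataclasses import dataclass

# Assumption: device_owner of a router interface port on a tenant network.
ROUTER_INTERFACE_OWNER = "network:router_interface"


@dataclass(frozen=True)
class Port:
    mac: str
    ip: str
    device_owner: str
    bound_host: str          # host recorded in the (possibly stale) binding
    ha_router: bool = False  # hypothetical flag: port belongs to an HA router


def plan_l2pop(port, ha_hosts):
    """Return (tunnel_hosts, fdb_entries) for a single port.

    ha_hosts: hosts whose L3 agent hosts the HA router (empty otherwise).
    """
    if port.ha_router and port.device_owner == ROUTER_INTERFACE_OWNER:
        # Proposed behaviour: a tunnel to every hosting node and no FDB
        # entry, so normal MAC learning locates the current master.
        return sorted(ha_hosts), []
    # Plain L2Pop: one tunnel plus an FDB entry pointing at the bound host.
    return [port.bound_host], [(port.mac, port.ip, port.bound_host)]
```

With this shape, a failover changes nothing on the controller side: the
tunnels already exist and the learning switch re-learns the MAC location.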

What's to gain?
* Data plane only solution, independent of the control plane.
* Lowest failover time (same as HA routers today).
* High backport potential:
  * No APIs changed/added.
  * No configuration changes.
  * No DB changes.
  * Changes localized to a single file and limited in scope.

What's the alternative?
An alternative solution would be to have the controller update the port binding
on the single port so that the plain old L2Pop happens and notifies about the
location of the master router.
This basically negates all the benefits of the proposed solution, but has
wider applicability.
This solution depends on the report-ha-router-master spec which is currently in
the implementation phase.

It's important to note that these two solutions don't collide and could be done
independently. The one I'm proposing just makes more sense from an HA viewpoint
because of its benefits which fit the HA methodology of being fast & having as
little outside dependency as possible.
It could be done as an initial solution which solves the bug for mechanism
drivers that support normal learning switch (OVS), and later kept as an
optimization to the more general, controller based, solution which will solve
the issue for any mechanism driver working with L2Pop (Linux Bridge, possibly
others).

Would love to hear your thoughts on the subject.

Regards,
Mike

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Introducing 'wrapt' to taskflow breaks Jenkins builds on stable branches

2014-11-20 Thread Mike Kolesnik
Hi, 

Currently stable branch Jenkins builds are failing due to the error: 
Syncing /opt/stack/new/taskflow/requirements-py3.txt 
'wrapt' is not a global requirement but it should be,something went wrong 

It's my understanding that this is a side effect from your change in taskflow: 
https://review.openstack.org/#/c/129507/ 
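
For context, the failing check can be thought of roughly as below. This is a
simplified sketch and not the actual requirements sync script; the function
names and the version numbers in the usage example are made up.

```python
import re


def requirement_names(lines):
    """Extract lower-cased package names from requirements-style lines."""
    names = set()
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The name is everything before the first specifier, marker or extra.
        names.add(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0].lower())
    return names


def missing_from_global(project_lines, global_lines):
    """Names required by the project but absent from the global list."""
    return sorted(requirement_names(project_lines)
                  - requirement_names(global_lines))


# Hypothetical contents, for illustration only.
project = ["six>=1.7.0", "wrapt>=1.7.0  # BSD"]
global_reqs = ["six>=1.7.0", "networkx>=1.8"]
for name in missing_from_global(project, global_reqs):
    print("'%s' is not a global requirement" % name)
```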

This is currently blocking (amongst other things) a backport of a security fix: 
https://review.openstack.org/#/c/135624/ 

Joshua - Would you be so kind as to investigate this? 

Kind Regards, 
Mike 



Re: [openstack-dev] [all] Hide CI comments in Gerrit

2014-07-24 Thread Mike Kolesnik
Great script!

I made a fork and improved it a bit:
https://gist.github.com/mkolesni/92076378d45c7b5e692b

This fork supports:
1. Button/link is integrated nicely into the Gerrit UI (appears in the
comments title, just like the other ones).
2. Auto hide will hide comments by default (can be turned off).
3. "Regex" like bot detection which requires a shorter list of unique
bot names, and less maintenance of the script.
4. oVirt support (for those interested).
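
To illustrate point 3, here is a minimal sketch of pattern-based bot
detection. It is written in Python for brevity (the userscript itself is
JavaScript), and the pattern and function names are assumptions, not what the
gist actually contains.

```python
import re

# Hypothetical rule: a commenter whose name ends in "CI" or "Jenkins", or
# contains the word-final "bot", is treated as automated.
BOT_PATTERN = re.compile(r"(\bCI$|\bJenkins$|bot\b)", re.IGNORECASE)


def is_bot(author):
    """True if the author name matches one of the bot patterns."""
    return BOT_PATTERN.search(author) is not None


def human_comments(comments):
    """Keep only comments whose author does not look like a CI system."""
    return [c for c in comments if not is_bot(c["author"])]
```

A single pattern like this covers every "<vendor> CI" account, so new CI
systems do not need to be added to the script one by one.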

Regards,
Mike

- Original Message -
> Hi,
> 
> I created a small userscript that allows you to hide CI comments in Gerrit.
> That way you can read only comments written by humans and hide everything
> else. I’ve been struggling for a long time to follow discussions on changes
> with many patch sets because of the CI noise. So I came up with this
> userscript:
> 
> https://gist.github.com/rgerganov/35382752557cb975354a
> 
> It adds “Toggle CI” button at the bottom of the page that hides/shows CI
> comments. Right now it is configured for Nova CIs, as I contribute mostly
> there, but you can easily make it work for other projects as well. It
> supports both the “old” and “new” screens that we have.
> 
> How to install on Chrome: open chrome://extensions and drag&drop the script
> there
> How to install on Firefox: install Greasemonkey first and then open the
> script
> 
> Known issues:
>  - you may need to reload the page to get the new button
>  - I tried to add the button somewhere close to the collapse/expand links but
>  it didn’t work for some reason
> 
> Hope you will find it useful. Any feedback is welcome :)
> 
> Thanks,
> Rado
> 



Re: [openstack-dev] [nova][neutron] VIF event callbacks implementation

2014-05-07 Thread Mike Kolesnik
Hi,

Sorry for the late reply, please see replies inline.

I have an additional concern: an API is user facing, so Nova is now
exposing an internal synchronization detail to the outside world.
Does it make sense that a user would now be able to send messages to
this API?

- Original Message -
> > Aside from creating a sort of cyclic dependency between the two, it
> > is my understanding that Neutron is meant to be a "stand alone"
> > service capable of being consumed by other compute managers (e.g.
> > oVirt). This breaks that paradigm.
> 
> 
> 
> > So my question is: Why use API and not RPC?
> > 
> > I saw that there is already a notification system in Neutron that
> > notifies on each port update (among other things) which are
> > currently consumed by Ceilometer. Why not have Nova use those
> > notifications to decide that a VIF got plugged correctly, floating
> > IPs changed, and so on?
> 
> To your point above, having either service sit on the other's RPC bus
> ties them together far closer than having them consume each other's REST
> API, IMHO. Further, Nova's internal RPC mechanics are controlled pretty
> tightly for upgrade compatibility and I don't think I'd want another
> service sitting on that bus that we have to worry about when we're
> coordinating a change across releases (which we do quite often). We
> consume Neutron's services via the REST API because that's what is
> exposed externally and guaranteed to be stable -- the same goes for
> Neutron consuming anything from Nova.

Not sure why RPC is more coupled than API. Maybe you could explain?

Currently it's a very specific API that puts a burden on Neutron: it now has
to know what a VIF is and what state that VIF needs to be in to work,
instead of having this logic in Nova (which is of course aware of both
VIF and port).

I wasn't suggesting touching Nova's RPC, but rather utilizing the existing
notifications sent from Neutron to achieve the same logic. So I'm not sure
what changes you believe would need to be coordinated from Nova's POV.

Also, RPC is just what is used today. It could similarly be a queue with
some defined message format. IMHO "RPC" is just a name here; the
notifications Neutron sends don't really make any remote procedure calls.

We could alternatively provide a "callback" functionality that allows
various clients to receive notifications from Neutron, specifying an
address to send these details to.

> 
> > I believe the rationale here was that nova's API interface is only
> > currently exposed via a rest API over http so leveraging this
> > existing framework seemed like a good place to do it. In addition,
> > there didn't seem to be an obvious advantage to using RPC rather
> > than the rest interface. Lastly, this new interface that nova exposes
> > is generic and not neutron specific as it can be used for other type
> > of notifications that things want to send nova. I added Dan Smith to
> > CC to keep me honest here as I believe this was the rationale.
> 
> Yeah, we've already got plans in place to get Cinder to use the
> interface to provide us more detailed information and eliminate some
> polling. We also have a very purpose-built notification scheme between
> nova and cinder that facilitates a callback for a very specific
> scenario. I'd like to get that converted to use this mechanism as well,
> so that it becomes "the way you tell nova that things it's waiting for
> have happened."

Not sure how you consider this "mechanism" generic, since it serves only
Nova while a number of different services might be interested in this
information.
Now Neutron needs to be aware of VIFs and of Nova's expectations of Neutron
with regard to each VIF, which is very tight coupling.

Using a notification scheme where any subscriber can receive the event
from Neutron/Cinder/etc. and handle it as it needs would instead be
much more decoupled, IMHO.
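
A minimal sketch of what I mean, with a toy in-process bus standing in for
the real message queue. The class and function names are hypothetical, and
the "port.update.end" event type and payload shape are assumptions used only
for illustration.

```python
from collections import defaultdict


class NotificationBus:
    """Toy stand-in for a message queue with per-event subscriptions."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher knows nothing about who is listening or why.
        for handler in self._subscribers[event_type]:
            handler(payload)


def make_vif_tracker(plugged):
    """Hypothetical compute-side consumer that interprets port state itself."""
    def on_port_update(payload):
        port = payload["port"]
        if port["status"] == "ACTIVE":
            plugged.add(port["id"])
    return on_port_update


bus = NotificationBus()
plugged, metering = set(), []
bus.subscribe("port.update.end", make_vif_tracker(plugged))  # compute manager
bus.subscribe("port.update.end", metering.append)            # metering service
bus.publish("port.update.end", {"port": {"id": "p1", "status": "ACTIVE"}})
```

Here the publisher only reports that a port changed; deciding that this means
"VIF plugged" is left to the consumer that cares about VIFs.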

> 
> --Dan
> 



[openstack-dev] [nova][neutron] VIF event callbacks implementation

2014-04-28 Thread Mike Kolesnik
Hi, 

I came across the implementation of 
https://blueprints.launchpad.net/neutron/+spec/nova-event-callback
and have a question about the way it was implemented.

I notice that Neutron now has a dependency on Nova and needs to be configured
with Nova details (API endpoint, user, password, tenant, etc.).
Aside from creating a sort of cyclic dependency between the two, it is my
understanding that Neutron is meant to be a "stand alone" service capable of
being consumed by other compute managers (e.g. oVirt).
This breaks that paradigm.

So my question is: Why use API and not RPC?

I saw that there is already a notification system in Neutron that notifies on
each port update (among other things), and these notifications are currently
consumed by Ceilometer.
Why not have Nova use those notifications to decide that a VIF got plugged
correctly, floating IPs changed, and so on?

I am willing to make the necessary changes to decouple Neutron from Nova, but
want to understand the rationale behind the original decision of using API
and not RPC notifications.

Regards,
Mike
