Re: [openstack-dev] [tc][all] A culture change (nitpicking)

2018-05-29 Thread Ian Wells
On 29 May 2018 at 14:53, Jeremy Stanley  wrote:

> On 2018-05-29 15:25:01 -0500 (-0500), Jay S Bryant wrote:
> [...]
> > Maybe it would be different now that I am a Core/PTL but in the past I
> had
> > been warned to be careful as it could be misinterpreted if I was changing
> > other people's patches or that it could look like I was trying to pad my
> > numbers. (I am a nit-picker though I do my best not to be.)
> [...]
>
> Most stats tracking goes by the Gerrit "Owner" metadata or the Git
> "Author" field, neither of which are modified in a typical new
> patchset workflow and so carry over from the original patchset #1
> (resetting Author requires creating a new commit from scratch or
> passing extra options to git to reset it, while changing the Owner
> needs a completely new Change-Id footer).
>

We know this, but other people don't, so the comment is wise.  Also,
arguably, if I badly fix someone else's patch, I'm making them look bad by
leaving them with the 'credit' for my bad work, so it's important to be
careful and tactful.  But the history is public record, at least.
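
(For completeness: the 'extra options' Jeremy mentions boil down to
something like git commit --amend --reset-author; a plain amend keeps the
original Author line, which is exactly why the credit - good or bad -
stays put.)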

-- 
Ian.


Re: [openstack-dev] [tc][all] A culture change (nitpicking)

2018-05-29 Thread Ian Wells
If your nitpick is a spelling mistake or the need for a comment where
you've pretty much typed the text of the comment in the review comment
itself, then I have personally found it easiest to use the Gerrit online
editor to actually update the patch yourself.  There's nothing magical
about the original submitter, and no point in wasting your time and theirs
to get them to make the change.  That said, please be a grown-up; if you're
changing code or messing up formatting enough for PEP8 to be a concern,
it's your responsibility, not the original submitter's, to fix it.  Also,
do all your fixes in one commit if you don't want to make Zuul cry.
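If you'd rather do that from the command line than the web editor, the
usual dance is something along these lines (substitute the number of the
change you're fixing):

  git review -d <change-number>
  # fix the typo / add the comment
  git commit -a --amend
  git review

Amending keeps the original Author and Gerrit Owner, so the submitter
still gets the credit for the patch.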
-- 
Ian.


On 29 May 2018 at 09:00, Neil Jerram  wrote:

> From my point of view as someone who is still just an occasional
> contributor (in all OpenStack projects other than my own team's networking
> driver), and so I think still sensitive to the concerns being raised here:
>
> - Nits are not actually a problem, at all, if they are uncontroversial and
> quick to deal with.  For example, if it's a point of English, and most
> English speakers would agree that a correction is better, it's quick and no
> problem for me to make that correction.
>
> - What is much more of a problem is:
>
>   - Anything that is more a matter of opinion.  If a markup is just the
> reviewer's personal opinion, and they can't say anything to explain more
> objectively why their suggestion is better, it would be wiser to defer to
> the contributor's initial choice.
>
>   - Questioning something unconstructively or out of proportion to the
> change being made.  This is a tricky one to pin down, but sometimes I've
> had comments that raise some random left-field question that isn't really
> related to the change being made, or where the reviewer could have done a
> couple minutes research themselves and then either made a more precise
> comment, or not made their comment at all.
>
>   - Asking - implicitly or explicitly - the contributor to add more
> cleanups to their change.  If someone usefully fixes a problem, and their
> fix does not of itself impair the quality or maintainability of the
> surrounding code, they should not be asked to extend their fix so as to fix
> further problems that a more regular developer may be aware of in that
> area, or to advance a refactoring / cleanup that another developer has in
> mind.  (At least, not as part of that initial change.)
>
> (Obviously the common thread of those problem points is taking up more
> time; psychologically I think one of the things that can turn a contributor
> away is the feeling that they've contributed a clearly useful thing, yet
> the community is stalling over accepting it for reasons that do not appear
> clearcut.)
>
> Hoping this is vaguely helpful...
>  Neil
>
>
> On Tue, May 29, 2018 at 4:35 PM Amy Marrich  wrote:
>
>> If I have a nit that doesn't affect things, I'll make a note of it and
>> say if you do another patch I'd really like it fixed but also give the
>> patch a vote. What I'll also do sometimes, if I know the user or they are
>> online, is offer to fix things for them; that way they can see what I've
>> done, I've sped things along, and I haven't caused a simple change to take
>> a long time and a lot of reviews.
>>
>> I think this is a great addition!
>>
>> Thanks,
>>
>> Amy (spotz)
>>
>> On Tue, May 29, 2018 at 6:55 AM, Julia Kreger <
>> juliaashleykre...@gmail.com> wrote:
>>
>>> During the Forum, the topic of review culture came up in session after
>>> session. During these discussions, the subject of our use of nitpicks
>>> was often raised as a point of contention and frustration, especially
>>> by community members who had left the community and were attempting to
>>> re-engage with it. Contributors raised the point of review feedback
>>> requiring extremely precise English, or compliance with a particular
>>> core reviewer's style preferences, which may not be the same as another
>>> core reviewer's.
>>>
>>> These things are not just frustrating, but also very inhibiting for
>>> part-time contributors such as students, who may also be time-limited,
>>> or an operator who noticed something that was clearly a bug, put forth
>>> a very minor fix, and doesn't have the time to revise it over and over.
>>>
>>> While nitpicks do help guide and teach, the consensus seemed to be
>>> that we do need to shift the culture a little bit. As such, I've
>>> proposed a change to our principles[1] in governance that attempts to
>>> capture the essence and spirit of the nitpicking topic as a first
>>> step.
>>>
>>> -Julia
>>> -
>>> [1]: https://review.openstack.org/570940
>>>

Re: [openstack-dev] [neutron][neutron-lib]Service function definition files

2017-12-29 Thread Ian Wells
On 28 December 2017 at 06:57, CARVER, PAUL  wrote:

> It was a gating criterion for stadium status. The idea was that for a
> stadium project the neutron team would have review authority over the API
> but wouldn't necessarily review or be overly familiar with the
> implementation.
>
> A project that didn't have its API definition in neutron-lib could do
> anything it wanted with its API and wouldn't be a neutron subproject
> because the neutron team wouldn't necessarily know anything at all about it.
>
> For a neutron subproject there would at least theoretically be members of
> the neutron team who are familiar with the API and who ensure some sort of
> consistency across APIs of all neutron subprojects.
>
> This is also a gating criterion for publishing API documentation on
> api.openstack.org vs publishing somewhere else. Again, the idea being
> that the neutron team would be able, at least in some sense, to "vouch for"
> the OpenStack networking APIs, but only for "official" neutron stadium
> subprojects.
>
> Projects that don't meet the stadium criteria, including having api-def in
> neutron-lib, are "anything goes" and not part of neutron because no one
> from the neutron team is assumed to know anything about them. They may work
> just fine, it's just that you can't assume that anyone from neutron has
> anything to do with them or even knows what they do.
>

OK - that makes logical sense, though it does seem that it would tie
specific versions of every service in that list to a common version of
neutron-lib as a byproduct, so it would be impossible to upgrade LBaaS
without also potentially having to upgrade bgpvpn, for instance.  I don't
know if that was the intention, but I wouldn't have expected it.
-- 
Ian.


[openstack-dev] [neutron][neutron-lib]Service function definition files

2017-12-27 Thread Ian Wells
Hey,

Can someone explain how the API definition files for several service
plugins ended up in neutron-lib?  I can see that they've been moved there
from the plugins themselves (e.g. networking-bgpvpn has
https://github.com/openstack/neutron-lib/commit/3d3ab8009cf435d946e206849e85d4bc9d149474#diff-11482323575c6bd25b742c3b6ba2bf17)
and that there's a stadium element to it judging by some earlier commits on
the same directory, but I don't understand the reasoning why such service
plugins wouldn't be self-contained - perhaps someone knows the history?

Thanks,
-- 
Ian.


[openstack-dev] [neutron][networking-vpp]networking-vpp 17.10 for VPP 17.10 is available

2017-11-20 Thread Ian Wells
In conjunction with the release of VPP 17.10, I'd like to invite you all to
try out networking-vpp 17.10(*) for VPP 17.10.  VPP is a fast userspace
forwarder based on the DPDK toolkit, and uses vector packet processing
algorithms to minimise the CPU time spent on each packet and maximise
throughput.  networking-vpp is a ML2 mechanism driver that controls VPP on
your control and compute hosts to provide fast L2 forwarding under Neutron.

This version has a few additional enhancements, along with supporting the
VPP 17.10 API:
- we can now optionally sign data stored in etcd, the communication
mechanism we use, for additional security
- The L3 functionality has been reworked in preparation for HA.

Along with this, there has been the usual upkeep as Neutron versions
change, plus bug fixes and code and test improvements.

The README [1] explains how you can try out VPP using devstack: the
devstack plugin will deploy the mechanism driver and VPP itself and should
give you a working system with a minimum of hassle.  It will now use the
etcd version deployed by newer versions of devstack.
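
If you want the really short version: in your devstack local.conf, a line
along the lines of

  enable_plugin networking-vpp https://github.com/openstack/networking-vpp

plus the settings described in the README should pull everything in; treat
the README as the authoritative reference for the exact options.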

We will continue development between now and VPP's 18.02 release in
February.  There are several features we're planning to work on (you'll
find a list in our RFE bugs at [2]), and we welcome anyone who would like
to come help us.

Everyone is welcome to join our biweekly IRC meetings, every other Monday
(the next one is due in a week), 0800 PST = 1600 GMT.
-- 
Ian.

(*) Yes, I know we're in November, but VPP was released last month just
before the summit, and then I went on holiday.  It's called 17.10 to
correspond to the VPP release.
[1]https://github.com/openstack/networking-vpp/blob/master/README.rst
[2]http://goo.gl/i3TzAt


Re: [openstack-dev] [neutron] MTU native ovs firewall driver

2017-09-20 Thread Ian Wells
Since OVS is doing L2 forwarding, you should be fine setting the MTU to as
high as you choose, which would probably be the segment_mtu in the config,
since that's what it defines - the largest MTU that (from the Neutron API
perspective) is usable and (from the OVS perspective) will be used in the
system.  A 1500MTU Neutron network will work fine over a 9000MTU OVS switch.

What won't work is sending a 1500MTU network to a 9000MTU router port.  So
if you're doing any L3 (where the packet arrives at an interface, rather
than travels a segment) you need to consider those MTUs in light of the
Neutron network they're attached to.
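
For reference, the knob in question lives in neutron.conf and looks
something like this (global_physnet_mtu is the newer name for what older
releases called segment_mtu):

  [DEFAULT]
  global_physnet_mtu = 9000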
-- 
Ian.

On 20 September 2017 at 09:58, Ihar Hrachyshka  wrote:

> On Wed, Sep 20, 2017 at 9:33 AM, Ajay Kalambur (akalambu)
>  wrote:
> > So I was forced to explicitly set the MTU on br-int
> > ovs-vsctl set int br-int mtu_request=9000
> >
> >
> > Without this the tap device added to br-int would get MTU 1500
> >
> > Would this be something the ovs l2 agent can handle since it creates the
> bridge?
>
> Yes, I guess we could do that if it fixes your problem. The issue
> stems from the fact that we use a single bridge for different networks
> with different MTUs, and it does break some assumptions kernel folks
> make about a switch (that all attached ports steer traffic in the same
> l2 domain, which is not the case because of flows we set). You may
> want to report a bug against Neutron and we can then see how to handle
> that. It will probably not be as simple as setting the value to 9000
> because different networks have different MTUs, and plugging those
> mixed ports in the same bridge may trigger MTU updates on unrelated
> tap devices. We will need to test how kernel behaves then.
>
> Also, you may be interested in reviewing an old openvswitch-dev@
> thread that I once started here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2016-June/316733.html
> Sadly, I never followed up with a test scenario that wouldn't involve
> OpenStack, for OVS folks to follow up on, so it never moved anywhere.
>
> Cheers,
> Ihar
>


[openstack-dev] [neutron][networking-vpp]networking-vpp 17.07.1 for VPP 17.07 is available

2017-07-30 Thread Ian Wells
In conjunction with the release of VPP 17.07, I'd like to invite you all to
try out networking-vpp 17.07.1 for VPP 17.07.  VPP is a fast userspace
forwarder based on the DPDK toolkit, and uses vector packet processing
algorithms to minimise the CPU time spent on each packet and maximise
throughput.  networking-vpp is a ML2 mechanism driver that controls VPP on
your control and compute hosts to provide fast L2 forwarding under Neutron.

This version has a few additional enhancements, along with supporting the
VPP 17.07 API:
- remote security group IDs are now supported
- VXLAN GPE support now includes proxy ARP at the local forwarder

Along with this, there have been the usual bug fixes, code and test
improvements.

The README [1] explains how you can try out VPP using devstack: the
devstack plugin will deploy etcd, the mechanism driver and VPP itself and
should give you a working system with a minimum of hassle.

We will continue development between now and VPP's 17.10 release in
October.  There are several features we're planning to work on (you'll find
a list in our RFE bugs at [2]), and we welcome anyone who would like to
come help us.

Everyone is welcome to join our biweekly IRC meetings, every other Monday
(the next one is due in a week), 0900 PDT = 1600 GMT.
-- 
Ian.

[1]https://github.com/openstack/networking-vpp/blob/master/README.rst
[2]http://goo.gl/i3TzAt


Re: [openstack-dev] [neutron] writable mtu

2017-07-07 Thread Ian Wells
On 7 July 2017 at 12:14, Ihar Hrachyshka  wrote:

> > That said: what will you do with existing VMs that have been told the
> MTU of
> > their network already?
>
> Same as we do right now when modifying configuration options defining
> underlying MTU: change it on API layer, update data path with the new
> value (tap to brq to router/dhcp legs) and hope instances will get
> there too (by means of dhcp lease refresh eventually happening, or
> rebooting instances, or else). There is no silver bullet here, we have
> no way to tell instances to update their interface MTUs.
>

Indeed, and I think that's my point.

Let me propose an option 2.

Refuse to migrate if it would invalidate the MTU property on an existing
network.  If this happens, the operator can delete such networks, or clear
them out and recreate them with a smaller MTU.  The point being, since the
automation can't reliably fix the MTU of the running VMs, the automation
shouldn't change the MTU of the network - it's not in the power of the
network control code to get the results right - and you should instead tell
the operator that they have some decisions to make about whether VMs have
to be restarted, networks deleted or recreated, etc., that can't be judged
automatically.

However, explain in the documentation how to make a migration that won't
invalidate your existing virtual networks' MTUs, allowing you to preserve
all your networks with the same MTU they already have.  If you migrate
from encap A to a bigger encap B (losing some more bytes from the infra MTU),
it would refuse to migrate most networks *unless* you simultaneously
increased the path_mtu to allow for the extra bytes.  So, B takes 10 extra
bytes, you fiddle with your switches to increase their MTU by 10, your
auto-migration itself fiddles with the MTUs on host interfaces and
vswitches, and the MTU of the virtual network remains the same (because
phys MTU - encap >= biggest allowed virtual network MTU before the upgrade).
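
(To put illustrative numbers on that: with a 1550-byte fabric and a
50-byte encap, virtual networks of up to 1500 bytes are fine; if the new
encap costs 60 bytes, raising the fabric to 1560 as part of the migration
means those 1500-byte networks - and the VMs on them - never notice the
change.)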


> At least not till we get both new ovs and virtio-net in the guests
> that will know how to deal with MTU hints:
> https://bugzilla.redhat.com/show_bug.cgi?id=1408701
> https://bugzilla.redhat.com/show_bug.cgi?id=1366919
> (there should also be ovs integration piece but I can't find it right
> away.)
>

... and every OS on the planet actually uses it, and no-one uses an e1000
NIC or an SRIOV NIC, and and and...


> Though even with that, I don't know if guest will be notified about
> changes happening during its execution, or only on boot (that probably
> depends on whether virtio polls the mtu storage). And anyway, it
> depends on guest kernel, so no luck for windows guests and such.
>
> >
> > Put a different way, a change to the infrastructure can affect MTUs in
> two
> > ways:
> >
> > - I increase the MTU that a network can pass (by, for instance,
> increasing
> > the infrastructure of the encap).  I don't need to change its MTU because
> > VMs that run on it will continue to work.  I have no means to tell the
> VMs
> > they have a bigger MTU now, and whatever method I might use needs to be
> 100%
> > certain to work or left-out VMs will become impossible to talk to, so
> > leaving the MTU alone is sane.
>
> In this scenario, it sounds like you assume everything will work just
> fine. But you don't consider neutron routers that will enforce the new
> larger MTU for fragmentation, that may end up sending frames to
> unaware VMs of size that they can't choke.
>

Actually, no.  I'm saying here that I increase the *MTU that the network
can pass* - for instance, I change the MTU on my physical switch from 1500
to 9000 - but I don't change anything about my OpenStack network
properties.  Thus if I were to send a 9000-byte packet (while the property on
the virtual network still says the MTU is 1500), it may well get to its destination,
because the API doesn't guarantee that the packets are dropped; it just
makes no guarantee that the packet will be passed, so this is undefined
behaviour territory.  The virtual network's MTU *property* is still 1500,
we can still guarantee that the network will pass packets up to and
including 1500 bytes, and the router interfaces, just like VM interfaces,
are set from the MTU property to a 1500 MTU - so they emit transmissible
packets and they all agree on the MTU size, which is what's necessary for a
network to work.  The fact that the fabric will now pass 9000 byte packets
isn't relevant.


> > - I decrease the MTU that a network can pass (by, for instance, using an
> > encap with larger headers).  The network comprehensively breaks; VMs
> > frequently fail to communicate regardless of whether I change the network
> > MTU property, because running VMs have already learned their MTU value
> and,
> > again, there's no way to update their idea of what it is reliably.
> > Basically, this is not a migration that can be done with running VMs.
>
> Yeah. You may need to do some multiple step dance, like:
>
> - before mtu reduction, 

Re: [openstack-dev] [neutron] writable mtu

2017-07-05 Thread Ian Wells
OK, so I should read before writing...

On 5 July 2017 at 18:11, Ian Wells <ijw.ubu...@cack.org.uk> wrote:

> On 5 July 2017 at 14:14, Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>
>> Heya,
>>
>> we have https://bugs.launchpad.net/neutron/+bug/1671634 approved for
>> Pike that allows setting MTU for network on creation.
>
>
> This was actually in the very first MTU spec (in case no one looked),
> though it never got implemented.  The spec details a whole bunch of stuff
> about how to calculate whether the proposed MTU will fit within the encap,
> incidentally, and will reject network creations when it doesn't.
>

OK, even referenced in the bug, so apologies, we're all good.

>> So, I wonder if we can instead lay the ground for updatable MTU right
>> away, and allow_post: True from the start, even while implementing
>> create only as a phase-1. Then we can revisit the decision if needed
>> without touching api. What do you think?
>>
>
I think I misinterpreted: you'd enable all options and then deal with the
consequences in the backend code, which has to implement one of the
previously listed behaviours?  That seems sane to me provided the required
behaviours are documented somewhere where a driver implementer has to trip
over them.
-- 
Ian.


Re: [openstack-dev] [neutron] writable mtu

2017-07-05 Thread Ian Wells
On 5 July 2017 at 14:14, Ihar Hrachyshka  wrote:

> Heya,
>
> we have https://bugs.launchpad.net/neutron/+bug/1671634 approved for
> Pike that allows setting MTU for network on creation.


This was actually in the very first MTU spec (in case no one looked),
though it never got implemented.  The spec details a whole bunch of stuff
about how to calculate whether the proposed MTU will fit within the encap,
incidentally, and will reject network creations when it doesn't.

Note that the MTU attribute was intended to represent an MTU that will
definitely transit.  I guess no-one would actually rely on this, but to
clarify, it's not intended to indicate that bigger packets will be dropped,
only that smaller packets will not be dropped (which is the guarantee you
need for two VMs to talk to each other).  Thus the MTU doesn't need to be
increased just because the infrastructure MTU has become larger; it just
means that future networks can be created with larger MTUs from this point,
and the current MTU will still be valid.

This is also the MTU that all VMs on that network will be told, because
they need to use the same value to function.  If you change it, VMs after
the event will have problems talking to their earlier friends because they
will now disagree on MTU (and routers will have problems talking to at
least one of those sets).

> (but not update,
> as per latest comment from Kevin there) I already see a use case to
> modify MTU for an existing network (for example, where you enable
> Jumbo frames for underlying infrastructure, and want to raise the
> ceiling; another special case is when you migrate between different
> encapsulation technologies, like in case of ml2/ovs to networking-ovn
> migration where the latter doesn't support VXLAN but Geneve only).
>

You look like you're changing the read-only segmentation type of the
network on this migration - presumably in the DB directly - so you're
changing non-writeable fields already.  Couldn't the MTU be changed in a
similarly offline manner?

That said: what will you do with existing VMs that have been told the MTU
of their network already?

Put a different way, a change to the infrastructure can affect MTUs in two
ways:

- I increase the MTU that a network can pass (by, for instance, increasing
the infrastructure of the encap).  I don't need to change its MTU because
VMs that run on it will continue to work.  I have no means to tell the VMs
they have a bigger MTU now, and whatever method I might use needs to be
100% certain to work or left-out VMs will become impossible to talk to, so
leaving the MTU alone is sane.
- I decrease the MTU that a network can pass (by, for instance, using an
encap with larger headers).  The network comprehensively breaks; VMs
frequently fail to communicate regardless of whether I change the network
MTU property, because running VMs have already learned their MTU value and,
again, there's no way to update their idea of what it is reliably.
Basically, this is not a migration that can be done with running VMs.

> If I go and implement the RFE as-is, and later in Queens we pursue
> updating MTU for existing networks, we will have three extensions for
> the same thing.
>
> - net-mtu (existing read only attribute)
> - net-mtu-enhanced (allow write on create)
> - net-mtu-enhanced-enhanced (allow updates)
>
> Not to mention potential addition of per-port MTU that some folks keep
> asking for (and we keep pushing against so far).
>
> So, I wonder if we can instead lay the ground for updatable MTU right
> away, and allow_post: True from the start, even while implementing
> create only as a phase-1. Then we can revisit the decision if needed
> without touching api. What do you think?
>

It's trivially detectable that an MTU value can't be set at all, or can be
set initially but not changed.  Could we use that approach?  That way, we
don't need multiple extensions, the current one is sufficient (and, on the
assumption that you don't rely on 'read-only attribute' errors in normal
code, I think we can call this backward compatible).


> Another related question is, how do we expose both old and new
> extensions at the same time? I would imagine that implementations
> capable of writing to the mtu attribute would advertise both old and
> new extensions. Is it correct? Does neutron api layer allow for
> overlapping attribute maps?
>

Extension net-mtu: MTU attr exists; MTU can't be set at all; passing an MTU
returns a bad-argument error.
Extension net-mtu: MTU attr exists; MTU can be set on creation; a failed (too
big) MTU value returns a more specific 'MTU too big' error.
Extension net-mtu: MTU attr exists; MTU can be set after creation; a failed
update returns the same errors as a creation-time write (which it appears you
already have in mind).

-- 
Ian.

Re: [openstack-dev] [api][neutron][nova][Openstack-operators][interop] Time for a bikeshed - help me name types of networking

2017-05-15 Thread Ian Wells
I'm coming to this cold, so apologies when I put my foot in my mouth.  But
I'm trying to understand what you're actually getting at, here - other than
helpful simplicity - and I'm not following the detail of your thinking,
so take this as a form of enquiry.

On 14 May 2017 at 10:02, Monty Taylor  wrote:

> First off, we need to define two terms:
> "external" - an address that can be used for north-south communication off
> the cloud
> "internal" - an address that can be used for east-west communication with
> and only with other things on the same cloud
>

I'm going through the network detail of this and picking out shortcomings,
so please understand that before you read on.

I think I see what you're trying to accomplish, but the details don't add
up for me.  The right answer might be 'you don't use this if you want fine
detailed network APIs' - and that's fine - but I think the challenge is
coming up with a model that doesn't contradict the fine detail of what you can do
with a network today and how you can put it to use.

1. What if there are more domains of connectivity, since a cloud can be
connected to multiple domains?  I see this in its current form as intended
for public cloud providers as much as anything, in which case there is
probably only one definition of 'external', for instance, but if you want
to make it more widely useful you could define off-cloud routing domain
names, of which 'external' (or, in that context, 'internet') is one with a
very specific meaning.

2. What is 'internal', precisely?  It seems to be in-cloud, though I don't
entirely understand how NAT comes into that.  What of a route to the
provider's internal network?  How does it apply when I have multiple tenant
networks that can't talk to each other, when they're provisioned for me and
I can't create them, and so on?  Why doesn't it apply to IPv6?

3. Why doesn't your format tell me how to get a port/address of the type in
question?  Do you feel that everything will be consistent in that regard?
To my mind it's more useful - at the least - to tell me the *identity* of
the network I should be using rather than saying 'such a thing is possible
in the abstract'.

[...]

"get me a server with only an internal ipv4 and please fail if that isn't
> possible"
>
>   create_server(
>   'my-server', external_network=False, internal_network=True)
>

A comment on all of these: are you considering this to be an argument that
is acted upon in the library, or available on the server?

Doing this in the library makes more sense to me.  I prefer the idea of
documenting in machine-readable form how to use the APIs, because it means
I can use a cloud without the cloud supporting the API.  For many clouds,
the description could be a static file, but for more complex situations it
would be possible to generate it programmatically per tenant.

Doing it the other way could also lead to cloud-specific code, and without
some clearer specification it might also lead to cloud-specific behaviour.

It's also complexity that simply doesn't need to be in the cloud; putting
it in the application gives an application with a newer library the
opportunity to use an older cloud.

> 2) As information on networks themselves:
>
> GET /networks.json
> {
>   "networks": [
> {
>   "status": "ACTIVE",
>   "name": "GATEWAY_NET_V6",
>   "id": "54753d2c-0a58-4928-9b32-084c59dd20a6",
>   "network-models": [
> "ipv4-internal-direct",
> "ipv6-direct"
>   ]
> },
>

[...]

I think the problem with this as a concept, if this is what you're
eventually driving towards, is how you would enumerate this for a network.

IPv6 may be routed to the internet (or other domains) or it may not, but if
it is it's not currently optional to be locally routed and not internet
routed on a given network as it is for a v4 address to be fixed without a
floating component.  (You've skipped this by listing only ipv6-direct, I
think, as an option, where you have ipv4-fixed).

ipv4 may be routed to the internet if a router is connected, but I can
connect a router after the fact and I can add a floating IP to a port after
the fact too.  If you're just thinking in terms of 'when starting a VM, at
this instant in time' then that might not be quite so much of an issue.

> I'm not suggesting putting info on subnets, since one requests connectivity
> from a network, not a subnet.
>

Not accurate - I can select a subnet on a network, and it can change who I
can talk to based on routes.  Neutron routers are attached to subnets, not
networks.
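
(Concretely - and the exact CLI form is just for illustration - the
attachment is per-subnet, e.g. 'openstack router add subnet my-router
my-subnet-a', so two ports on the same network can end up with different
routing depending on which subnet their addresses came from.)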

On a final note, this is really more about 'how do I make a port with this
sort of connectivity' with the next logical step being that many VMs only
need one port.
-- 
Ian.

Re: [openstack-dev] [neutron] multiple vlan provider and tenant networks at the same time?

2017-05-12 Thread Ian Wells
There are two steps to how this information is used:

Step 1: create a network - the type driver config on the neutron-server
host will determine which physnet and VLAN ID to use when you create it.
It gets stored in the DB.  No networking is actually done, we're just
making a reservation here.  The network_vlan_ranges are used to work out
which VLANs can be used automatically for tenant networks (and if you
specify a provider network then the information in the Neutron call is
just copied into the DB).

Step 2: bind a port - we look up the network in the DB to find all that
information out, then tell the OVS agent to attach a port to a specific
physnet and VLAN on a specific host.  The OVS agent on that host uses the
bridge_mappings to work out how to do that.  And note, we don't care
whether the network was created as a tenant or provider network at this
point - it's just a network.

On 12 May 2017 at 06:26,  wrote:

> [ml2_type_vlan]
> network_vlan_ranges = provider0,provider1,tenant-vlan2:200:299,tenant-vlan3:300:399
>

So here you're telling the controller (neutron-server) that two physical
networks, provider0 and provider1, exist that can only be used for VLAN
provider networks (because you haven't granted any ranges to neutron-server
for it to use automatically), and you've set up two physical networks with
VLAN ranges that Neutron may consume automatically for its tenant networks
(it will use VLANs out of the ranges you gave) *or* you can use for
provider networks (by specifying a VLAN using the provider network
properties when you create the Neutron network).  This tells Neutron
information it can use in step 1 above, at allocation time.

A side note: different physical networks *should* be wired to entirely
independent physical switches that are not connected to each other; that's
what they're designed for, networks that are physically separated.  That
said, I've seen cases where people do actually wire them together for
various reasons, like getting separate bandwidth through a different
interface for external networks.  If you do that you have to be careful
which VLANs you use for provider networks on your provider physnets -
Neutron will not be able to work out that provider0 vlan 200 and
tenant-vlan2 vlan 200 are actually the same network, for instance, if you
connect both uplinks to the same switch.

> [ovs]
> bridge_mappings = provider0:br-ext0,provider1:br-ext1,tenant-vlan2:br-vlan2,tenant-vlan3:br-vlan3
>

The 'bridge_mappings' setting is for compute and network nodes, where you
plan on connecting things to the network.  It tells the OVS agent how to
get packets to and from a specific physical network.  It gets used for port
binding - step 2 - and *not* when networks are created.  You've specified a
means to get packets from all four of your physnets, which is normal.  It
doesn't say anything about how those physnets are used - it doesn't even
say they're used for VLANs, I could put a flat network on there if I wanted
- and it certainly doesn't say why those VLANs might have been chosen.

> How can neutron decide on choosing correct vlan mapping for tenant? Will it
> try to pick provider0 if normal user creates a tenant network?
>

Neutron-server will choose VLANs for itself when you create a network and
don't hand over provider network properties.  And it will only
choose VLANs from the ranges you specified - so it will never choose a VLAN
automatically from the providerX physnets, given your configuration.
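
To make that concrete, the difference shows up in the create call itself;
roughly (the names here are only examples):

  # tenant network: neutron-server picks a segment from tenant-vlan2/3
  openstack network create my-tenant-net

  # provider network: you choose the physnet and VLAN yourself
  openstack network create --provider-network-type vlan \
      --provider-physical-network provider0 --provider-segment 200 my-provider-net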
-- 
Ian.


[openstack-dev] [neutron][networking-vpp]networking-vpp for VPP 17.04 is available

2017-04-25 Thread Ian Wells
In conjunction with the release of VPP 17.04, I'd like to invite you all to
try out networking-vpp for VPP 17.04.  VPP is a fast userspace forwarder
based on the DPDK toolkit, and uses vector packet processing algorithms to
minimise the CPU time spent on each packet and maximise throughput.
networking-vpp is a ML2 mechanism driver that controls VPP on your control
and compute hosts to provide fast L2 forwarding under Neutron.

This version has a few additional features:
- resync - this allows you to upgrade the agent while packets continue to
flow through VPP, and to update VPP and get it promptly reconfigured, and
should mean you can do maintenance operations on your cloud with little to
no network service interruption (per NFV requirements)
- VXLAN GPE - this is a VXLAN overlay with a LISP-based control plane to
provide horizontally scalable networking with L2FIB propagation.  You can
also continue to use the standard VLAN and flat networking.
- L3 support - networking-vpp now includes a L3 plugin and driver code
within the agent to use the L3 functionality of VPP to provide Neutron
routers.

Along with this, there have been the usual bug fixes, code and test
improvements.

The README [1] explains how you can try out VPP using devstack, which is
even simpler than before: the devstack plugin will deploy etcd, the
mechanism driver and VPP itself and should give you a working system with a
minimum of hassle.

We will continue development between now and VPP's 17.07 release in
July.  There are several features we're planning to work on (you'll find a
list in our RFE bugs at [2]), and we welcome anyone who would like to come
help us.

Everyone is welcome to join our new biweekly IRC meetings, every Monday
(including next Monday), 0900 PDT = 1600 GMT.

[1]https://github.com/openstack/networking-vpp/blob/master/README.rst
[2]http://goo.gl/i3TzAt
-- 
Ian.


Re: [openstack-dev] [neutron] - Neutron team social in Atlanta on Thursday

2017-02-22 Thread Ian Wells
+1

On 21 February 2017 at 16:18, Ichihara Hirofumi  wrote:

> +1
>
> 2017-02-17 14:18 GMT-05:00 Kevin Benton :
>
>> Hi all,
>>
>> I'm organizing a Neutron social event for Thursday evening in Atlanta
>> somewhere near the venue for dinner/drinks. If you're interested, please
>> reply to this email with a "+1" so I can get a general count for a
>> reservation.
>>
>> Cheers,
>> Kevin Benton
>>
>>
>
>


Re: [openstack-dev] [neutron] API models [was: PTL candidacy]

2017-01-25 Thread Ian Wells
On 25 January 2017 at 18:07, Kevin Benton  wrote:

> >Setting aside all the above talk about how we might do things for a
> moment: to take one specific feature example, it actually took several
> /years/ to add VLAN-aware ports to OpenStack.  This is an example of a
> feature that doesn't affect or interest the vast majority of the user
> community
>
> Now that it has been standardized on, other services can actually use it
> to build things (e.g. Kuryr, Octavia). If each vendor just went and wrote
> up their own extension for this, no project could reliably build anything
> on top of it that would work with multiple backend vendors.
>

In the weirdest way, I think we're agreeing.  The point is that the API
definition is what matters and the API definition is what we have to
agree.  That means that we should make it easy to develop the API
definition so that we can standardise it promptly, and it also means that
we would like to have the opportunity to agree that extensions are
'standard' (versus today - write an extension attribute that works for you,
call the job done).  It certainly doesn't remove that standardisation step,
which is why I say that governance, here is key - choosing wisely the
things we agree are standards.

To draw an IETF parallel for a moment, it's easy to write and implement a
networking protocol - you need no help or permission - and easy to submit a
draft that describes that protocol.  It's considerably harder to turn that
draft into an accepted standard.

> >and perhaps we need a governance model around ensuring that they are sane
> and reusable.
> >Gluon, specifically, is about solving exclusively the technical question
> of how to introduce independent APIs
>
> Doesn't that make Gluon support the opposite of having a governance model
> and standardization for APIs?
>

No.  If you want an in-house API then off you go, write your own.  If you
want to experiment in code, off you go, experimentation is good.  This is
why it should be easy to do.  The easier it is, the more Neutron work
you'll have time for, in fact.

But I can't unilaterally make something standard by saying it is, and
that's where the community can and should get involved.  You have a team of
core devs that can examine those proposed APIs - which are in a simple DSL
- and decide if they should be added to a repository, which you can look at
as an approval step when you decide they're worthy.

Making it hard to write an API is putting a barrier to entry and
experimentation in the way.  Making it hard to standardise that API - by,
for instance, putting technical requirements on it that it be supportable
and maintainable, generally usable and of no impact to people who don't use
it - that's far more useful.

We can't stop people writing bad APIs.  What we have in place today doesn't
stop that, as the earlier point about Contrail showed.  But that has no
relevance to the difficulty of writing APIs, which slows down both good and
bad implementations equally.


> >I do not care if we make Neutron extensible in some different way to
> permit this, if we start a new project or whatever, I just want it to
> happen.
>
> I would like to promote experimentation with APIs with Neutron as well,
> but I don't think it's ever going to be as flexible as the
> configuration-defined behavior Gluon allows.
>

You refer to it as 'configuration defined' but the API description is not a
configuration language to be changed on a whim by the end user - it's a
DSL, it's code (of a sort), it's a part of the thing you ship.


> My goal is getting new features that extend beyond one backend and without
> some community agreement on new APIs, I don't see how that's possible.
>

Again, I agree with this.  But again - one way of making standards is to
have a documented standard and two independent implementations.  We don't
see that today because that would be a huge effort.
-- 
Ian.


[openstack-dev] [neutron][networking-vpp]networking-vpp for VPP 17.01 is available

2017-01-25 Thread Ian Wells
In conjunction with the release of VPP 17.01, I'd like to invite you all to
try out networking-vpp for VPP 17.01.  VPP is a fast userspace forwarder
based on the DPDK toolkit, and uses vector packet processing algorithms to
minimise the CPU time spent on each packet and maximise throughput.
networking-vpp is a ML2 mechanism driver that controls VPP on your control
and compute hosts to provide fast L2 forwarding under Neutron.

The latest version has been updated to work with the new features of VPP
17.01, including security group support based on VPP's ACL functionality.

The README [1] explains how you can try out VPP using devstack, which is
now pleasantly simple; the devstack plugin will deploy the mechanism driver
and VPP itself and should give you a working system with a minimum of
hassle.

We plan on continuing development between now and VPP's 17.04 release in
April.  There are several features we're planning to work on (you'll find a
list in our RFE bugs at [2]), and we welcome anyone who would like to come
help us.

Everyone is welcome to join our new biweekly IRC meetings, Monday 0800 PST
= 1600 GMT, due to start next Monday.
-- 
Ian.

[1]https://github.com/openstack/networking-vpp/blob/17.01/README.rst
[2]
https://bugs.launchpad.net/networking-vpp/+bugs?orderby=milestone_name=0


Re: [openstack-dev] [neutron] PTL candidacy

2017-01-25 Thread Ian Wells
On 25 January 2017 at 14:17, Monty Taylor  wrote:

> > Adding an additional networking project to try to solve this will only
> > make things work. We need one API. If it needs to grow features, it
> > needs to grow features - but they should be features that all of
> > OpenStack users get.
>
> WORSE - will make things WORSE - not work. Sorry for potentially
> completely misleading typo.
>

I should perhaps make clear that whenever I talk about 'other networking
APIs' I am not saying 'I think we should throw Neutron away' or 'we should
invent a shiny new API and compete with Neutron'.  I am saying that there's
value keeping the API of new features from intersecting the old when we add
things that are logically well separated from what we currently have.  As
it is, when you extend Neutron using current techniques, you have to touch
several elements of the existing API and you have no separation -
effectively you have to build outcroppings onto an API monolith.


Re: [openstack-dev] [neutron] API models [was: PTL candidacy]

2017-01-25 Thread Ian Wells
I would certainly be interested in discussing this, though I'm not currently
signed up for the PTG.  Obviously this is close to my interests, and I see
Kevin's raised Gluon as the bogeyman (which it isn't trying to be).

Setting aside all the above talk about how we might do things for a moment:
to take one specific feature example, it actually took several /years/ to
add VLAN-aware ports to OpenStack.  This is an example of a feature that
doesn't affect or interest the vast majority of the user community, and
which is almost certainly not installed in the cloud you're currently
working on, which you probably have no intention of ever programming against, and
which even had cross-vendor support.  It's useful and there are times that
you can't do without it; there were people ready to write the code.  So why
was it so very hard?


I hope we will all agree on these points:

- Neutron's current API of networks, subnets and ports is fine for what it
does.  We like it, we write apps using it, it doesn't need to change
- The backward compatibility and common understanding of Neutron's API is
paramount - applications should work everywhere, and should continue to
work as Neutron evolves
- Some people want to do different things with networks, and that doesn't
make them bad people
- What is important about APIs is that they are *not* tied to an
implementation or reinvented by every backend provider, but commonly agreed

This says we find pragmatic ways to introduce sane, consumable APIs for any
new thing we want to do, and perhaps we need a governance model around
ensuring that they are sane and reusable.  None of this says that every API
should fit neatly into the network/port/subnet model we have - it was
designed for, and is good at describing, L2 broadcast domains.  (Gluon,
specifically, is about solving exclusively the technical question of how to
introduce independent APIs, and not the governance one of how to avoid
proliferation.)

For any new feature, I would suggest that we fold it into the current API
if it's widely useful and closely compatible with the current model.  There
are clearly cases where changing the current API in complex ways to serve
1% of the audience is not necessary and not helpful, and I think this is
where our problems arise.  And by 'the current API' I mean the existing
network/port/subnet model that is currently the only way to describe how
traffic moves from one port to another.  I do not mean 'we must start
another project' or 'we must have another endpoint'.

However, we should have a way to avoid affecting this API if it makes no
sense to put it there.  We should find a way of offering new forwarding
features without finding increasingly odd ways of making networks, ports
and subnets serve the purpose.  They were created to describe L2 overlays,
which is still mostly what they are used to do - to the point that the most
widely used plugin by far is the modular *L2* plugin.  It's hardly a
surprise that these APIs don't map to every possible networking setup in
the world.  My argument is that it's *sane* to want to invent quite
different APIs, it's not competition or dilution, and should we choose to
do so it's even a reasonable thing to do within Neutron's current framework.

I do not care if we make Neutron extensible in some different way to permit
this, if we start a new project or whatever, I just want it to happen. If
you think that the Gluon approach to this is the wrong way to make it
happen, and I'm seeing general consensus here that this could be done
within Neutron, then I welcome alternative suggestions.  But I do honestly
believe that we make our own pain by insisting on one API that must be
rigidly backward- and cross-compatible and simultaneously insisting that
all novel ideas be folded into it.
-- 
Ian.


On 25 January 2017 at 10:19, Sukhdev Kapur  wrote:

> Folks,
>
> This thread has gotten too long and hard to follow.
> It is clear that we should discuss/address this.
> My suggestion is that we organize a session in Atlanta PTG meeting and
> discuss this.
>
> I am going to add this on the Neutron etherpad - should this be included
> in any other session as well?
>
> -Sukhdev
>
>
>
>
> On Tue, Jan 24, 2017 at 12:33 AM, Ihar Hrachyshka 
> wrote:
>
>> Hi team,
>>
>> I would like to propose my PTL candidacy for Pike.
>>
>> Some of you already know me. If not, here is my brief OpenStack bio. I
>> joined the community back in Havana, and managed to stick around till
>> now. During the time, I fit several project roles like serving as a
>> Neutron liaison of different kinds (stable, oslo, release), fulfilling
>> my Neutron core reviewer duties, taking part in delivering some
>> longstanding features, leading QoS and upgrades subteams, as well as
>> being part of Neutron Drivers team. I also took part in miscellaneous
>> cross project efforts.
>>
>> I think my experience gives me broad perspective on how the OpenStack
>> community and 

Re: [openstack-dev] [Neutron][networking-*] Attention for upcoming refactoring

2017-01-03 Thread Ian Wells
I see this changes a function's argument types without changing the
function's name - for instance, in the proposed networking-cisco change,
https://review.openstack.org/#/c/409045/ .  This makes it hard to detect
that there's been a change and react accordingly.  What's the recommended
way to write a mechanism driver that is compatible with both pre- and
post-change Neutron versions?
-- 
Ian.

On 27 December 2016 at 02:29, Anna Taraday 
wrote:

> Hello everyone!
>
> Please, note that all changes to Neutron merged.
>
> Changes that needs to be merged for external repos:
> segments db refactor - https://review.openstack.org/#
> /q/status:open+branch:master+topic:segmentsdb
> ml2 db refactor - https://review.openstack.org/#/q/status:open+branch:
> master+topic:refactor_ml2db
>
> Happy holidays for everyone!
>
>
> On Thu, Dec 22, 2016 at 7:36 AM Russell Bryant  wrote:
>
>>
>> On Wed, Dec 21, 2016 at 10:50 AM, Anna Taraday <
>> akamyshnik...@mirantis.com> wrote:
>>
>> Hello everyone!
>>
>> I've got two changes with refactor of TypeDriver [1] and segments db [2]
>> which is needed for implementation new engine facade [3].
>>
>> Reviewers of networking-cisco, networking-arista, networking-nec,
>> networking-midonet, networking-edge-vpn, networking-bagpipe, tricircle,
>> group-based-policy - pay attention to [4].
>>
>> Also recently merged refactor of ml2/db.py [5]. Fixes
>> for networking-cisco, networking-cisco, networking-cisco - are on review
>> [6]
>>
>> [1] - https://review.openstack.org/#/c/398873/
>> [2] - https://review.openstack.org/#/c/406275/
>> [3] - https://blueprints.launchpad.net/neutron/+spec/enginefacade-switch
>> [4] - https://review.openstack.org/#/q/topic:segmentsdb
>> [5] - https://review.openstack.org/#/c/404714/
>> [6] - https://review.openstack.org/#/q/status:open++branch:
>> master+topic:refactor_ml2db
>>
>>
>> ​Thanks a lot for looking out for the various networking-* projects when
>> working on changes like this.  It's really great to see.
>>
>> --
>> Russell Bryant
>>
> --
> Regards,
> Ann Taraday
>
>
>


Re: [openstack-dev] [Neutron] Neutron team social event in Barcelona

2016-10-19 Thread Ian Wells
+1

On 14 October 2016 at 11:30, Miguel Lavalle  wrote:

> Dear Neutrinos,
>
> I am organizing a social event for the team on Thursday 27th at 19:30.
> After doing some Google research, I am proposing Raco de la Vila, which is
> located in Poblenou: http://www.racodelavila.com/en/index.htm. The menu
> is here: http://www.racodelavila.com/en/carta-racodelavila.htm
>
> It is easy to get there by subway from the Summit venue:
> https://goo.gl/maps/HjaTEcBbDUR2. I made a reservation for 25 people
> under 'Neutron' or "Miguel Lavalle". Please confirm your attendance so we
> can get a final count.
>
> Here's some reviews: https://www.tripadvisor.com/
> Restaurant_Review-g187497-d1682057-Reviews-Raco_De_La_
> Vila-Barcelona_Catalonia.html
>
> Cheers
>
> Miguel
>
>
>


Re: [openstack-dev] [neutron][networking-vpp]Introducing networking-vpp

2016-10-18 Thread Ian Wells
On 6 October 2016 at 10:43, Jay Pipes  wrote:

> On 10/06/2016 11:58 AM, Naveen Joy (najoy) wrote:
>
>> It’s primarily because we have seen better stability and scalability
>> with etcd over rabbitmq.
>>
>
> Well, that's kind of comparing apples to oranges. :)
>
> One is a distributed k/v store. The other is a message queue broker.
>
> The way that we (IMHO) over-use the peer-to-peer RPC communication
> paradigm in Nova and Neutron has resulted in a number of design choices and
> awkward code in places like oslo.messaging because of the use of
> broker-based message queue systems as the underlying transport mechanism.
> It's not that RabbitMQ or AMQP isn't scalable or reliable. It's that we're
> using it in ways that don't necessarily fit well.
>
> One might argue that in using etcd and etcd watches in the way you are in
> networking-vpp, that you are essentially using those tools to create a
> simplified pub-sub messaging system and that isn't really what etcd was
> built for and you will end up running into similar fitness issues
> long-term. But, who knows? It might end up being a genius implementation. :)
>
> I'm happy to see innovation flourish here and encourage new designs and
> strategies. Let's just make sure we compare apples to apples when making
> statements about performance or reliability.
>

Sorry to waken an old thread, but I chose a perfect moment to go on
holiday...

So yes: I don't entirely trust the way we use RabbitMQ, and that's largely
because what we're doing with it - distributing state, or copies of state,
or information derived from state - leads to some fragility and odd
situations when using a tool perhaps better suited to listing off tasks.
We've tried to find a different model of working that is closer to the
behaviour we're after.  It is, I believe, similar to the Calico team's
thinking, but not derived from their code.  I have to admit at this point
that it's not been tested at scale in our use of it, and that's something
we will be doing, but I can say that this is working in a way that is in
line with how etcd is intended to be used, we have tested representative
etcd performance, and we don't expect problems.

As mentioned before, Neutron's SQL database is the source of truth - you
need to have one, and that one represents what the client asked for in its
purest form.  In the nature of keeping two datastores in sync, there is a
worker thread outside of the REST call to do the synchronisation (because
we don't want the cloud user to be waiting on our internal workings, and
because consistently committing to two databases is a recipe for disaster)
- etcd lags the Neutron DB commits very slightly, and the Neutron DB is
always right.  This allows the API to be quick while the backend will run
as efficiently as possible.

It does also mean that failures to communicate in the backend don't result
in failed API calls - the call succeeds but state updates don't happen.
This is in line with a 'desired state' model.  A user tells Neutron what
they want to do and Neutron should generally accept the request if it's
well formatted and consistent.  Exceptional error codes like 500s are
annoying to deal with, as you never know if that means 'I failed to save
that' or 'I failed to implement that' or 'I saved and implemented that, but
didn't quite get the answer to you' - having simple frontend code ensures
the answer is highly likely to be 'I will do that in a moment', in
keeping with the eventually consistent model OpenStack has.  The
driver will then work its magic and update object states when the work is
finally complete.

Watching changes - and the pub-sub model you end up with - is a means of
being efficient, but should we miss notifications there's a fallback
mechanism to get back into state sync with the most recent version of the
state.  In the worst case, we focus on the currently desired state, and not
the backlog of recent changes to state.
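
To make that concrete, here's a rough sketch of the watch-plus-resync
pattern - not the actual networking-vpp agent code; the key layout and the
apply_port() callback are made up for illustration, and it uses the
python-etcd (v2) client:

import etcd

PREFIX = '/networking-vpp/nodes/compute1/ports'  # illustrative key layout

def resync(client, apply_port):
    # Fallback path: read the whole desired state and reapply it, rather
    # than trying to replay a backlog of individual changes.
    result = client.read(PREFIX, recursive=True)
    for node in result.children:
        apply_port(node.key, node.value)
    return result.etcd_index

def watch_forever(client, apply_port):
    next_index = resync(client, apply_port) + 1
    while True:
        try:
            event = client.read(PREFIX, recursive=True, wait=True,
                                waitIndex=next_index)
            apply_port(event.key, event.value)
            next_index = event.modifiedIndex + 1
        except etcd.EtcdEventIndexCleared:
            # We missed notifications (etcd has pruned its event history),
            # so fall back to a full resync of the current desired state.
            next_index = resync(client, apply_port) + 1

The agent would run something like watch_forever(etcd.Client(host=...),
apply_port) in its worker thread; the point is simply that a missed watch
event degrades into a resync, never into silently stale state.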

And Jay, you're right.  What we should be comparing here is how well it
works.  Is it easy to use, is it easy to maintain, is it annoyingly
fragile, and does it eat network or CPU?  I believe it holds up well (or I
wouldn't have chosen to do it this way), and I hope we've produced something
simple to understand while being easier to operate.  However, the proof of the
pudding is in the eating, so let's see how this works as we continue to
develop and test it.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [neutron][networking-vpp]Introducing networking-vpp

2016-10-05 Thread Ian Wells
We'd like to introduce the VPP mechanism driver, networking-vpp[1], to the
developer community.

networking-vpp is an ML2 mechanism driver to control DPDK-based VPP
user-space forwarders on OpenStack compute nodes.  The code does what
mechanism drivers do - it connects VMs to each other and to other
Neutron-provided network services like routers.  It also does it with care
- we aim to make sure this is a robust design that can withstand common
cloud problems and failures - and with clarity - so that it's
straightforward to see what it's chosen to do and what it's thinking.

To give some background:

VPP is an open source network forwarder originally developed by Cisco and
now part of the Linux Foundation FD.io project for fast dataplanes.  It's
very very good at moving packets around, and has demonstrated performance
up to and well beyond 10Gbps - of tiny packets: ten times the number of
packets iperf uses to fill a 10Gbps link.  This makes it especially useful
for NFV use cases.  It's a user space forwarder, which has other benefits
versus kernel packet forwarding: it can be stopped and upgraded without
rebooting the host, and (in the worst case) it can crash without bringing
down the whole system.

networking-vpp is its driver for OpenStack.  We've written about 3,000
lines of code, consisting of a mechanism driver and an agent to program VPP
through its Python API, and we use etcd to be a robust datastore and
communication channel living between the two.
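
For a flavour of how little the Neutron-side half needs to do, here's a
sketch of the sort of write the mechanism driver makes - not the real driver
code; the key path and the fields in the JSON blob are illustrative
assumptions:

import json
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

def write_desired_port_state(host, port_id, segmentation_id, mac_address):
    # One small write per port; the agent on 'host' picks it up via an etcd
    # watch, programs VPP, and reports back under a separate state key.
    key = '/networking-vpp/nodes/%s/ports/%s' % (host, port_id)
    desired = {
        'segmentation_id': segmentation_id,
        'mac_address': mac_address,
    }
    client.write(key, json.dumps(desired))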


The code, at the moment, is in a fairly early stage, so please play with it
and fix or report any problems you find.  It will move packets between
VLANs and flat networks and VMs, and will connect to DHCP servers, routers
and the metadata server in your cloud, so for basic uses it will work just
the way you expect.  However, we certainly don't support every feature of
Neutron just yet.  In particular, we haven't tested some things like LBaaS
and VPNaaS with it - they should work, we just haven't tried - and, most
obviously, security groups are not yet implemented - that's on the way.
However, we'd like to get it into your hands so that you can have a go with
it, see what you like and don't like about it, and help us file down the
rough edges if you feel like joining us.  Enjoy!

[1]
https://github.com/openstack/networking-vpp for all your code needs
https://review.openstack.org/#/q/status:open+project:openstack/networking-vpp to help
https://launchpad.net/networking-vpp for bugs
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][massively distributed][architecture]Coordination between actions/WGs

2016-09-05 Thread Ian Wells
On 5 September 2016 at 17:08, Flavio Percoco  wrote:

> We should probably start by asking ourselves who's really being bitten by
> the
> messaging bus right now? Large (and please, let's not bikeshed on what a
> Large
> Cloud is) Clouds? Small Clouds? New Clouds? Everyone?
> The we can start asking ourselves things like: Would a change of the
> API/underlying technology help them? Why? How? What technology exactly and
> why?
> What technology would make their lives simpler and why?
>

Well, as far as RabbitMQ goes, I would certainly say that in deployment
it's not a pleasant thing to work with.  Even if you consider it good
enough day to day (which is debatable), consider its upgradeability -
it's impractical to keep it running as you upgrade it, is my
understanding.  It would also seem to be a big factor in our scale
limitations - I wonder if we could do without such complexities as cells if
we had something a bit more performant (with perhaps a more lax operating
model).

But this is not about blaming Rabbit for all our problems.  The original
statement was that RPC is a bad pattern to use in occasionally unreliable
distributed systems, and Rabbit in no way forces us to use RPC patterns.
That we don't see the RPC pattern's problems so clearly is because a fault
rarely happens at just the right time in a call sequence to show up the
problem, and testing such a fault using injection is not practical - but it
does happen in reality, and things do go weird when it does.

The proposal was to create a better interface in oslo for a comms model -
one that we could implement (regardless of how we chose to implement it) and
that would encourage people to code for the corner cases - and then
encourage people to move across.

I'm not saying this research/work is not useful/important (in fact, I've
> been
> advocating for it for almost 2 years now) but I do want us to be more
> careful
> and certainly I don't think this change should be anything but transparent
> for
> every deployment out there.
>

That is a perfectly reasonable thing to ask.  I presume by transparent you
mean that the standard upgrade approaches will work.

> To answer this topic more directly. As much as being opinionated would help
> driving focus and providing a better result here, I believe we are not
> there yet
> and I also believe a backend agnostic API would be more beneficial to begin
> with. We're not going to move 98% of the OpenStack deployments out there
> off of
> rabbitmq just like that.
>

Again, this originally wasn't about Rabbit, or having a choice of
backends.  One backend would do if that backend were perfect for the job.
There are other reasons for doing this that would hopefully make OpenStack
more robust.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][massively distributed][architecture]Coordination between actions/WGs

2016-09-01 Thread Ian Wells
On 1 September 2016 at 06:52, Ken Giusti <kgiu...@gmail.com> wrote:

> On Wed, Aug 31, 2016 at 3:30 PM, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
>
> > I have opinions about other patterns we could use, but I don't want to push
> > my solutions here, I want to see if this is really as much of a problem as
> > it looks and if people concur with my summary above.  However, the right
> > approach is most definitely to create a new and more fitting set of oslo
> > interfaces for communication patterns, and then to encourage people to move
> > to the new ones from the old.  (Whether RabbitMQ is involved is neither
> > here nor there, as this is really a question of Oslo APIs, not their
> > implementation.)
> >
>
> Hmm... maybe.  Message bus technology is varied, and so is its
> behavior.  There are brokerless, point-to-point backends supported by
> oslo.messaging [1],[2] which will exhibit different
> capabilities/behaviors from the traditional broker-based
> store-and-forward backend (e.g. message acking end-to-end vs to the
> intermediary).
>

The important thing is that you shouldn't have to look behind the curtain.
We can offer APIs that are driven by the implementation (designed for test,
and trivial to implement correctly given handy open source projects we know
and trust) and the choice of design will therefore be dependent on the
backend mechanisms we consider for use to implement the API.  APIs are
always a point of negotiation between what the caller needs and what can be
implemented in a practical amount of time.  But *I do not care* whether
you're using rabbits or carrier pigeons, just so long as what you have
documented the API as promising is actually true.  I *do not expect* to
have to read RabbitMQ or AMQP documentation to work out what behaviour I
should expect from my messaging.  And its behaviour should be consistent if
I have a choice of messaging backends.

> All the more reason to have explicit delivery guarantees and well
> understood failure scenarios defined by the API.

And on this point we totally agree.

I think the point of an API is to subdivide who carries which
responsibilities - the caller for handling exceptional cases and the
implementer for having predictable behaviour.  Documentation is the means
of agreement.
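
As a throwaway illustration of what I mean by documenting the contract - not
a proposed oslo API, and the names here are invented - the delivery
semantics can live in the interface itself rather than in the backend's
manual:

import abc

class ChannelUnavailable(Exception):
    """The message was not accepted; the caller knows it must retry."""

class CastChannel(abc.ABC):
    @abc.abstractmethod
    def cast(self, message):
        """Send 'message' with at-least-once delivery.

        The receiver may see duplicates and so must be idempotent.  Returns
        only once the message has been durably accepted for delivery, and
        raises ChannelUnavailable - never silently drops - otherwise.
        """

Whatever backend sits underneath, the caller's responsibilities (idempotent
receivers, retry on ChannelUnavailable) are stated where they can't be
missed.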

Sorry to state basic good practice - I'm sure we do all accept that this is
good behaviour - but with a component that's this central to what we do and
so frequently used by so many people I think it's worth reiterating.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][massively distributed][architecture]Coordination between actions/WGs

2016-08-31 Thread Ian Wells
On 31 August 2016 at 10:12, Clint Byrum  wrote:

> Excerpts from Duncan Thomas's message of 2016-08-31 12:42:23 +0300:
> > > On 31 August 2016 at 11:57, Bogdan Dobrelya  wrote:
> >
> > > I agree that RPC design pattern, as it is implemented now, is a major
> > > blocker for OpenStack in general. It requires a major redesign,
> > > including handling of corner cases, on both sides, *especially* RPC call
> > > clients. Or may be it just have to be abandoned to be replaced by a more
> > > cloud friendly pattern.
> >
> >
> > Is there a writeup anywhere on what these issues are? I've heard this
> > sentiment expressed multiple times now, but without a writeup of the issues
> > and the design goals of the replacement, we're unlikely to make progress on
> > a replacement - even if somebody takes the heroic approach and writes a
> > full replacement themselves, the odds of getting community by-in are very
> > low.
>
> Right, this is exactly the sort of thing I'd like to gather a group of
> design-minded folks around in an Architecture WG. Oslo is busy with the
> implementations we have now, but I'm sure many oslo contributors would
> like to come up for air and talk about the design issues, and come up
> with a current design, and some revisions to it, or a whole new one,
> that can be used to put these summit hallway rumors to rest.
>

I'd say the issue is comparatively easy to describe.  In a call sequence:

1. A sends a message to B
2. B receives message
3. B acts upon message
4. B responds to message
5. A receives response
6. A acts upon response

... you can have a fault at any point in that message flow (consider
crashes or program restarts).  If you ask for something to happen, you wait
for a reply, and you don't get one, what does it mean?  The operation may
have happened, with or without success, or it may not have gotten to the
far end.  If you send the message, does that mean you'd like it to cause an
action tomorrow?  A year from now?  Or perhaps you'd like it to just not
happen?  Do you understand what Oslo promises you here, and do you think
every person who ever wrote an RPC call in the whole OpenStack solution
also understood it?
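
If it helps to see where that bites in code, here's a hedged sketch using
oslo.messaging - the topic, method and arguments are made up, and it assumes
a transport has already been configured:

import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_transport(cfg.CONF)
target = oslo_messaging.Target(topic='some-agent-topic')
client = oslo_messaging.RPCClient(transport, target)

def request_action(ctxt, resource_id):
    cctxt = client.prepare(timeout=30)
    try:
        return cctxt.call(ctxt, 'do_something', resource_id=resource_id)
    except oslo_messaging.MessagingTimeout:
        # Steps 1-6 above can have broken anywhere: the request may never
        # have arrived, may have been acted on with the reply lost, or may
        # still be sitting in a queue to be acted on at some arbitrary
        # point in the future.  The only safe recoveries are to make
        # 'do_something' idempotent and retry, or to re-read the state of
        # the resource and reconcile.
        raise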

I have opinions about other patterns we could use, but I don't want to push
my solutions here, I want to see if this is really as much of a problem as
it looks and if people concur with my summary above.  However, the right
approach is most definitely to create a new and more fitting set of oslo
interfaces for communication patterns, and then to encourage people to move
to the new ones from the old.  (Whether RabbitMQ is involved is neither
here nor there, as this is really a question of Oslo APIs, not their
implementation.)
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][massively distributed][architecture]Gluon; was Coordination between actions/WGs

2016-08-29 Thread Ian Wells
On 29 August 2016 at 03:48, Jay Pipes  wrote:

> On 08/27/2016 11:16 AM, HU, BIN wrote:
>
>> So telco use cases is not only the innovation built on top of OpenStack.
>> Instead, telco use cases, e.g. Gluon (NFV networking), vCPE Cloud, Mobile
>> Cloud, Mobile Edge Cloud, brings the needed requirement for innovation in
>> OpenStack itself. If OpenStack don't address those basic requirements,
>>
>
> That's the thing, Bin, those are *not* "basic" requirements. The Telco
> vCPE and Mobile "Edge cloud" (hint: not a cloud) use cases are asking for
> fundamental architectural and design changes to the foundational components
> of OpenStack. Instead of Nova being designed to manage a bunch of hardware
> in a relatively close location (i.e. a datacenter or multiple datacenters),
> vCPE is asking for Nova to transform itself into a micro-agent that can be
> run on an Apple Watch and do things in resource-constrained environments
> that it was never built to do.
>

This conversation started above, but in a war of analogies became this:


> And, honestly, I have no idea what Gluon is trying to do. Ian sent me some
> information a while ago on it. I read it. I still have no idea what Gluon
> is trying to accomplish other than essentially bypassing Neutron entirely.
> That's not "innovation". That's subterfuge.


Gluon, as written, does allow you to bypass Neutron, but as I'm sure you
understand, I did have more useful features on my mind than 'subterfuge'.
Let me lay this out in the clear again, since I've had this conversation
with you and others and I'm not getting my point across.  And in keeping
with the discussion I'll start with an analogy.

When we started out, Nova was the compute component for OpenStack.  I'd
like to remind you of the problems we had with the docker driver, because
docker containers are like, but also unlike, virtual machines.  They're
compute containers, they support networking, but their storage requirements
are weird; they tend to use technologies unlike conventional disk images,
they're not exactly capable of using block devices without help, and so
on.  You can change Nova to support that, or you can say 'these are
sufficiently different that we should have a separate API'.  I see we have
Zun now, so someone's trying that approach.  They're 'bypassing' Nova.

Here, we're talking about different compute components that have
similarities and differences to virtual machines.  If they're similar
enough, then building into Nova is logical - it will require some change to
the APIs (let's forget the internal code for a moment) but not drastic
ones; ones that are backward compatible.  If they're different enough you
sit something new alongside Nova.

Neutron is the networking component for OpenStack the same way that Nova is
compute.  It brings together the sorts of things you would want in order to
run cloud applications on a public cloud, and as it happens these concepts
also work
reasonably nicely for many other cloud use cases.  But - today - it is not
'all networking everywhere', it's 'networking with a specific focus on L2
domains' - because this solves the majority of its users' problems.  (We
can quibble about whether a 'network' in Neutron must be L2, because it's
not exactly documented as such, but I would like to point out the plugin
that most people use today to implement networks is called 'ML2' and the
only way to attach a port to anything is to attach it to a network with
location-independent subnets.  Suffice it to say that the consumer of the
API can treat it like an L2 network.)

There comes a question, then.  If it is to be the only networking project
in OpenStack, for it to be 'all networking everywhere', then we need to
address the problem that its current API does not suit every form of
networking in existence.  We need to do this without affecting every single
person who uses OpenStack as it is and doesn't want or need every new bit
of functionality.  For that we have extensions within Neutron, but they're
still constrained to operate within Neutron's existing API structure.  The
most complex ones tend to work on the principle of 'networks work as you
expect until the extension steps in, then they become a bit weird and
special'.  This isn't the way to write a system with widely understood and
easy-to-use APIs.  Really you're just tolerating the history of Neutron
because you don't have a choice.  It also makes for something which turns
out a bit monolithic and complex in practice (e.g. forwarding elements being
programmed by multiple bits of independent code).

Some of the APIs we were experimenting with were things that already
existed as Neutron extensions, such as MPLS/BGP overlays.  Some that we'd
like to try in the future include things like point-to-point connectivity,
or comprehensively routed networks.  But as much as anything the point is
that we know that networking changes over time and people have new ideas of
how to use what exists, so we're trying to make 

Re: [openstack-dev] Neutron and MTU advertisements -- post newton

2016-07-11 Thread Ian Wells
On 11 July 2016 at 12:52, Sam Yaple  wrote:

> After lots of fun on IRC I have given up this battle. I am giving up
> quickly because frickler has purposed a workaround (or better solution
> depending on who you ask). So for all of you keeping track at home, if you
> want your vxlan and your vlan networks to have the same MTU, here are the
> relevant options to set as of Mitaka.
>

> [DEFAULT]
> global_physnet_mtu = 1550
> [ml2]
> path_mtu = 1550
> physical_network_mtus = physnet1:1500
>
> This should go without saying, but i'll say it anyway: Your underlying
> network interface must be at least 1550 MTU for the above config to result
> in all instances receiving 1500 mtu regardless of network type. If you want
> some extra IRC reading, there was a more extensive conversation about this
> [1]. Good luck, you'll need it.
>
> [1]
> http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-07-11.log.html#t2016-07-11T13:39:45
>

To be fair, what was said on there doesn't apply to this fix (which I
hadn't seen - I think it was in the channel before I joined).

You were suggesting that you should still be able to disable MTU
advertisement - which, as I pointed out, wouldn't do what you wanted, as it
would leave the ports used for DHCP, routing and metadata servers all with
different MTUs from your instances in some cases, and likely this would break
your system in exciting and unpredictable ways.

The fix proposed does *not* disable MTU advertisement, however.  It will
make Neutron calculate a 1500 MTU for VLAN and VXLAN networks, both - which
it will advertise, but that's effectively a no-op and I imagine you don't
really care.  The more important thing, though, is that Neutron now
understands that the MTUs are 1500 and it will apply that properly to its
service ports.  The solution above makes good sense and is much better than
disabling advertisement - go with it.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Neutron and MTU advertisements -- post newton

2016-07-11 Thread Ian Wells
On 11 July 2016 at 11:49, Sean M. Collins  wrote:

> Sam Yaple wrote:
> > In this situation, since you are mapping real-ips and the real world runs
> > on 1500 mtu
>
> Don't be so certain about that assumption. The Internet is a very big
> and diverse place


OK, I'll contradict myself now - the original question wasn't L2 transit.
Never mind.

That 'inter' bit is actually rather important.  MTU applies to a layer 2
domain, and routing is designed such that the MTUs on the two ports of a
router are irrelevant to each other.  What the world does has no bearing on
the MTU I want on my L2 domain, and so Sam's point - 'I must choose the MTU
other people use' - is simply invalid.  You might reasonably want your
Neutron router to have an external MTU of 1500, though, to do what he asks
in the face of some thoughtful soul filtering out PMTU exceeded messages.
I still think it comes back to the same thing as I suggested in my other
mail.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Neutron and MTU advertisements -- post newton

2016-07-11 Thread Ian Wells
On 11 July 2016 at 11:12, Chris Friesen  wrote:

> On 07/11/2016 10:39 AM, Jay Pipes wrote:
>
> Out of curiosity, in what scenarios is it better to limit the instance's
>> MTU to
>> a value lower than that of the maximum path MTU of the infrastructure? In
>> other
>> words, if the infrastructure supports jumbo frames, why artificially
>> limit an
>> instance's MTU to 1500 if that instance could theoretically communicate
>> with
>> other instances in the same infrastructure via a higher MTU?
>>
>
> It is my understanding that using the max path MTU is ideal, but that not
> all software does path MTU discovery.  Also, some badly-designed security
> devices can mess up PMTUD.


Ignoring PMTUD cases, which are routed (the original question was L2
transit), there are several use cases for specific networks having MTUs
other than 9000.  Maybe you're talking to a device on a provider network
that has a 1500 MTU not under your control, for instance.  It's also
reasonable to have a cloud with a high internal MTU and a low external
network MTU - maybe you have control over your own domain but not the whole
network in which you're situated.

That is *not*, in fact, the same as having no MTU advertisement (but it
seems to address the use case originally mentioned; if you could be
selective about the MTU you used and the advertisements were corrected
accordingly you could simply choose 1500).  There are also ports that need
an MTU - router ports, DHCP ports - that are not receiving it via MTU
advertisement, so  turning advertisement off and letting nature take its
course doesn't work for everything.

The original MTU spec [1] - which never got fully implemented - detailed
the intent, which was that OpenStack would choose a sensible default MTU
value and advertise it, and that you could override that value for your own
purposes.  I still think we need to implement the missing bits.
-- 
Ian.

[1]
https://specs.openstack.org/openstack/neutron-specs/specs/kilo/mtu-selection-and-advertisement.html
- the bit that wasn't completed was 'the tenant can request a specific MTU
on a network' - which, incidentally, is not 'the network will not pass
packets bigger than X', but simply 'OpenStack and the tenant will agree
that X is the MTU that ports shall use on that network; packets that size
or less are guaranteed to pass over the L2 domain unmolested, and where an
MTU is set on a port or advertised to it that will be the one'.  If you
want a lecture on cloud MTUs I can give one.  But you really, really don't.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][kilo] - vxlan's max bandwidth

2016-04-19 Thread Ian Wells
On 18 April 2016 at 04:33, Ihar Hrachyshka  wrote:

> Akihiro Motoki  wrote:
>
> 2016-04-18 15:58 GMT+09:00 Ihar Hrachyshka :
>>
>>> Sławek Kapłoński  wrote:
>>>
>>> Hello,

>>> What MTU have You got configured on VMs? I had issue with performance on
>>> vxlan network with standard MTU (1500) but when I configured Jumbo
>>> frames on vms and on hosts then it was much better.

>>>
>>>
>>> Right. Note that custom MTU works out of the box only starting from
>>> Mitaka.
>>
>>
It's been in from at least Kilo (give or take some bugfixes, it seems,
all of which deserve backporting).

>>> You can find details on how to configure Neutron for Jumbo frames in the
>>> official docs:
>>>
>>> http://docs.openstack.org/mitaka/networking-guide/adv-config-mtu.html
>>>
>>
>> If you want to advertise MTU using DHCP in releases before Mitaka,
>> you can prepare your custom dnsmasq config file like below and
>> set it to dhcp-agent dnsmasq_config_file config option.
>> You also need to set network_device_mtu config parameters appropriately.
>>
>> sample dnsmasq config file:
>> --
>> dhcp-option-force=26,8950
>> --
>> dhcp option 26 specifies MTU.
>>
>
> Several notes:
>
> - In Liberty, above can be achieved by setting advertise_mtu in
> neutron.conf on nodes hosting DHCP agents.
> - You should set [ml2] segment_mtu on controller nodes to MTU value for
> underlying physical networks. After that, DHCP agents will advertise
> correct MTU for all new networks created after the configuration applied.
> - It won’t work in OVS hybrid setup, where intermediate devices (qbr) will
> still have mtu = 1500, that will result in Jumbo frames dropped. We have
> backports to fix it in Liberty at: https://review.openstack.org/305782
> and https://review.openstack.org/#/c/285710/
>

Indeed, you can actively request the MTU per virtual network as you create
them, subject to segment_mtu and path_mtu indicating they're achievable.

In this instance, configure your switches with a 9000 MTU and set
segment_mtu = path_mtu = 9000.  The virtual network MTU will then default
to 8950 for a VXLAN network (the biggest possible packet inside VXLAN in
that circumstance) and you can choose to set it to anything else below that
number as you net-create.  The MTU should be correctly advertised by DHCP
when set.
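
If the arithmetic helps, this is roughly how the defaults above fall out of
the configured values - a toy sketch, not Neutron's actual code; the 50-byte
figure is the VXLAN-over-IPv4 encapsulation overhead:

VXLAN_OVERHEAD = 50

def default_network_mtu(network_type, segment_mtu, path_mtu):
    if network_type == 'vxlan':
        # L3 overlay: leave room for the encap header on the underlay path.
        return path_mtu - VXLAN_OVERHEAD
    # VLAN/flat: the virtual network rides the physical segment directly.
    return segment_mtu

print(default_network_mtu('vxlan', 9000, 9000))  # 8950, as above
print(default_network_mtu('vlan', 9000, 9000))   # 9000
print(default_network_mtu('vxlan', 1500, 1500))  # 1450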

I hope you don't find you have to do what Akihiro suggests.  That was good
advice about three releases back but nowadays it actually breaks the code
that's there to deal with MTUs properly.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] New BP for live migration with direct pci passthru

2016-02-16 Thread Ian Wells
In general, while you've applied this to networking (and it's not the first
time I've seen this proposal), the same technique will work with any device
- PF or VF, networking or other:

- notify the VM via an accepted channel that a device is going to be
temporarily removed
- remove the device
- migrate the VM
- notify the VM that the device is going to be returned
- reattach the device

Note that, in the above, I've not said 'PF', 'VF', 'NIC' or 'qemu'.

You would need to document what assumptions the guest is going to make (the
reason I mention this is I think it's safe to assume the device has been
recently reset here, but for a network device you might want to consider
whether the device will have the same MAC address or number of tx and rx
buffers, for instance).

The method of notification I've deliberately skipped here; you have an
answer for qemu, but qemu is not the only hypervisor in the world, so this
will clearly be variable.  A metadata server mechanism is another
possibility.

Half of what you've described is one model of how the VM might choose to
deal with that (and a suggestion that's come up before, in fact) - that's a
model we would absolutely want Openstack to support (and I think the above
is sufficient to support it), but we can't easily mandate how VMs behave,
so from the Openstack perspective it's more a recommendation than anything
we can code up.
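
For what the host-side half of the sequence might look like on one
hypervisor, here's a rough sketch using libvirt-python.  The guest
notification is deliberately a placeholder (that channel is the open
question above), and the domain name, URIs and PCI address are made up:

import libvirt

# Illustrative <hostdev> XML for the passthrough device; the PCI address
# here is invented.
DEVICE_XML = """<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
  </source>
</hostdev>"""

def notify_guest(dom, event):
    # Placeholder: QGA, a metadata service, or whatever channel you choose.
    raise NotImplementedError(event)

def migrate_with_passthrough(src_uri, dst_uri, domain_name):
    src = libvirt.open(src_uri)
    dom = src.lookupByName(domain_name)

    notify_guest(dom, 'device-removal-imminent')               # step 1
    dom.detachDeviceFlags(DEVICE_XML,
                          libvirt.VIR_DOMAIN_AFFECT_LIVE)      # step 2

    dst = libvirt.open(dst_uri)
    dom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)  # step 3

    notify_guest(dom, 'device-return-imminent')                # step 4
    dom.attachDeviceFlags(DEVICE_XML,
                          libvirt.VIR_DOMAIN_AFFECT_LIVE)      # step 5

The interesting part isn't the libvirt calls, it's agreeing on and
documenting the notification in steps 1 and 4.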


On 15 February 2016 at 23:25, Xie, Xianshan  wrote:

> Hi, Fawad,
>
>
>
>
>
> > Can you please share the link?
>
>
> https://blueprints.launchpad.net/nova/+spec/direct-pci-passthrough-live-migration
>
>
>
> Thanks in advance.
>
>
>
>
>
> Best regards,
>
> xiexs
>
>
>
> *From:* Fawad Khaliq [mailto:fa...@plumgrid.com]
> *Sent:* Tuesday, February 16, 2016 1:19 PM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova][neutron] New BP for live migration
> with direct pci passthru
>
>
>
> On Mon, Feb 1, 2016 at 3:25 PM, Xie, Xianshan 
> wrote:
>
> Hi, all,
>   I have registered a new BP about the live migration with a direct pci
> passthru device.
>   Could you please help me to review it? Thanks in advance.
>
>
>
> Can you please share the link?
>
>
>
>
> The following is the details:
>
> --
> SR-IOV has been supported for a long while, in the community's point of
> view,
> the pci passthru with Macvtap can be live migrated possibly, but the
> direct pci passthru
> seems hard to implement the migration as the passthru VF is totally
> controlled by
> the VMs so that some internal states may be unknown by the hypervisor.
>
> But we think the direct pci passthru model can also be live migrated with
> the
> following combination of a series of technology/operation based on the
> enhanced
> Qemu-Geust-Agent(QGA) which has already been supported by nova.
>1)Bond the direct pci passthru NIC with a virtual NIC.
>  This will keep the network connectivity during the live migration.
>2)Unenslave the direct pci passthru NIC
>3)Hot-unplug the direct pci passthru NIC
>4)Live-migrate guest with the virtual NIC
>5)Hot-plug the direct pci passthru NIC on the target host
>6)Enslave the direct pci passthru NIC
>
> And more inforation about this concept can refer to [1].
> [1]https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
>
> --
>
> Best regards,
> Xiexs
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack

2016-01-28 Thread Ian Wells
On 27 January 2016 at 11:06, Flavio Percoco  wrote:

> FWIW, the current governance model does not prevent competition. That's
> not to
> be understood as we encourage it but rather than there could be services
> with
> some level of overlap that are still worth being separate.
>

There should always be the possibility to compete; it's always possible
that rethinking an idea produces a better implementation of the same API.
But we don't separate API from implementation in Openstack - the 'XXX API'
cannot easily be divorced from the project containing the implementation
when the definition of the 'XXX API' is 'the API implemented by the XXX
code'.  We should separate them - the API is the only thing that a tenant
will ever actually care about, not the implementation choice behind it.

What Jay is referring to is that regardless the projects do similar things,
> the
> same or totally different things, we should strive to have different APIs.
> The
> API shouldn't overlap in terms of endpoints and the way they are exposed.
>

And for this, perhaps we should have an API namespace registry, so that
when two groups implement the same endpoints they have to implement the
same API?  I think Jay's point is that we muddy the waters here by having
confusingly similar-but-different APIs.

[The counterargument is that service discovery usually determines what API
you're getting, so that if two services claim to be different service types
in Keystone then they are *not* the same API and should be allowed free
rein of their URI namespace, but I see that's not working for us.]

And, coming back to the original point, if Freezer and Ekko both implement
backups, and they can come to an agreement on what 'a backup' is and an API
definition for it, that means that they could exist as independent projects
with independent codebases that both implement /backup - but, importantly,
in a consistent way that doesn't confuse app developers.  That will only
work if the API definition stands separate from the projects, though.

-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-26 Thread Ian Wells
As I recall, network_device_mtu sets up the MTU on a bunch of structures
independently of whatever the correct value is.  It was a bit of a
workaround back in the day and is still a bit of a workaround now.  I'd
sooner we actually fix up the new mechanism (which is kind of hard to do
when the closest I have to information is 'it probably doesn't work').

On 26 January 2016 at 09:59, Sean M. Collins  wrote:

> On Mon, Jan 25, 2016 at 08:16:03PM EST, Fox, Kevin M wrote:
> > Another place to look...
> > I've had to use network_device_mtu=9000 in nova's config as well to get
> mtu's working smoothly.
> >
>
> I'll have to read the code on the Nova side and familiarize myself, but
> this sounds like a case of DRY that needs to be done. We should just set
> it once *somewhere* and then communicate it to related OpenStack
> components.
> --
> Sean M. Collins
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-25 Thread Ian Wells
On 25 January 2016 at 07:06, Matt Kassawara  wrote:
> Overthinking and corner cases led to the existing implementation which
doesn't solve the MTU problem and arguably makes the situation worse
because options in the configuration files give operators the impression
they can control it.

We are giving the impression we solved the problem because we tried to
comprehensively solve the problem (documentation aside, apparently).  It's
complex when you want to do complex things, but the right answer for basic
end users is adding these two lines to neutron.conf, which I don't think is
asking too much:

path_mtu = 1500 # for VXLAN and GRE; MTU is 1450 on ports on VXLAN networks
segment_mtu = 1500 # for VLAN; MTU is 1500 on ports on VLAN networks

(while leaving the floor open for the other 1% of cases, where the options
cover pretty much everything you'd want to do).

So.  I don't know what path_mtu and segment_mtu settings you used that
disappointed you; could you recap?  Can you tell me whether the two options
above help?

> For example, the segment_mtu does nothing in the in-tree drivers, the
network_device_mtu option only impacts parts of some in-tree drivers, and
path_mtu only provides a way to change the MTU for VMs for all in-tree
drivers.

I was reading what documentation I could find (I may have written the spec,
but I didn't write the code, so I have to check the docs like everyone
else) and it says it should work - so anything else is a bug, which we
should go out and fix.  What test cases did you try?

network_device_mtu is an old hack, this much I know, and path_mtu and
segment_mtu are intended to be the correct modern way of doing things.

path_mtu should not apply to all in-tree drivers; specifically, it should
only apply to L3 overlays (as segment_mtu should only apply to VLANs) (and
by the wording of your statement I have to ask - are you seeing VM MTU =
path MTU, because you shouldn't be).

I see there are plausible looking unit tests for segment_mtu, so if it's
not working then in what specific configuration is it not working?

>
> I ran my experiments without any of these options to provide a clean
slate for empirically analyzing the problem and finding a solution for the
majority of operators.

I'm afraid you've not been clear about what setups you've tested where
path_mtu and segment_mtu *are* set - you dismissed them so I presume you
tried.  When you say they don't do what you want, what do they do wrong?

>
>
> Matt
>
> On Mon, Jan 25, 2016 at 6:31 AM, Sean M. Collins 
wrote:
>>
>> On Mon, Jan 25, 2016 at 01:37:55AM EST, Kevin Benton wrote:
>> > At a minimum I think we should pick a default in devstack and dump a
>> > warning in neutron if operators don't specify it.
>>
>> Here's the DevStack change that implements this.
>>
>> https://review.openstack.org/#/c/267604/
>>
>> Again this just fixes it for DevStack. Deployers still need to set the
>> MTUs by hand in their deployment tool of choice. I would hope that we
>> can still move forward with some sort of automatic discovery - and also
>> figure out a way to take it from 3 different config knobs down to like
>> one master knob, for the sake of sanity.
>>
>> --
>> Sean M. Collins
>>
>>
__
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
On 23 January 2016 at 11:27, Adam Lawson  wrote:

> For the sake of over-simplification, is there ever a reason to NOT enable
> jumbo frames in a cloud/SDN context where most of the traffic is between
> virtual elements that all support it? I understand that some switches do
> not support it and traffic form the web doesn't support it either but
> besides that, seems like a default "jumboframes = 1" concept would work
> just fine to me.
>

Offhand:

1. you don't want the latency increase that comes with 9000 byte packets,
even if it's tiny (bearing in mind that in a link shared between tenants it
affects everyone when one packet holds the line for 6 times longer)
2. not every switch in the world is going to (a) be configurable or (b)
pass 9000 byte packets
3. not every VM has a configurable MTU that you can set on boot, or
supports jumbo frames, and someone somewhere will try and run one of those
VMs
4. when you're using provider networks, not every device attached to the
cloud has a 9000 MTU (and this one's interesting, in fact, because it
points to the other element the MTU spec was addressing, that *not all
networks, even in Neutron, will have the same MTU*).
5. similarly, if you have an external network in Openstack, and you're
using VXLAN, the MTU of the external network is almost certainly 50 bytes
bigger than that of the inside of the VXLAN overlays, so no one number can
ever be right for every network in Neutron.

Also, I say 9000, but why is 9000 even the right number?  We need a
number... and 'jumbo' is not a number.  I know devices that will let you
transmit 9200 byte packets.  Conversely, if the native L2 is 9000 bytes,
then the MTU in a Neutron virtual network is less than 9000 - so what MTU
do you want to offer your applications?  If your apps don't care, why not
tell them what MTU they're getting (e.g. 1450) and be done with it?
(Memory says that the old problem with that was that github had problems
with PMTUD in that circumstance, but I don't know if that's still true, and
even if it is it's not technically our problem.)

Per the spec, I would like to see us do the remaining fixes to make that
work as intended - largely 'tell the VMs what they're getting' - and then,
as others have said, lay out simple options for deployments, be they jumbo
frame or otherwise.

If you're seeing MTU related problems at this point, can you file bugs on
them and/or report back the bugs here, so that we can see what we're
actually facing?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
I wrote the spec for the MTU work that's in the Neutron API today.  It
haunts my nightmares.  I learned so many nasty corner cases for MTU, and
you're treading that same dark path.

I'd first like to point out a few things that change the implications of
what you're reporting in strange ways. [1] points out even more strange
ways, but these are the notable ones from what I've been reading here...

RFC7348: "VTEPs MUST NOT fragment VXLAN packets. ... The destination VTEP
MAY silently discard such VXLAN fragments."  The VXLAN VTEP implementations
we use today may fragment, but it's not according to the RFC, and I
wouldn't rely on every implementation you come across knowing to do it.
So, the largest L2 packet you can send over VXLAN is a function of path MTU.

Even if VXLAN is fragmenting, you actually actively want to avoid it
fragmenting, because - in the typical case of bulk TCP transfers using
max-MTU packets - you're *invisibly* fragmenting the packets into two and
adding about 80 bytes of overhead in the process and then reassembling them
at the far end.  You've just expicitly guaranteed that, just as you send
the most data, your connection will slow down. And the MTU problem will be
undetectable to the VMs (which can't find out that a VXLAN encapped packet
has been fragmented; the packet *they* sent didn't fragment, but the one
it's carried in did, not to mention the fragmentation didn't even happen at
an L3 node in the virtual network so DF and therefore PMTUD wouldn't work).

Path MTU is not fixed, because your path can vary according to network
weather (failures, congestion, whatever).  It's an oddity, and perhaps a
rarity, but you can get many weirdnesses: you fail over from one link to a
link with a smaller MTU and the path MTU shrinks; some switches are jumbo
frame and some aren't, so the path MTU might vary from host to host; and so
on.  Granted, these are weird cases, but the point here is that Openstack
cannot *discover* this number.  An installer might attempt something,
knowing how to read switch config; or it might attempt to validate a number
it's been given, as best it can; but even then it's best effort, it's not a
guarantee.  For all these reasons, the only way to really get the minimum
path MTU is from the operator themselves, which is why this is a
configuration parameter to Neutron (path_mtu).

The aims of the changes in the spec [1] were threefold:

1. To ensure that an app that absolutely required a certain minimum MTU to
operate could guarantee it would receive it
2. To allow the network to say what the MTU was, so that the VM could be
programmed accordingly
3. To ensure that the MTU for the network would - by default - settle on
the optimal value, per all the stuff above.

So what could we do in this environment to improve matters?

1. We should advertise MTU in the RA and DHCP messages that Openstack
sends.  I thought we'd already done this work, but this thread suggests not.

[Note, though, that you can't reliably set an MTU higher than 1500 on IPv6
using an RA, thanks to RFC4861 referencing RFC2464 which goes with the
standard, but not the practice, that the biggest ethernet packet is 1500
bytes.  You've been violating the standard all these years, you bad
people.  Unfortunately, Linux enforces this RA rule, albeit slightly
strangely.]

2. We should also put the MTU in any config-drive settings for VMs that
don't respect such things in DHCP and RAs, or don't do DHCP.  This is
Nova-side, reacting to the MTU property of the network.

3. Installers should determine the appropriate MTU settings on interfaces
and ensure they're set.  Openstack can't do this in some cases (VXLAN - no
interfaces) - and probably shouldn't in others (VLAN - the interface MTU is
input to the MTU selection algorithm above, and the installer should set
the interface MTU to match what the operator says the fabric MTU is).

4. We need to check the Neutron network drivers to see which ones are
accepting, but not properly respecting, the MTU setting on the network.  I
suspect we're short of testing to make sure that veths, bridges, switches
and so on are all correctly configured.

-- 
Ian.

[1] https://review.openstack.org/#/c/105989/ and
https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst


On 22 January 2016 at 19:13, Matt Kassawara  wrote:

> The fun continues, now using an OpenStack deployment on physical hardware
> that supports jumbo frames with 9000 MTU and IPv4/IPv6. This experiment
> still uses Linux bridge for consistency. I'm planning to run similar
> experiments with Open vSwitch and Open Virtual Network (OVN) in the next
> week.
>
> I highly recommend reading further, but here's the TL;DR: Using physical
> network interfaces with MTUs larger than 1500 reveals an additional problem
> with veth pair for the neutron router interface on the public network.
> Additionally, IP protocol version does not impact MTU calculation for
> 

Re: [openstack-dev] [neutron][networking-calico] To be or not to be an ML2 mechanism driver?

2016-01-24 Thread Ian Wells
On 22 January 2016 at 10:35, Neil Jerram  wrote:

> * Why change from ML2 to core plugin?
>
> - It could be seen as resolving a conceptual mismatch.
> networking-calico uses
>   IP routing to provide L3 connectivity between VMs, whereas ML2 is
> ostensibly
>   all about layer 2 mechanisms.


You've heard my view on this before, but to reiterate: Neutron *itself* is
all about layer 2 mechanisms (at least at the level of what a 'network'
is).  A Neutron plugin implements the Neutron API, so if you choose to use
a plugin you will still have one network with one subnet and ports that
should receive addresses on creation, which constrains what you can do.  As
such, I'm not sure what constraints you're escaping.

What I think might be interesting to you is that you would no longer expect
to work with the ML2 DHCP system (which I guess probably doesn't do what
you need) or the router system.  Part of what ML2 provides is that you need
*only* implement the L2 bit of what a core plugin does and can reuse the
rest, and that was frequently because people were doing exactly that in
less elegant ways with the plugins they wrote prior to it.

  Let's look at Types first.  ML2 supports multiple provider network types,
>   with the Type for each network being specified explicitly by the
> provider API
>   extension (provider:network_type), or else defaulting to the
>   'external_network_type' ML2 config setting.  However, would a cloud
> operator
>   ever actually use more than one provider Type?


Up front: there's really no distinction between provider and tenant
networks, when it comes to it.  Really tenant networks are just provider
networks where Neutron has chosen the type and segment.  The resulting
network is indistinguishable once created.

It's possible, and sometimes useful, to mix VLAN and VXLAN types.  You can
use VLANs for your provider networks over physical segments that also
communicate with external devices, and VXLAN for cloud-local tenant
networking.  This means your tenant networks scale and your interface to
the world is straightforward.

I don't think I've ever set up Neutron deliberately with multiple
tenant-only network types.  However, if the properties of a Calico network
were sufficiently different to other types, then I might in some
circumstances choose to use a Calico network or another network for a
specific use.  That's possible in ML2 with the provider system, but not
really end-user consumable for non-admins (I can't think of a policy that
would really do the trick).  You'd really need some means of choosing a
network on properties, of which probably the only candidate today is VLAN
transparency.

ML2 also supports multiple mechanism drivers.  When a new Port is being
> created, ML2 calls each mechanism driver to give it the chance to do
> binding
>   and connectivity setup for that Port.  In principle, if mechanism
> drivers are
>   present, I guess each one is supposed to look at some of the available
> Port
>   data - and perhaps the network Type - and thereby infer whether it
> should be
>   responsible for that Port, and so do the setup for it.  But I wonder if
>   anyone runs a cloud where that really happens?  If so, have I got it
> right?
>

This *does* happen, and 'responsible' is the wrong phrase.  No one
mechanism driver is 'responsible', but only one 'binds' the port to a
segment (normally, OVS, LB or SRIOV in the open source drivers).  Other
drivers might not actually do the final binding, but they support it by,
for instance, reconfiguring switches (the Cisco Nexus switch driver being
an example).  Other mechanism drivers may not be interested in that type of
network and will be skipped over.  This is of benefit to what exists but
probably not terribly useful for Calico.
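
To make the binding step concrete, the 'is this port mine, and if so bind
it' decision in a mechanism driver looks roughly like this - a sketch, not
the Calico driver; the VIF type and the segment check are placeholders, and
the import path is the one from this era of Neutron:

from neutron.plugins.ml2 import driver_api as api

class SketchMechanismDriver(api.MechanismDriver):

    def initialize(self):
        pass

    def bind_port(self, context):
        for segment in context.segments_to_bind:
            # Only claim segments this backend knows how to wire up; other
            # mechanism drivers get their chance at anything we skip.
            if segment[api.NETWORK_TYPE] in ('flat', 'vlan'):
                context.set_binding(segment[api.ID],
                                    'my_vif_type',      # placeholder
                                    {'port_filter': False})
                return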


> All in all, if hybrid ML2 networking is a really used thing, I'd like to
> make
> sure I fully understand it, and would tend to prefer networking-calico
> remaining as an ML2 mechanism driver.  (Which means I also need to discuss
> further about conceptually extending 'ML2' to L3-only implementations, and
> raise another point about what happens when the service_plugin that you
> need
> for some extension - say a floating IP - depends on which mechanism
> driver was
> used to set up the relevant Port...)


This would be the argument I was making at the summit for Gluon - if you
have strayed from the Neutron datamodel of what a network is (or even if a
network is needed; with L3, 'VRF' would be a better term, given its
behaviour is quite different), there comes a point that you're not actually
implementing Neutron at all and you should probably set all of it aside
rather than trying to adapt it to do two very different tasks.  Come talk
to me if you want to experiment - I've got the code up on github but the
instructions are a little convoluted at the moment.
-- 
Ian.
__
OpenStack Development Mailing List 

Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
On 24 January 2016 at 20:18, Kevin Benton <blak...@gmail.com> wrote:

> I believe the issue is that the default is unspecified, which leads to
> nothing being advertised to VMs via dhcp/ra. So VMs end up using 1500,
> which leads to a catastrophe when running on an overlay on a 1500 underlay.
>
That's not quite the point I was making here, but to answer that: looks to
me like (for the LB or OVS drivers to appropriately set the network MTU for
the virtual network, at which point it will be advertised because
advertise_mtu defaults to True in the code) you *must* set one or more of
path_mtu (for L3 overlays), segment_mtu (for L2 overlays) or physnet_mtu
(for L2 overlays with differing MTUs on different physical networks).
That's a statement of faith - I suspect if we try it we'll find a few
niggling problems - but I can find the code, at least.

The reason for that was in the other half of the thread - it's not possible
to magically discover these things from within Openstack's own code because
the relevant settings span more than just one server.  They have to line up
with both your MTU settings for the interfaces in use, and the MTU settings
for the other equipment within and neighbouring the cloud - switches,
routers, nexthops.  So they have to be provided by the operator - then
everything you want should kick in.

If all of that is true, it really is just a documentation problem - we have
the idea in place, we're just not telling people how to make use of it.  We
can also include a checklist or a check script with that documentation -
you might not be able to deduce the MTU values, but you can certainly run
some checks to see if the values you have been given are obviously wrong.
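
In the spirit of that check script, even something this small - a sketch;
the interface name and configured values are examples you'd substitute -
would catch the obvious mistakes:

VXLAN_OVERHEAD = 50

def interface_mtu(ifname):
    # What the kernel actually has configured on the underlay interface.
    with open('/sys/class/net/%s/mtu' % ifname) as f:
        return int(f.read())

def check(ifname, segment_mtu, path_mtu):
    actual = interface_mtu(ifname)
    if segment_mtu > actual or path_mtu > actual:
        print('WARNING: %s has MTU %d, smaller than the configured '
              'segment_mtu/path_mtu of %d/%d' %
              (ifname, actual, segment_mtu, path_mtu))
    else:
        print('OK: VXLAN networks will default to an MTU of %d' %
              (path_mtu - VXLAN_OVERHEAD))

check('eth1', 9000, 9000)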

In the meantime, Matt K, you said you hadn't set path_mtu in your tests,
but [1] says you have to ([1] is far from end-user consumable
documentation, which again illustrates our problem).

Can you set both path_mtu and segment_mtu to whatever value your switch MTU
is (1500 or 9000), confirm your outbound interface MTU is the same (1500 or
9000), and see if that changes things?  At this point, you should find that
your networks get appropriate 1500/9000 MTUs on VLAN based networks and
1450/8950 MTUs on VXLAN networks, that they're advertised to your VMs via
DHCP and RA, and that your routers even know that different interfaces have
different MTUs in a mixed environment, at least if everything is working as
intended.
-- 
Ian.

[1]
https://github.com/openstack/neutron/blob/544ff57bcac00720f54a75eb34916218cb248213/releasenotes/notes/advertise_mtu_by_default-d8b0b056a74517b8.yaml#L5


> On Jan 24, 2016 20:48, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>
>> On 23 January 2016 at 11:27, Adam Lawson <alaw...@aqorn.com> wrote:
>>
>>> For the sake of over-simplification, is there ever a reason to NOT
>>> enable jumbo frames in a cloud/SDN context where most of the traffic is
>>> between virtual elements that all support it? I understand that some
>>> switches do not support it and traffic from the web doesn't support it
>>> either but besides that, seems like a default "jumboframes = 1" concept
>>> would work just fine to me.
>>>
>>
>> Offhand:
>>
>> 1. you don't want the latency increase that comes with 9000 byte packets,
>> even if it's tiny (bearing in mind that in a link shared between tenants it
>> affects everyone when one packet holds the line for 6 times longer)
>> 2. not every switch in the world is going to (a) be configurable or (b)
>> pass 9000 byte packets
>> 3. not every VM has a configurable MTU that you can set on boot, or
>> supports jumbo frames, and someone somewhere will try and run one of those
>> VMs
>> 4. when you're using provider networks, not every device attached to the
>> cloud has a 9000 MTU (and this one's interesting, in fact, because it
>> points to the other element the MTU spec was addressing, that *not all
>> networks, even in Neutron, will have the same MTU*).
>> 5. similarly, if you have an external network in Openstack, and you're
>> using VXLAN, the MTU of the external network is almost certainly 50 bytes
>> bigger than that of the inside of the VXLAN overlays, so no one number can
>> ever be right for every network in Neutron.
>>
>> Also, I say 9000, but why is 9000 even the right number?  We need a
>> number... and 'jumbo' is not a number.  I know devices that will let you
>> transmit 9200 byte packets.  Conversely, if the native L2 is 9000 bytes,
>> then the MTU in a Neutron virtual network is less than 9000 - so what MTU
>> do you want to offer your applications?  If your apps don't care, why not
>> tell them what MTU they're getting (e.g. 1450) and be done with it?
>> (Memory says that the old problem

Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
I like both of those ideas.

On 24 January 2016 at 22:37, Kevin Benton <blak...@gmail.com> wrote:

> At a minimum I think we should pick a default in devstack and dump a
> warning in neutron if operators don't specify it.
>
> It would still be preferable to changing the default even though it's a
> behavior change considering the current behavior is annoying. :)
> On Jan 24, 2016 23:31, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>
>> On 24 January 2016 at 22:12, Kevin Benton <blak...@gmail.com> wrote:
>>
>>> >The reason for that was in the other half of the thread - it's not
>>> possible to magically discover these things from within Openstack's own
>>> code because the relevant settings span more than just one server
>>>
>>> IMO it's better to have a default of 1500 rather than let VMs
>>> automatically default to 1500 because at least we will deduct the encap
>>> header length when necessary in the dhcp/ra advertised value so overlays
>>> work on standard 1500 MTU networks.
>>>
>>> In other words, our current empty default is realistically a terrible
>>> default of 1500 that doesn't account for network segmentation overhead.
>>>
>> It's pretty clear that, while the current setup is precisely the old
>> behaviour (backward compatibility, y'know?), it's not very useful.  Problem
>> is, anyone using the 1550+hacks and other methods of today will find their
>> system changes behaviour if we started setting that specific default.
>>
>> Regardless, we need to take that documentation and update it.  It was a
>> nasty hack back in the day and not remotely a good idea now.
>>
>>
>>
>>> On Jan 24, 2016 23:00, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>>>
>>>> On 24 January 2016 at 20:18, Kevin Benton <blak...@gmail.com> wrote:
>>>>
>>>>> I believe the issue is that the default is unspecified, which leads to
>>>>> nothing being advertised to VMs via dhcp/ra. So VMs end up using 1500,
>>>>> which leads to a catastrophe when running on an overlay on a 1500 
>>>>> underlay.
>>>>>
>>>> That's not quite the point I was making here, but to answer that: looks
>>>> to me like (for the LB or OVS drivers to appropriately set the network MTU
>>>> for the virtual network, at which point it will be advertised because
>>>> advertise_mtu defaults to True in the code) you *must* set one or more of
>>>> path_mtu (for L3 overlays), segment_mtu (for L2 overlays) or physnet_mtu
>>>> (for L2 overlays with differing MTUs on different physical networks).
>>>> That's a statement of faith - I suspect if we try it we'll find a few
>>>> niggling problems - but I can find the code, at least.
>>>>
>>>> The reason for that was in the other half of the thread - it's not
>>>> possible to magically discover these things from within Openstack's own
>>>> code because the relevant settings span more than just one server.  They
>>>> have to line up with both your MTU settings for the interfaces in use, and
>>>> the MTU settings for the other equipment within and neighbouring the cloud
>>>> - switches, routers, nexthops.  So they have to be provided by the operator
>>>> - then everything you want should kick in.
>>>>
>>>> If all of that is true, it really is just a documentation problem - we
>>>> have the idea in place, we're just not telling people how to make use of
>>>> it.  We can also include a checklist or a check script with that
>>>> documentation - you might not be able to deduce the MTU values, but you can
>>>> certainly run some checks to see if the values you have been given are
>>>> obviously wrong.
>>>>
>>>> In the meantime, Matt K, you said you hadn't set path_mtu in your
>>>> tests, but [1] says you have to ([1] is far from end-user consumable
>>>> documentation, which again illustrates our problem).
>>>>
>>>> Can you set both path_mtu and segment_mtu to whatever value your switch
>>>> MTU is (1500 or 9000), confirm your outbound interface MTU is the same
>>>> (1500 or 9000), and see if that changes things?  At this point, you should
>>>> find that your networks get appropriate 1500/9000 MTUs on VLAN based
>>>> networks and 1450/8950 MTUs on VXLAN networks, that they're advertised to
>>>> your VMs via DHCP and RA, and that your routers even know that different

Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
On 24 January 2016 at 22:12, Kevin Benton <blak...@gmail.com> wrote:

> >The reason for that was in the other half of the thread - it's not
> possible to magically discover these things from within Openstack's own
> code because the relevant settings span more than just one server
>
> IMO it's better to have a default of 1500 rather than let VMs
> automatically default to 1500 because at least we will deduct the encap
> header length when necessary in the dhcp/ra advertised value so overlays
> work on standard 1500 MTU networks.
>
> In other words, our current empty default is realistically a terrible
> default of 1500 that doesn't account for network segmentation overhead.
>
It's pretty clear that, while the current setup is precisely the old
behaviour (backward compatibility, y'know?), it's not very useful.  Problem
is, anyone using the 1550+hacks and other methods of today will find their
system changes behaviour if we started setting that specific default.

Regardless, we need to take that documentation and update it.  It was a
nasty hack back in the day and not remotely a good idea now.



> On Jan 24, 2016 23:00, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>
>> On 24 January 2016 at 20:18, Kevin Benton <blak...@gmail.com> wrote:
>>
>>> I believe the issue is that the default is unspecified, which leads to
>>> nothing being advertised to VMs via dhcp/ra. So VMs end up using 1500,
>>> which leads to a catastrophe when running on an overlay on a 1500 underlay.
>>>
>> That's not quite the point I was making here, but to answer that: looks
>> to me like (for the LB or OVS drivers to appropriately set the network MTU
>> for the virtual network, at which point it will be advertised because
>> advertise_mtu defaults to True in the code) you *must* set one or more of
>> path_mtu (for L3 overlays), segment_mtu (for L2 overlays) or physnet_mtu
>> (for L2 overlays with differing MTUs on different physical networks).
>> That's a statement of faith - I suspect if we try it we'll find a few
>> niggling problems - but I can find the code, at least.
>>
>> The reason for that was in the other half of the thread - it's not
>> possible to magically discover these things from within Openstack's own
>> code because the relevant settings span more than just one server.  They
>> have to line up with both your MTU settings for the interfaces in use, and
>> the MTU settings for the other equipment within and neighbouring the cloud
>> - switches, routers, nexthops.  So they have to be provided by the operator
>> - then everything you want should kick in.
>>
>> If all of that is true, it really is just a documentation problem - we
>> have the idea in place, we're just not telling people how to make use of
>> it.  We can also include a checklist or a check script with that
>> documentation - you might not be able to deduce the MTU values, but you can
>> certainly run some checks to see if the values you have been given are
>> obviously wrong.
>>
>> In the meantime, Matt K, you said you hadn't set path_mtu in your tests,
>> but [1] says you have to ([1] is far from end-user consumable
>> documentation, which again illustrates our problem).
>>
>> Can you set both path_mtu and segment_mtu to whatever value your switch
>> MTU is (1500 or 9000), confirm your outbound interface MTU is the same
>> (1500 or 9000), and see if that changes things?  At this point, you should
>> find that your networks get appropriate 1500/9000 MTUs on VLAN based
>> networks and 1450/8950 MTUs on VXLAN networks, that they're advertised to
>> your VMs via DHCP and RA, and that your routers even know that different
>> interfaces have different MTUs in a mixed environment, at least if
>> everything is working as intended.
>> --
>> Ian.
>>
>> [1]
>> https://github.com/openstack/neutron/blob/544ff57bcac00720f54a75eb34916218cb248213/releasenotes/notes/advertise_mtu_by_default-d8b0b056a74517b8.yaml#L5
>>
>>
>>> On Jan 24, 2016 20:48, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>>>
>>>> On 23 January 2016 at 11:27, Adam Lawson <alaw...@aqorn.com> wrote:
>>>>
>>>>> For the sake of over-simplification, is there ever a reason to NOT
>>>>> enable jumbo frames in a cloud/SDN context where most of the traffic is
>>>>> between virtual elements that all support it? I understand that some
>>>>> switches do not support it and traffic from the web doesn't support it
>>>>> either but besides that, seems like a default "jumboframes = 1" concept
>>

Re: [openstack-dev] [Neutron] MTU configuration pain

2016-01-24 Thread Ian Wells
Actually, I note that that document is Juno and there doesn't seem to be
anything at all in the Liberty guide now, so the answer is probably to add
settings for path_mtu and segment_mtu in the recommended Neutron
configuration.
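
Something along these lines is what I have in mind - a sketch only, and I'm
going from memory on which file and section each option lives in, so it wants
checking against the configuration reference for the release rather than
being pasted in as-is:

# neutron.conf
[DEFAULT]
# MTU of L2 segments as delivered by the physical fabric
segment_mtu = 9000
# already defaults to True in the code; shown for completeness
advertise_mtu = True

# ml2_conf.ini
[ml2]
# MTU available to L3 (tunnel) overlays; the tunnel overhead is deducted
# from this when the virtual network MTU is calculated
path_mtu = 9000
# per-physical-network overrides if your physnets differ, e.g.:
# physical_network_mtus = physnet1:9000,physnet2:1500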

On 24 January 2016 at 22:26, Ian Wells <ijw.ubu...@cack.org.uk> wrote:

> On 24 January 2016 at 22:12, Kevin Benton <blak...@gmail.com> wrote:
>
>> >The reason for that was in the other half of the thread - it's not
>> possible to magically discover these things from within Openstack's own
>> code because the relevant settings span more than just one server
>>
>> IMO it's better to have a default of 1500 rather than let VMs
>> automatically default to 1500 because at least we will deduct the encap
>> header length when necessary in the dhcp/ra advertised value so overlays
>> work on standard 1500 MTU networks.
>>
>> In other words, our current empty default is realistically a terrible
>> default of 1500 that doesn't account for network segmentation overhead.
>>
> It's pretty clear that, while the current setup is precisely the old
> behaviour (backward compatibility, y'know?), it's not very useful.  Problem
> is, anyone using the 1550+hacks and other methods of today will find their
> system changes behaviour if we started setting that specific default.
>
> Regardless, we need to take that documentation and update it.  It was a
> nasty hack back in the day and not remotely a good idea now.
>
>
>
>> On Jan 24, 2016 23:00, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:
>>
>>> On 24 January 2016 at 20:18, Kevin Benton <blak...@gmail.com> wrote:
>>>
>>>> I believe the issue is that the default is unspecified, which leads to
>>>> nothing being advertised to VMs via dhcp/ra. So VMs end up using 1500,
>>>> which leads to a catastrophe when running on an overlay on a 1500 underlay.
>>>>
>>> That's not quite the point I was making here, but to answer that: looks
>>> to me like (for the LB or OVS drivers to appropriately set the network MTU
>>> for the virtual network, at which point it will be advertised because
>>> advertise_mtu defaults to True in the code) you *must* set one or more of
>>> path_mtu (for L3 overlays), segment_mtu (for L2 overlays) or physnet_mtu
>>> (for L2 overlays with differing MTUs on different physical networks).
>>> That's a statement of faith - I suspect if we try it we'll find a few
>>> niggling problems - but I can find the code, at least.
>>>
>>> The reason for that was in the other half of the thread - it's not
>>> possible to magically discover these things from within Openstack's own
>>> code because the relevant settings span more than just one server.  They
>>> have to line up with both your MTU settings for the interfaces in use, and
>>> the MTU settings for the other equipment within and neighbouring the cloud
>>> - switches, routers, nexthops.  So they have to be provided by the operator
>>> - then everything you want should kick in.
>>>
>>> If all of that is true, it really is just a documentation problem - we
>>> have the idea in place, we're just not telling people how to make use of
>>> it.  We can also include a checklist or a check script with that
>>> documentation - you might not be able to deduce the MTU values, but you can
>>> certainly run some checks to see if the values you have been given are
>>> obviously wrong.
>>>
>>> In the meantime, Matt K, you said you hadn't set path_mtu in your tests,
>>> but [1] says you have to ([1] is far from end-user consumable
>>> documentation, which again illustrates our problem).
>>>
>>> Can you set both path_mtu and segment_mtu to whatever value your switch
>>> MTU is (1500 or 9000), confirm your outbound interface MTU is the same
>>> (1500 or 9000), and see if that changes things?  At this point, you should
>>> find that your networks get appropriate 1500/9000 MTUs on VLAN based
>>> networks and 1450/8950 MTUs on VXLAN networks, that they're advertised to
>>> your VMs via DHCP and RA, and that your routers even know that different
>>> interfaces have different MTUs in a mixed environment, at least if
>>> everything is working as intended.
>>> --
>>> Ian.
>>>
>>> [1]
>>> https://github.com/openstack/neutron/blob/544ff57bcac00720f54a75eb34916218cb248213/releasenotes/notes/advertise_mtu_by_default-d8b0b056a74517b8.yaml#L5
>>>
>>>
>>>> On Jan 24, 2016 20:48, "Ian Wells" <ijw.ub

Re: [openstack-dev] Scheduler proposal

2015-10-13 Thread Ian Wells
On 12 October 2015 at 21:18, Clint Byrum  wrote:

> We _would_ keep a local cache of the information in the schedulers. The
> centralized copy of it is to free the schedulers from the complexity of
> having to keep track of it as state, rather than as a cache. We also don't
> have to provide a way for on-demand stat fetching to seed scheduler 0.
>

I'm not sure that actually changes.  On restart of a scheduler, it wouldn't
have enough knowledge to schedule, but the other schedulers haven't restarted
and can service requests while it waits for data.  Using ZK, that wait is
shorter because it can get a braindump, but during that window in either
case the system works at n-1/n capacity assuming queries are only done in
memory.

Also, you seemed to be suggesting that the ZK option would take less memory,
but it seems it would take more.  You can't schedule without a relatively
complete
set of information or some relatively intricate query language, which I
didn't think ZK was up to (but I'm open to correction there, certainly).
That implies that when you notify a scheduler of a change to the data
model, it's going to grab the fresh data and keep it locally.


> > Also, the notification path here is that the compute host notifies ZK and
> > ZK notifies many schedulers, assuming they're all capable of handling all
> > queries.  That is in fact N * (M+1) messages, which is slightly more than
> > if there's no central node, as it happens.  There are fewer *channels*,
> but
> > more messages.  (I feel like I'm overlooking something here, but I can't
> > pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
> > about better messaging rather than another DB type.
> >
>
> You're calling transactions messages, and that's not really fair to
> messaging or transactions. :)
>

I was actually talking about the number of messages crossing the network.
Your point is that the transaction with ZK is heavier weight than the
update processing at the schedulers, I think.  But then removing ZK as a
nexus removes that transaction, so both the number of messages and the
number of transactions goes down.

However, it's important to note that in
> this situation, compute nodes do not have to send anything anywhere if
> nothing has changed, which is very likely the case for "full" compute
> nodes, and certainly will save many many redundant messages.


Now that's a fair comment, certainly, and would drastically reduce the
number of messages in the system if we can keep the nodes from updating
just because their free memory has changed by a couple of pages.
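
For illustration, the sort of damping I have in mind is no more complicated
than the following - the names and the 5% threshold are invented, and this is
not nova code:

# Compute-side damping: only report state upstream when a tracked resource
# has moved enough to matter for scheduling.
SIGNIFICANT_FRACTION = 0.05   # ignore changes under 5% of the total


def worth_reporting(last_sent, current, totals):
    for resource, total in totals.items():
        delta = abs(current[resource] - last_sent.get(resource, 0))
        if total and float(delta) / total >= SIGNIFICANT_FRACTION:
            return True
    return False


# Free RAM wobbling by a few MB on a 128GB host does not trigger an update;
# claiming a 4GB, 2-vCPU VM does.
totals = {'ram_mb': 131072, 'vcpus': 32}
last = {'ram_mb': 80000, 'vcpus': 20}
print(worth_reporting(last, {'ram_mb': 79960, 'vcpus': 20}, totals))  # False
print(worth_reporting(last, {'ram_mb': 75904, 'vcpus': 18}, totals))  # True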


> Forgive me
> if nova already makes this optimization somehow, it didn't seem to when
> I was tinkering a year ago.
>

Not as far as I know, it doesn't.

There is also the complexity of designing a scheduler which is fault
> tolerant and scales economically. What we have now will overtax the
> message bus and the database as the number of compute nodes increases.
> We want to get O(1) complexity out of that, but we're getting O(N)
> right now.
>

O(N) will work providing O is small. ;)

I think our cost currently lies in doing 1 MySQL DB update per node per
minute, and one really quite mad query per schedule.  I agree that ZK would
be less costly for that in both respects, which is really more about
lowering O than N.  I'm wondering if we can do better still, that's all,
but we both agree that this approach would work.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Ian Wells
On 11 October 2015 at 00:23, Clint Byrum  wrote:

> I'm in, except I think this gets simpler with an intermediary service
> like ZK/Consul to keep track of this 1GB of data and replace the need
> for 6, and changes the implementation of 5 to "updates its record and
> signals its presence".
>

OK, so we're not keeping a copy of the information in the schedulers,
saving us 5GB of information, but we are notifying the schedulers of the
updated information so that they can update their copies?

Also, the notification path here is that the compute host notifies ZK and
ZK notifies many schedulers, assuming they're all capable of handling all
queries.  That is in fact N * (M+1) messages, which is slightly more than
if there's no central node, as it happens.  There are fewer *channels*, but
more messages.  (I feel like I'm overlooking something here, but I can't
pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk
about better messaging rather than another DB type.
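
To put rough numbers on that (just arithmetic, nothing measured):

# N compute nodes, M schedulers, one reporting interval, every update
# reaching every scheduler.  Illustrative numbers only.
N, M = 1000, 5

direct_fanout = N * M      # each node notifies each scheduler itself
via_central = N + N * M    # each node notifies ZK, ZK notifies each scheduler
print(direct_fanout, via_central)   # 5000 vs 6000, i.e. N * (M + 1)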

Again, the saving here seems to be that a freshly started scheduler can get
an infodump rather than waiting 60s to be useful.  I wonder if that's
necessary.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-12 Thread Ian Wells
On 10 October 2015 at 23:47, Clint Byrum  wrote:

> > Per before, my suggestion was that every scheduler tries to maintain a
> copy
> > of the cloud's state in memory (in much the same way, per the previous
> > example, as every router on the internet tries to make a route table out
> of
> > what it learns from BGP).  They don't have to be perfect.  They don't
> have
> > to be in sync.  As long as there's some variability in the decision
> making,
> > they don't have to update when another scheduler schedules something (and
> > you can make the compute node send an immediate update when a new VM is
> > run, anyway).  They all stand a good chance of scheduling VMs well
> > simultaneously.
> >
>
> I'm quite in favor of eventual consistency and retries. Even if we had
> a system of perfect updating of all state records everywhere, it would
> break sometimes and I'd still want to not trust any record of state as
> being correct for the entire distributed system. However, there is an
> efficiency win gained by staying _close_ to correct. It is actually a
> function of the expected entropy. The more concurrent schedulers, the
> more entropy there will be to deal with.
>

... and the fewer the servers in total, the larger the entropy as a
proportion of the whole system (if that's a thing, it's a long time since I
did physical chemistry).  But consider the use cases:

1. I have a small cloud, I run two schedulers for redundancy.  There's a
good possibility that, when the cloud is loaded, the schedulers make poor
decisions occasionally.  We'd have to consider how likely that was,
certainly.

2. I have a large cloud, and I run 20 schedulers for redundancy.  There's a
good chance that a scheduler is out of date on its information.  But there
could be several hundred hosts willing to satisfy a scheduling request, and
even for the ones with incorrect information there's a low chance that any of
those are close to the threshold where they won't run the VM in question, so
the odds are good that it will pick a host that's happy to satisfy the request.
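
As a back-of-envelope illustration - every number here is invented:

# Rough odds that a scheduler working from stale data picks a host that
# can't actually take the VM, assuming only a handful of the
# apparently-suitable hosts have quietly filled up since their last report.
suitable_hosts = 300   # hosts the (stale) view says can take the VM
actually_full = 10     # hosts that filled up since their last update

p_retry = float(actually_full) / suitable_hosts
print('chance this request needs a reschedule: %.1f%%' % (p_retry * 100))
# ~3.3% - and a retry is cheap compared to keeping everything in lockstep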


> But to be fair, we're throwing made up numbers around at this point.
> Maybe
> > it's time to work out how to test this for scale in a harness - which is
> > the bit of work we all really need to do this properly, or there's no
> proof
> > we've actually helped - and leave people to code their ideas up?
>
> I'm working on adding meters for rates and amounts of messages and
> queries that the system does right now for performance purposes. Rally
> though, is the place where I'd go to ask "how fast can we schedule things
> right now?".
>

My only concern is that we're testing a real cloud at scale and I haven't
got any more firstborn to sell for hardware, so I wonder if we can fake up
a compute node in our test harness.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Ian Wells
On 9 October 2015 at 18:29, Clint Byrum  wrote:

> Instead of having the scheduler do all of the compute node inspection
> and querying though, you have the nodes push their stats into something
> like Zookeeper or consul, and then have schedulers watch those stats
> for changes to keep their in-memory version of the data up to date. So
> when you bring a new one online, you don't have to query all the nodes,
> you just scrape the data store, which all of these stores (etcd, consul,
> ZK) are built to support atomically querying and watching at the same
> time, so you can have a reasonable expectation of correctness.
>

We have to be careful about our definition of 'correctness' here.  In
practice, the data is never going to be perfect because compute hosts
update periodically and the information is therefore always dated.  With
ZK, it's going to be strictly consistent with regard to the updates from
the compute hosts, but again that doesn't really matter too much because
the scheduler is going to have to make a best effort job with a mixed bag
of information anyway.

In fact, putting ZK in the middle basically means that your compute hosts
now synchronously update a majority of nodes in a minimum 3 node quorum -
not the fastest form of update - and then the quorum will see to notifying
the schedulers.  In practice this is just a store-and-fanout again. Once
more it's not clear to me whether the store serves much use, and as for the
fanout, I wonder if we'll need >>3 schedulers running so that this is
reducing communication overhead.

> Even if you figured out how to make the in-memory scheduler crazy fast,
> There's still value in concurrency for other reasons. No matter how
> fast you make the scheduler, you'll be slave to the response time of
> a single scheduling request. If you take 1ms to schedule each node
> (including just reading the request and pushing out your scheduling
> result!) you will never achieve greater than 1000/s. 1ms is way lower
> than it's going to take just to shove a tiny message into RabbitMQ or
> even 0mq. So I'm pretty sure this is o-k for small clouds, but would be
> a disaster for a large, busy cloud.
>

Per before, my suggestion was that every scheduler tries to maintain a copy
of the cloud's state in memory (in much the same way, per the previous
example, as every router on the internet tries to make a route table out of
what it learns from BGP).  They don't have to be perfect.  They don't have
to be in sync.  As long as there's some variability in the decision making,
they don't have to update when another scheduler schedules something (and
you can make the compute node send an immediate update when a new VM is
run, anyway).  They all stand a good chance of scheduling VMs well
simultaneously.

If, however, you can have 20 schedulers that all take 10ms on average,
> and have the occasional lock contention for a resource counter resulting
> in 100ms, now you're at 2000/s minus the lock contention rate. This
> strategy would scale better with the number of compute nodes, since
> more nodes means more distinct locks, so you can scale out the number
> of running servers separate from the number of scheduling requests.
>

If you have 20 schedulers that take 1ms on average, and there's absolutely
no lock contention, then you're at 20,000/s.  (Unfair, granted, since what
I'm suggesting is more likely to make rejected scheduling decisions, but
they could be rare.)

But to be fair, we're throwing made up numbers around at this point.  Maybe
it's time to work out how to test this for scale in a harness - which is
the bit of work we all really need to do this properly, or there's no proof
we've actually helped - and leave people to code their ideas up?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-09 Thread Ian Wells
On 9 October 2015 at 12:50, Chris Friesen 
wrote:

> Has anybody looked at why 1 instance is too slow and what it would take to
>
>> make 1 scheduler instance work fast enough? This does not preclude the
>> use of
>> concurrency for finer grain tasks in the background.
>>
>
> Currently we pull data on all (!) of the compute nodes out of the database
> via a series of RPC calls, then evaluate the various filters in python code.
>

I'll say again: the database seems to me to be the problem here.  Not to
mention, you've just explained that they are in practice holding all the
data in memory in order to do the work so the benefit we're getting here is
really a N-to-1-to-M pattern with a DB in the middle (the store-to-DB is
rather secondary, in fact), and that without incremental updates to the
receivers.

I suspect it'd be a lot quicker if each filter was a DB query.
>

That's certainly one solution, but again, unless you can tell me *why* this
information will not all fit in memory per process (when it does right
now), I'm still not clear why a database is required at all, let alone a
central one.  Even if it doesn't fit, then a local DB might be reasonable
compared to a centralised one.  The schedulers don't need to work off of
precisely the same state, they just need to make different choices to each
other, which doesn't require a that's-mine-hands-off approach; and they
aren't going to have a perfect view of the state of a distributed system
anyway, so retries are inevitable.

On a different topic, on the weighted choice: it's not 'optimal', given
this is a packing problem, so there isn't a perfect solution.  In fact,
given we're trying to balance the choice of a preferable host with the
chance that multiple schedulers make different choices, it's likely worse
than even weighting.  (Technically I suspect we'd want to rethink whether
the weighting mechanism is actually getting us a benefit.)
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 8 October 2015 at 13:28, Ed Leafe <e...@leafe.com> wrote:

> On Oct 8, 2015, at 1:38 PM, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
> > Truth be told, storing that data in MySQL is secondary to the correct
> functioning of the scheduler.
>
> I have no problem with MySQL (well, I do, but that's not relevant to this
> discussion). My issue is that the current system poorly replicates its data
> from MySQL to the places where it is needed.
>

Well, the issue is that the data shouldn't be replicated from the database
at all.  There doesn't need to be One True Copy of data here (though I
think the point further down is why we're differing on that).


> > Is there any reason why the duplication (given it's not a huge amount of
> data - megabytes, not gigabytes) is a problem?  Is there any reason why
> inconsistency is a problem?
>
> I'm sure that many of the larger deployments may have issues with the
> amount of data that must be managed in-memory by so many different parts of
> the system.
>

I wonder about that.  If I have a scheduler making a scheduling decision I
don't want it calling out to a database and the database calling out to
offline storage just to find the information, at least not if I can
possibly avoid it.  It's a critical path element in every boot call.

Given that what we're talking about is generally a bunch of resource values
for each host, I'm not sure how big this gets, even in the 100k host range,
but do you have a particularly sizeable structure in mind?


> Inconsistency is a problem, but one that has workarounds. The primary
> issue is scalability: with the current design, increasing the number of
> scheduler processes increases the raciness of the system.
>

And again, given your point below I see where you're coming from here, but
I think the key here is to make two schedulers considerably *less* likely
to make the same choice on the same information.

> I do sympathise with your point in the following email where you have 5
> VMs scheduled by 5 schedulers to the same host, but consider:
> >
> > 1. if only one host suits the 5 VMs this results in the same behaviour:
> 1 VM runs, the rest don't.  There's more work to discover that but arguably
> less work than maintaining a consistent database.
>
> True, but in a large scale deployment this is an extremely rare case.
>

Indeed; I'm trying to get that one out of the way.

> 2. if many hosts suit the 5 VMs then this is *very* unlucky, because we
> should be choosing a host at random from the set of suitable hosts and
> that's a huge coincidence - so this is a tiny corner case that we shouldn't
> be designing around
>
> Here is where we differ in our understanding. With the current system of
> filters and weighers, 5 schedulers getting requests for identical VMs and
> having identical information are *expected* to select the same host. It is
> not a tiny corner case; it is the most likely result for the current system
> design. By catching this situation early (in the scheduling process) we can
> avoid multiple RPC round-trips to handle the fail/retry mechanism.
>

And so maybe this would be a different fix - choose, at random, one of the
hosts above a weighting threshold, not choose the top host every time?
Technically, any host passing the filter is adequate to the task from the
perspective of an API user (and they can't prove if they got the highest
weighting or not), so if we assume weighting an operator preference, and
just weaken it slightly, we'd have a few more options.
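
A sketch of what I mean - the names and the 10% band are invented, and this
isn't a proposal for the actual filter/weigher code:

import random


def pick_host(weighed_hosts, band=0.1):
    """weighed_hosts: (host, weight) pairs that already passed the filters.

    Instead of always taking the single top-weighted host, pick at random
    among any host whose weight is within 'band' of the best, so identical
    schedulers with identical data stop piling onto one host.
    """
    best = max(weight for _host, weight in weighed_hosts)
    threshold = best * (1.0 - band)
    candidates = [host for host, weight in weighed_hosts if weight >= threshold]
    return random.choice(candidates)


hosts = [('host-a', 98.0), ('host-b', 97.5), ('host-c', 60.0)]
print(pick_host(hosts))   # host-a or host-b, at random; never host-c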

Again, we want to avoid overscheduling to a host, which will eventually
cause a decline and a reschedule.  But something that on balance probably
won't overschedule is adequate; overscheduling sucks but is not in fact the
end of the world as long as it's not every single time.

I'm not averse to the central database if we need the central database, but
I'm not sure how much we do at this point, and a central database will
become a point of contention, I would think, beyond the cost of the above
idea.
 --
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 7 October 2015 at 22:17, Chris Friesen <chris.frie...@windriver.com>
wrote:

> On 10/07/2015 07:23 PM, Ian Wells wrote:
>
>>
>> The whole process is inherently racy (and this is inevitable, and
>> correct),
>>
>>
> Why is it inevitable?
>

It's inevitable because everything takes time, and some things are
unpredictable.

The amount of free RAM on a machine - as we do it today - is, literally,
what the kernel reports to be free.  That's known by the host,
unpredictable, occasionally reported to the scheduler (which takes time),
and if you stored it in a database (which takes time) and recovered it from
a database (which takes time) the number you got would not be guaranteed to
be current.

Other things - like CPUs - can theoretically be centrally tracked, but the
whole thing is distributed at the moment - compute nodes are the source of
truth, not the database - which makes some sense when you consider that a
compute node knows best what VMs are running and what VMs have died at any
given moment.  In truth, if the central service is in any way wrong (for
instance, processes outside of Openstack are using a lot of CPU, which you
can't predict, again) then it makes sense for the compute node to be the
final arbiter, so (occasional, infrequent) reschedules are probably
appropriate anyway.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-08 Thread Ian Wells
On 8 October 2015 at 09:10, Ed Leafe  wrote:

> You've hit upon the problem with the current design: multiple, and
> potentially out-of-sync copies of the data.


Arguably, this is the *intent* of the current design, not a problem with
it.  The data can never be perfect (ever) so go with 'good enough' and run
with it, and deal with the corner cases.  Truth be told, storing that data
in MySQL is secondary to the correct functioning of the scheduler.  The one
thing it helps with is when the scheduler restarts - it stands a chance of
making sensible decisions before it gets its full picture back.  (This is
all very like route distribution protocols, you know: make the best
decision on the information you have to hand, assuming the rest of the
system will deal with your mistakes.  And hold times, and graceful restart,
and...)


> What you're proposing doesn't really sound all that different than the
> current design, which has the compute nodes send the updates in their state
> to the scheduler both on a scheduled task, and in response to changes. The
> impetus for the Cassandra proposal was to eliminate this duplication, and
> have the resources being scheduled and the scheduler all working with the
> same data.


Is there any reason why the duplication (given it's not a huge amount of
data - megabytes, not gigabytes) is a problem?  Is there any reason why
inconsistency is a problem?

What you propose is a change in behaviour.  The scheduler today is intended
to make the best decision based on the available information, without
locks, and on the assumption that other things might be scheduling at the
same time.  Your proposal comes across as making all schedulers work on one
accurate copy of information that they keep updated (not, I think, entirely
synchronously, so they can still be working on outdated information, but
rather closer to it).  But when you have hundreds of hosts willing to take
a machine then there's typically no one answer to a scheduling decision and
we can tolerate really quite a lot of variability.

I do sympathise with your point in the following email where you have 5 VMs
scheduled by 5 schedulers to the same host, but consider:

1. if only one host suits the 5 VMs this results in the same behaviour: 1
VM runs, the rest don't.  There's more work to discover that but arguably
less work than maintaining a consistent database.
2. if many hosts suit the 5 VMs then this is *very* unlucky, because we
should be choosing a host at random from the set of suitable hosts and
that's a huge coincidence - so this is a tiny corner case that we shouldn't
be designing around

The worst case, is, however

3. we attempt to pick the optimal host, and the optimal host for all 5 VMs
is the same despite there being other less perfect choices out there.  That
would get you a stampeding herd and a bunch of retries.

I admit that the current system does not solve well for (3).
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Scheduler proposal

2015-10-07 Thread Ian Wells
On 7 October 2015 at 16:00, Chris Friesen 
wrote:

> 1) Some resources (RAM) only require tracking amounts.  Other resources
> (CPUs, PCI devices) require tracking allocation of specific individual host
> resources (for CPU pinning, PCI device allocation, etc.).  Presumably for
> the latter we would have to actually do the allocation of resources at the
> time of the scheduling operation in order to update the database with the
> claimed resources in a race-free way.
>

The whole process is inherently racy (and this is inevitable, and correct),
which is why the scheduler works the way it does:

- scheduler guesses at a host based on (guaranteed - hello distributed
systems!) outdated information
- VM is scheduled to a host that looks like it might work, and host
attempts to run it
- VM run may fail (because the information was outdated or has become
outdated), in which case we retry the schedule
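
Reduced to a sketch, that loop looks like this - the names are invented, and
the real flow is spread across the scheduler, conductor and compute manager,
but the shape is the same:

MAX_ATTEMPTS = 3


class HostRanOutOfResources(Exception):
    pass


class NoValidHost(Exception):
    pass


def boot(request, scheduler):
    tried = []
    for _attempt in range(MAX_ATTEMPTS):
        # Best guess from inevitably-stale host state, skipping hosts that
        # have already refused this request.
        host = scheduler.select_host(request, exclude=tried)
        try:
            # The host re-checks its own (authoritative) resources and may
            # refuse; that refusal is what drives the retry.
            return host.spawn(request)
        except HostRanOutOfResources:
            tried.append(host)
    raise NoValidHost(request)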

In fact, with PCI devices the code has been written rather carefully to
make sure that they fit into this model.  There is central per-device
tracking (which, fwiw, I argued against back in the day) but that's not how
allocation works (or, considering how long it is since I looked, worked).

PCI devices are actually allocated from pools of equivalent devices, and
allocation works in the same manner as other scheduling: you work out from
the nova boot call what constraints a host must satisfy (in this case, in
number of PCI devices in specific pools), you check your best guess at
global host state against those constraints, and you pick one of the hosts
that meets the constraints to schedule on.

So: yes, there is a central registry of devices, which we try to keep up to
date - but this is for admins to refer to, it's not a necessity of
scheduling.  The scheduler input is the pool counts, which work largely the
same way as the available memory works as regards scheduling and updating.
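
The check itself is nothing more than this sort of thing - the pool keys and
names are invented for illustration, and the real code carries more detail:

def host_satisfies(host_pools, requests):
    """host_pools: {pool_key: free device count}; requests: {pool_key: wanted}."""
    return all(host_pools.get(pool, 0) >= wanted
               for pool, wanted in requests.items())


host = {('vendor-x', 'nic-y'): 4, ('vendor-x', 'gpu-z'): 1}
print(host_satisfies(host, {('vendor-x', 'nic-y'): 2}))   # True
print(host_satisfies(host, {('vendor-x', 'gpu-z'): 2}))   # False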

No idea on CPUs, sorry, but again I'm not sure why the behaviour would be
any different: compare suspected host state against needs, schedule if it
fits, hope you got it right and tolerate if you didn't.

That being the case, it's worth noting that the database can be eventually
consistent and doesn't need to be transactional.  It's also worth
considering that the database can have multiple (mutually inconsistent)
copies.  There's no need to use a central datastore if you don't want to -
one theoretical example is to run multiple schedulers and let each
scheduler attempt to collate cloud state from unreliable messages from the
compute hosts.  This is not quite what happens today, because messages we
send over Rabbit are reliable and therefore costly.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] -1 due to line length violation in commit messages

2015-09-26 Thread Ian Wells
Can I ask a different question - could we reject a few simple-to-check
things on the push, like bad commit messages?  For things that take 2
seconds to fix and do make people's lives better, it's not that they're
rejected, it's that the whole rejection cycle via gerrit review (push/wait
for tests to run/check website/swear/find change/fix/push again) is out of
proportion to the effort taken to fix it.

It seems here that there's benefit to 72-character lines in commit messages -
not that everyone sees that benefit, but it is present - but it doesn't
outweigh the current cost.
-- 
Ian.


On 25 September 2015 at 12:02, Jeremy Stanley  wrote:

> On 2015-09-25 16:15:15 + (+), Fox, Kevin M wrote:
> > Another option... why are we wasting time on something that a
> > computer can handle? Why not just let the line length be infinite
> in the commit message and have gerrit wrap it to <a number here> length lines on merge?
>
> The commit message content (including whitespace/formatting) is part
> of the data fed into the hash algorithm to generate the commit
> identifier. If Gerrit changed the commit message at upload, that
> would alter the Git SHA compared to your local copy of the same
> commit. This quickly goes down a Git madness rabbit hole (not the
> least of which is that it would completely break signed commits).
> --
> Jeremy Stanley
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] cloud-init IPv6 support

2015-09-09 Thread Ian Wells
Neutron already offers a DNS server (within the DHCP namespace, I think).
It does forward on non-local queries to an external DNS server, but it
already serves local names for instances; we'd simply have to set one
aside, or perhaps use one in a 'root' but nonlocal domain
(metadata.openstack e.g.).  In fact, this improves things slightly over the
IPv4 metadata server: IPv4 metadata is usually reached via the router,
whereas in ipv6, if we have a choice over addresses, we can use a link
local address (and any link local address will do; it's not an address that
is 'magic' in some way, thanks to the wonder of service advertisement).

And per previous comments about 'Amazon owns this' - the current metadata
service is a de facto standard, which Amazon initiated but is not owned by
anybody, and it's not the only standard.  If you'd like proof of the
former, I believe our metadata service offers /openstack/ URLs, unlike
Amazon (mirroring the /openstack/ files on the config drive); and on the
latter, config-drive and Amazon-style metadata are only two of quite an
assortment of data providers that cloud-init will query.  If it makes you
think of it differently, think of this as the *Openstack* ipv6 metadata
service, and not the 'will-be-Amazon-one-day-maybe' service.
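
To make that concrete, the guest-side flow under such a scheme would be
nothing more exotic than the following - 'metadata.openstack' is the
hypothetical name from above, not an interface that exists today:

# Resolve a well-known name served by the Neutron DNS and fetch metadata
# over plain IPv6 - no link-local magic, no special-cased router address.
import json
import urllib2

url = 'http://metadata.openstack/openstack/latest/meta_data.json'
metadata = json.load(urllib2.urlopen(url))
print(metadata.get('uuid'))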


On 8 September 2015 at 17:03, Clint Byrum  wrote:

> Neutron would add a soft router that only knows the route to the metadata
> service (and any other services you want your neutron private network vms
> to be able to reach). This is not unique to the metadata service. Heat,
> Trove, etc, all want this as a feature so that one can poke holes out of
> these private networks only to the places where the cloud operator has
> services running.
>
> Excerpts from Fox, Kevin M's message of 2015-09-08 14:44:35 -0700:
> > How does that work with neutron private networks?
> >
> > Thanks,
> > Kevin
> > 
> > From: Clint Byrum [cl...@fewbar.com]
> > Sent: Tuesday, September 08, 2015 1:35 PM
> > To: openstack-dev
> > Subject: Re: [openstack-dev] [Neutron] cloud-init IPv6 support
> >
> > Excerpts from Nir Yechiel's message of 2014-07-07 09:15:09 -0700:
> > > AFAIK, the cloud-init metadata service can currently be accessed only
> by sending a request to http://169.254.169.254, and no IPv6 equivalent is
> currently implemented. Does anyone working on this or tried to address this
> before?
> > >
> >
> > I'm not sure we'd want to carry the way metadata works forward now that
> > we have had some time to think about this.
> >
> > We already have DHCP6 and NDP. Just use one of those, and set the host's
> > name to a nonce that it can use to lookup the endpoint for instance
> > differentiation via DNS SRV records. So if you were told you are
> >
> > d02a684d-56ea-44bc-9eba-18d997b1d32d.region.cloud.com
> >
> > Then you look that up as a SRV record on your configured DNS resolver,
> > and connect to the host name returned and do something like  GET
> > /d02a684d-56ea-44bc-9eba-18d997b1d32d
> >
> > And viola, metadata returns without any special link local thing, and
> > it works like any other dual stack application on the planet.
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][L3] Representing a networks connected by routers

2015-07-21 Thread Ian Wells
On 21 July 2015 at 07:52, Carl Baldwin c...@ecbaldwin.net wrote:

  Now, you seem to generally be thinking in terms of the latter model,
 particularly since the provider network model you're talking about fits
 there.  But then you say:

 Actually, both.  For example, GoDaddy assigns each vm an ip from the
 location based address blocks and optionally one from the routed location
 agnostic ones.  I would also like to assign router ports out of the
 location based blocks which could host floating ips from the other blocks.

Well, routed IPs that are not location-specific are no different to normal
ones, are they?  Why do they need special work that changes the API?

  On 20 July 2015 at 10:33, Carl Baldwin c...@ecbaldwin.net wrote:
 
  When creating a
  port, the binding information would be sent to the IPAM system and the
  system would choose an appropriate address block for the allocation.

 Implicit in both is a need to provide at least a hint at host binding.
 Or, delay address assignment until binding.  I didn't mention it because my
 email was already long.
 This is something and discussed but applies equally to both proposals.

No, it doesn't - if the IP address is routed and not relevant to the
location of the host then yes, you would want to *inject a route* at
binding, but you wouldn't want to delay address assignment till binding
because it's location-agnostic.

  No, it wouldn't, because creating and binding a port are separate
 operations.  I can't give the port a location-specific address on creation
 - not until it's bound, in fact, which happens much later.
 
  On proposal 1: consider the cost of adding a datamodel to Neutron.  It
 has to be respected by all developers, it frequently has to be deployed by
 all operators, and every future change has to align with it.  Plus either
 it has to be generic or optional, and if optional it's a burden to some
 proportion of Neutron developers and users.  I accept proposal 1 is easy,
 but it's not universally applicable.  It doesn't work with Neil Jerram's
 plans, it doesn't work with multiple interfaces per host, and it doesn't
 work with the IPv6 routed-network model I worked on.

 Please be more specific.  I'm not following your argument here.  My
 proposal doesn't really add much new data model.

My point is that there's a whole bunch of work there to solve the question
of 'how do I allocate addresses to a port when addresses are location
specific' that assumes that there's one model for location specific
addresses that is a bunch of segments with each host on one segment.  I can
break this model easily.  Per the previous IPv6 proposal, I might choose my
address with more care than just by its location, to contain extra
information I care about.  I might have multiple segments connected to one
host where either segment will do and the scheduler should choose the most
useful one.

If this whole model is built using reusable-ish concepts like networks, and
adds a field to ports, then basically it ends up in, or significantly
affects, the model of core Neutron.  Every Neutron developer to come will
have to read it, understand it, and not break it.  Depending on how it's
implemented, every operator that comes along will have to deploy it and may
be affected by bugs in it (though that depends on precisely how much ends
up as an extension).

If we find a more general purpose interface - and per above, mostly the
interface is 'sometimes I want to pick my address only at binding' plus
'IPAM and address assignment is more complex than the subnet model we have
today' - then potentially these datamodels can be specific to IPAM, and not
general purpose 'we have these objects around already' things we're reusing;
and with a clean interface the models may not even be present as code in a
deployed system, which is the best proof they are not introducing bugs.

Every bit of cruft we write, we have to carry.  It makes more sense to make
the core extensible for this case, in my mind, than it does to introduce it
into the core.

 We've discussed this with Neil at length.  I haven't been able to
 reconcile our respective approaches in to one model that works for both of
 us and still provides value.

QED.


 Could you provide some links so that I can brush up on your ipv6 routed
 network model?  I'd like to consider it but I don't know much about it.


The best writeup I have is
http://datatracker.ietf.org/doc/draft-baker-openstack-ipv6-model/?include_text=1
(don't judge it by the place it was filed). But the concept was that (a)
VMs received v6 addresses, (b) they were location specific, (c) each had
their own L2 segment (per Neil's idea, and really the ultimate use of this
model), and (d) there was information in the address additional to just its
location and the entropy of choosing a random address.
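
As a toy illustration of point (d) - the field layout here is invented for
the example, not what the draft specifies:

# Build a /128 from a location-specific /64 (the segment the host sits on)
# plus bits we choose to carry meaning - here a tenant id and a counter.
import ipaddress


def compose_address(segment_prefix, tenant_id, index):
    net = ipaddress.ip_network(segment_prefix)
    assert net.prefixlen == 64
    host_bits = (tenant_id << 32) | index   # 32 bits each, purely illustrative
    return ipaddress.ip_address(int(net.network_address) | host_bits)


# Everything about the result is deliberate: the /64 says where it is, the
# rest says whose it is and which port it is.
print(compose_address(u'2001:db8:0:42::/64', tenant_id=0x1234, index=7))
# -> 2001:db8:0:42:0:1234:0:7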


  1: some network types don't allow unbound ports to have addresses, they
 just get placeholder addresses for each subnet until they're bound
  2: 'subnets' 

Re: [openstack-dev] [Neutron][L3] Representing a networks connected by routers

2015-07-21 Thread Ian Wells
 is essentially saying give me an
 address that is routable in this scope - they don't care which actual
 subnet it gets allocated on. This is conceptually more in-line with [2] -
 modeling L3 domain separately from the existing Neutron concept of a
 network being a broadcast domain.


Again, the issue is that when you ask for an address you tend to have quite
a strong opinion of what that address should be if it's location-specific.



 Fundamentally, however we associate the segments together, this comes down
 to a scheduling problem.


It's not *solely* a scheduling problem, and that is my issue with this
statement (Assaf has been saying the same).  You *can* solve this
*exclusively* with scheduling (allocate the address up front, hope that the
address has space for a VM with all its constraints met) - but that
solution is horrible; or you can solve this largely with allocation where
scheduling helps to deal with pool exchaustion, where it is mainly another
sort of problem but scheduling plays a part.

Nova needs to be able to incorporate data from Neutron in its scheduling
 decision. Rather than solving this with a single piece of meta-data like
 network_id as described in proposal 1, it probably makes more sense to
 build out the general concept of utilizing network data for nova
 scheduling. We could still model this as in #1, or using address scopes, or
 some arbitrary data as in #2. But the harder problem to solve is the
 scheduling, not how we tag these things to inform that scheduling.

  The optimization of routing for floating IPs is also a scheduling
 problem, though one that would require a lot more changes to how FIP are
 allocated and associated to solve.

  John

  [1] https://review.openstack.org/#/c/180803/
 [2] https://bugs.launchpad.net/neutron/+bug/1458890/comments/7




   On Jul 21, 2015, at 10:52 AM, Carl Baldwin c...@ecbaldwin.net wrote:

  On Jul 20, 2015 4:26 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:
 
  There are two routed network models:
 
  - I give my VM an address that bears no relation to its location and
 ensure the routed fabric routes packets there - this is very much the
 routing protocol method for doing things where I have injected a route into
 the network and it needs to propagate.  It's also pretty useless because
 there are too many host routes in any reasonable sized cloud.
 
  - I give my VM an address that is based on its location, which only
 becomes apparent at binding time.  This means that the semantics of a port
 changes - a port has no address of any meaning until binding, because its
 location is related to what it does - and it leaves open questions about
 what to do when you migrate.
 
  Now, you seem to generally be thinking in terms of the latter model,
 particularly since the provider network model you're talking about fits
 there.  But then you say:

 Actually, both.  For example, GoDaddy assigns each vm an ip from the
 location based address blocks and optionally one from the routed location
 agnostic ones.  I would also like to assign router ports out of the
 location based blocks which could host floating ips from the other blocks.

  On 20 July 2015 at 10:33, Carl Baldwin c...@ecbaldwin.net wrote:
 
  When creating a
  port, the binding information would be sent to the IPAM system and the
  system would choose an appropriate address block for the allocation.

 Implicit in both is a need to provide at least a hint at host binding.
 Or, delay address assignment until binding.  I didn't mention it because my
 email was already long.
 This is something and discussed but applies equally to both proposals.

  No, it wouldn't, because creating and binding a port are separate
 operations.  I can't give the port a location-specific address on creation
 - not until it's bound, in fact, which happens much later.
 
  On proposal 1: consider the cost of adding a datamodel to Neutron.  It
 has to be respected by all developers, it frequently has to be deployed by
 all operators, and every future change has to align with it.  Plus either
 it has to be generic or optional, and if optional it's a burden to some
 proportion of Neutron developers and users.  I accept proposal 1 is easy,
 but it's not universally applicable.  It doesn't work with Neil Jerram's
 plans, it doesn't work with multiple interfaces per host, and it doesn't
 work with the IPv6 routed-network model I worked on.

 Please be more specific.  I'm not following your argument here.  My
 proposal doesn't really add much new data model.

 We've discussed this with Neil at length.  I haven't been able to
 reconcile our respective approaches in to one model that works for both of
 us and still provides value.  The routed segments model needs to somehow
 handle the L2 details of the underlying network.  Neil's model confines L2
 to the port and routes to it.  The two models can't just be squished
 together unless I'm missing something.

 Could you provide some links so that I can brush up on your ipv6

Re: [openstack-dev] [neutron] [VXLAN] patch to use per-VNI multicast group addresses

2015-07-21 Thread Ian Wells
It is useful, yes; but posting diffs on the mailing list is not the way to
get them reviewed and approved.  If you can get this on gerrit it will get
a proper review, and I would certainly like to see something like this
incorporated.
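
For anyone skimming, the derivation in the patch below is easy to sanity-check
by hand - here's a standalone sketch of the same mapping with a couple of
worked values (just an illustration, not the patch itself):

    def vni_to_group(vni):
        # 24-bit VNI split into three octets behind a fixed first octet of 239
        return "239.%d.%d.%d" % (vni >> 16, (vni >> 8) % 256, vni % 256)

    print(vni_to_group(1001))   # -> 239.0.3.233
    print(vni_to_group(70000))  # -> 239.1.17.112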

On 21 July 2015 at 15:41, John Nielsen li...@jnielsen.net wrote:

 I may be in a small minority since I a) use VXLAN, b) don’t hate multicast
 and c) use linuxbridge instead of OVS. However I thought I’d share this
 patch in case I’m not alone.

 If you assume the use of multicast, VXLAN works quite nicely to isolate L2
 domains AND to prevent delivery of unwanted broadcast/unknown/multicast
 packets to VTEPs that don’t need them. However, the latter only holds up if
 each VXLAN VNI uses its own unique multicast group address. Currently, you
 have to either disable multicast (and use l2_population or similar) or use
 only a single group address for ALL VNIs (and force every single VTEP to
 receive every BUM packet from every network). For my usage, this patch
 seems simpler.

 Feedback is very welcome. In particular I’d like to know if anyone else
 finds this useful and if so, what (if any) changes might be required to get
 it committed. Thanks!

 JN


 commit 17c32a9ad07911f3b4148e96cbcae88720eef322
 Author: John Nielsen j...@jnielsen.net
 Date:   Tue Jul 21 16:13:42 2015 -0600

 Add a boolean option, vxlan_group_auto, which if enabled will compute
 a unique multicast group address group for each VXLAN VNI. Since VNIs
 are 24 bits, they map nicely to the 239.0.0.0/8 site-local multicast
 range. Eight bits of the VNI are used for the second, third and fourth
 octets (with 239 always as the first octet).

 Using this option allows VTEPs to receive BUM datagrams via multicast,
 but only for those VNIs in which they participate. In other words, it
 is
 an alternative to the l2_population extension and driver for
 environments
 where both multicast and linuxbridge are used.

 If the option is True then multicast groups are computed as described
 above. If the option is False then the previous behavior is used
 (either a single multicast group is defined by vxlan_group or multicast
 is disabled).

 diff --git a/etc/neutron/plugins/ml2/linuxbridge_agent.ini b/etc/neutron/plugins/ml2/linuxbridge_agent.ini
 index d1a01ba..03578ad 100644
 --- a/etc/neutron/plugins/ml2/linuxbridge_agent.ini
 +++ b/etc/neutron/plugins/ml2/linuxbridge_agent.ini
 @@ -25,6 +25,10 @@
  # This group must be the same on all the agents.
  # vxlan_group = 224.0.0.1
  #
 +# (BoolOpt) Derive a unique 239.x.x.x multicast group for each vxlan VNI.
 +# If this option is true, the setting of vxlan_group is ignored.
 +# vxlan_group_auto = False
 +#
  # (StrOpt) Local IP address to use for VXLAN endpoints (required)
  # local_ip =
  #
 diff --git a/neutron/plugins/ml2/drivers/linuxbridge/agent/common/config.py b/neutron/plugins/ml2/drivers/linuxbridge/agent/common/config.py
 index 6f15236..b4805d5 100644
 --- a/neutron/plugins/ml2/drivers/linuxbridge/agent/common/config.py
 +++ b/neutron/plugins/ml2/drivers/linuxbridge/agent/common/config.py
 @@ -31,6 +31,9 @@ vxlan_opts = [
                 help=_("TOS for vxlan interface protocol packets.")),
      cfg.StrOpt('vxlan_group', default=DEFAULT_VXLAN_GROUP,
                 help=_("Multicast group for vxlan interface.")),
 +    cfg.BoolOpt('vxlan_group_auto', default=False,
 +                help=_("Derive a unique 239.x.x.x multicast group for "
 +                       "each vxlan VNI")),
      cfg.IPOpt('local_ip', version=4,
                help=_("Local IP address of the VXLAN endpoints.")),
      cfg.BoolOpt('l2_population', default=False,
 diff --git a/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py b/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
 index 61627eb..a0efde1 100644
 --- a/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
 +++ b/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
 @@ -127,6 +127,14 @@ class LinuxBridgeManager(object):
              LOG.warning(_LW("Invalid Segmentation ID: %s, will lead to "
                              "incorrect vxlan device name"), segmentation_id)
 
 +    def get_vxlan_group(self, segmentation_id):
 +        if cfg.CONF.VXLAN.vxlan_group_auto:
 +            return ("239." +
 +                    str(segmentation_id >> 16) + "." +
 +                    str((segmentation_id >> 8) % 256) + "." +
 +                    str(segmentation_id % 256))
 +        return cfg.CONF.VXLAN.vxlan_group
 +
      def get_all_neutron_bridges(self):
          neutron_bridge_list = []
          bridge_list = os.listdir(BRIDGE_FS)
 @@ -240,7 +248,7 @@ class LinuxBridgeManager(object):
                          'segmentation_id': segmentation_id})
          args = {'dev': self.local_int}
          if self.vxlan_mode == lconst.VXLAN_MCAST:
 -            args['group'] = 

Re: [openstack-dev] [neutron] 'routed' network type, DHCP agent + devstack support - review requested

2015-07-20 Thread Ian Wells
On 20 July 2015 at 10:21, Neil Jerram neil.jer...@metaswitch.com wrote:

 Hi Ian,

 On 20/07/15 18:00, Ian Wells wrote:

 On 19 July 2015 at 03:46, Neil Jerram neil.jer...@metaswitch.com
 mailto:neil.jer...@metaswitch.com wrote:

 The change at [1] creates and describes a new 'routed' value for
 provider:network_type.  It means that a compute host handles data
 to/from the relevant TAP interfaces by routing it, and specifically
 that those TAP interfaces are not bridged.


 To clarify that, the user uses provider:network_type in the API to
 request a 'routed' network be created, and the Neutron plugin either
 implements that or rejects the create call?  Or something else?


 Yes, I believe so.  Could it be otherwise?


We can make it work any way you like if you're willing to spend the rest of
your life writing it. ;)

It depends rather on how you picture this working.

As described, you've made it so that networks would be routed if the admin
created them and specifically flagged them as routed, which is useful for
testing, or if the mechdriver is the default, which is probably the most
useful way in production.

I think the thing that we'll be missing long term is a means to explicitly
request an L2 domain - as that's the special case that you might explicitly
want, the general case is 'with IP addresses my VMs can talk to each other'
- and that would require more than Neutron currently provides and would
require work.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] 'routed' network type, DHCP agent + devstack support - review requested

2015-07-20 Thread Ian Wells
On 19 July 2015 at 03:46, Neil Jerram neil.jer...@metaswitch.com wrote:

 The change at [1] creates and describes a new 'routed' value for
 provider:network_type.  It means that a compute host handles data
 to/from the relevant TAP interfaces by routing it, and specifically
 that those TAP interfaces are not bridged.


To clarify that, the user uses provider:network_type in the API to request
a 'routed' network be created, and the Neutron plugin either implements
that or rejects the create call?  Or something else?
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][L3] Representing a networks connected by routers

2015-07-20 Thread Ian Wells
There are two routed network models:

- I give my VM an address that bears no relation to its location and ensure
the routed fabric routes packets there - this is very much the routing
protocol method for doing things where I have injected a route into the
network and it needs to propagate.  It's also pretty useless because there
are too many host routes in any reasonable sized cloud.

- I give my VM an address that is based on its location, which only becomes
apparent at binding time.  This means that the semantics of a port changes
- a port has no address of any meaning until binding, because its location
is related to what it does - and it leaves open questions about what to do
when you migrate.

Now, you seem to generally be thinking in terms of the latter model,
particularly since the provider network model you're talking about fits
there.  But then you say:

On 20 July 2015 at 10:33, Carl Baldwin c...@ecbaldwin.net wrote:

 When creating a
 port, the binding information would be sent to the IPAM system and the
 system would choose an appropriate address block for the allocation.


No, it wouldn't, because creating and binding a port are separate
operations.  I can't give the port a location-specific address on creation
- not until it's bound, in fact, which happens much later.

On proposal 1: consider the cost of adding a datamodel to Neutron.  It has
to be respected by all developers, it frequently has to be deployed by all
operators, and every future change has to align with it.  Plus either it
has to be generic or optional, and if optional it's a burden to some
proportion of Neutron developers and users.  I accept proposal 1 is easy,
but it's not universally applicable.  It doesn't work with Neil Jerram's
plans, it doesn't work with multiple interfaces per host, and it doesn't
work with the IPv6 routed-network model I worked on.

Given that, I wonder whether proposal 2 could be rephrased.

1: some network types don't allow unbound ports to have addresses, they
just get placeholder addresses for each subnet until they're bound
2: 'subnets' on these networks are more special than subnets on other
networks.  (More accurately, they don't use subnets.  It's a shame subnets
are core Neutron, because they're pretty horrible and yet hard to replace.)
3: there's an independent (in an extension?  In another API endpoint?)
datamodel that the network points to and that IPAM refers to to find a port
an address.  Bonus, people who aren't using funky network types can disable
this extension.
4: when the port is bound, the IPAM is referred to, and it's told the
binding information of the port.
5: when binding the port, once IPAM has returned its address, the network
controller probably does stuff with that address when it completes the
binding (like initialising routing).
6: live migration either has to renumber a port or forward old traffic to
the new address via route injection.  This is an open question now, so I'm
mentioning it rather than solving it.

In fact, adding that hook to IPAM at binding plus setting aside a 'not set'
IP address might be all you need to do to make it possible.  The IPAM needs
data to work out what an address is, but that doesn't have to take the form
of existing Neutron constructs.
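
To make the shape of that hook concrete, here's a toy sketch - every name in
it is invented for illustration, it is not the current IPAM interface:

    PLACEHOLDER = None  # the 'not set' address an unbound port carries

    class LocationAwareIPAM(object):
        """Hands out real addresses only once binding tells us the location."""

        def __init__(self, blocks_by_segment):
            # e.g. {'rack-1': iter(['10.1.0.5', ...]), 'rack-2': ...}
            self.blocks_by_segment = blocks_by_segment

        def allocate_on_create(self, port):
            # nothing meaningful can be chosen yet
            return PLACEHOLDER

        def allocate_on_bind(self, port, binding_host, segment):
            # binding tells us where the port landed; now pick from the
            # block that actually routes to that location
            return next(self.blocks_by_segment[segment])

    ipam = LocationAwareIPAM({'rack-1': iter(['10.1.0.5', '10.1.0.6'])})
    assert ipam.allocate_on_create(port={}) is PLACEHOLDER
    print(ipam.allocate_on_bind(port={}, binding_host='compute-17',
                                segment='rack-1'))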
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] vif type libvirt-network

2015-06-12 Thread Ian Wells
On 11 June 2015 at 02:37, Andreas Scheuring scheu...@linux.vnet.ibm.com
wrote:

  Do you happen to know how data gets routed _to_ a VM, in the
  type='network' case?

 Neil, sorry no. Haven't played around with that, yet. But from reading
 the libvirt man, it looks good. It's saying "Guest network traffic will
 be forwarded to the physical network via the host's IP routing stack" -
 so I would assume this is L3. Maybe you should give it a quick try to
 figure out...


You would at the least require a namespace to preserve network separation,
I think.  And in fact if you go this way the answer may be to set up a
namespace in the same way that LB sets a bridge up.  Nova or Neutron can
create the NS whichever happens to need it first, and ignore the failure if
it happens to get caught in the race.  Some slight risk that a true failure
is not spotted, though.  Or you could have the Neutron agent wait for the
appearance of the interface, which it could do with either polling or use
of rt_netlink, offhand.

And none of this appears to require a libvirt network, but I don't think a
simple TAP plug exists either (something along the lines that Neil (?)
proposed where Nova is simply told that if it creates a TAP with the right
name then all will be well).

(And I begin to remember why VIF plugging is horrible.)
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] File injection, config drive and cloud-init

2015-06-11 Thread Ian Wells
On 11 June 2015 at 15:34, Michael Still mi...@stillhq.com wrote:

 On Fri, Jun 12, 2015 at 7:07 AM, Mark Boo mrkzm...@gmail.com wrote:
  - What functionality is missing (if any) in config drive / metadata
 service
  solutions to completely replace file injection?

 None that I am aware of. In fact, these two other options provide you
 with more data than you'd get with file injection.


A config drive is useful if and only if you know to read it and have
software that does so (for packaged Linux, you install the cloud-init
package, usually).  File injection works even if you don't adapt your VM
image.

Conversely, file injection only works on a limited range of disk formats.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Should we add instance action event to live migration?

2015-06-11 Thread Ian Wells
On 11 June 2015 at 12:37, Richard Raseley rich...@raseley.com wrote:

 Andrew Laski wrote:

 There are many reasons a deployer may want to live-migrate instances
 around: capacity planning, security patching, noisy neighbors, host
 maintenance, etc... and I just don't think the user needs to know or
 care that it has taken place.


 They might care, insofar as live migrations will often cause performance
  degradation from a user's perspective.


Seconded.  If your app manager is warned that you're going to be live
migrating it can do something about the capacity drop.  I can imagine cases
where a migrating VM would be brutally murdered [1] and replaced because
it's not delivering sufficient performance.
-- 
Ian.

[1] See nova help brutally-murder
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] vif type libvirt-network

2015-06-10 Thread Ian Wells
I don't see a problem with this, though I think you do want plug/unplug
calls to be passed on to Neutron so that has the opportunity to set up the
binding from its side (usage 0) and tear it down when you're done with it
(usage 1).

There may be a set of races you need to deal with, too - what happens if
Nova starts a VM attached to a Neutron network binding that has yet to be
set up?  Neutron doesn't (well, technically, shouldn't be expected to) do
things instantaneously on a call, even binding, or you get into the realm
of distributed system failure case analysis.

Neil, are you trying to route to the host - where this will work, because a
libvirt network is L2 - or to the VM - where this won't, for the same
reason?
-- 
Ian.


On 10 June 2015 at 12:16, Neil Jerram neil.jer...@metaswitch.com wrote:

 On 10/06/15 15:47, Andreas Scheuring wrote:

 Hi Daniel, Neil and others,

 I was thinking about introducing libvirt-network as a new vif type to
 nova. It can be used when Neutron prepares a libvirt network for
 attaching guests.

 Would you see any general concerns with such an approach? Anything that
 I need to consider with libvirt networks in addition? Maybe I should
 mention one thing due to the discussion this morning: No plug/unplug
  behavior would be required.

 Any feedback is welcome!


 I added a blueprint and wrote a spec with more details [1]. This
 blueprint would make the macvtap-vif blueprint [2] dispensable.

 The neutron code exploiting this libvirt network vif type will land on
 stackforge. It will manage macvtap backed libvirt networks -- offer
 guest attachments via macvtap. [3]



 [1] https://blueprints.launchpad.net/nova/+spec/libvirt-network-vif
 [2] https://blueprints.launchpad.net/nova/+spec/libvirt-macvtap-vif
 [3] https://launchpad.net/networking-macvtap
 (I'm still waiting for the repo to be approved, so for now I only have a
 launchpad project to ref to).


 Thanks, Andreas, this looks interesting.  I wonder if

 <network>
   <name>xyz</name>
   <forward mode='route'/>
   ...
 </network>

 <domain>
   ...
   <interface type='network'>
     <source network='xyz'/>
   </interface>
   ...
 </domain>

 would provide the connectivity that my Calico project wants to set up [1]
 - i.e. where all data to and from VMs is routed on the compute host -
 instead of

 <domain>
   ...
   <interface type='ethernet'>
     ...
   </interface>
   ...
 </domain>

 Do you happen to know how data gets routed _to_ a VM, in the
 type='network' case?

 Regards,
 Neil


 [1] http://docs.projectcalico.org/en/latest/home.html


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Progressing/tracking work on libvirt / vif drivers

2015-06-08 Thread Ian Wells
Hey Gary,

Sorry for being a little late with the followup...

Concerns with binding type negotiation, or with the scripting?  And could
you summarise the concerns, for those of us that didn't hear them?
-- 
Ian,

On 2 June 2015 at 07:08, Gary Kotton gkot...@vmware.com wrote:

  Hi,
 At the summit this was discussed in the nova sessions and there were a
 number of concerns regarding security etc.
 Thanks
 Gary

   From: Irena Berezovsky irenab@gmail.com
 Reply-To: OpenStack List openstack-dev@lists.openstack.org
 Date: Tuesday, June 2, 2015 at 1:44 PM
 To: OpenStack List openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Progressing/tracking work on libvirt
 / vif drivers

   Hi Ian,
 I like your proposal. It sounds very reasonable and makes separation of
 concerns between neutron and nova very clear. I think with vif plug script
 support [1], it will help to decouple neutron from nova dependency.
 Thank you for sharing this,
 Irena
 [1] https://review.openstack.org/#/c/162468/

 On Tue, Jun 2, 2015 at 10:45 AM, Ian Wells ijw.ubu...@cack.org.uk wrote:

  VIF plugging, but not precisely libvirt VIF plugging, so I'll tout this
 to a hopefully interested audience.

 At the summit, we wrote up a spec we were thinking of doing at [1].  It
 actually proposes two things, which is a little naughty really, but hey.

 Firstly we propose that we turn binding into a negotiation, so that Nova
 can offer binding options it supports to Neutron and Neutron can pick the
 one it likes most.  This is necessary if you happen to use vhostuser with
 qemu, as it doesn't work for some circumstances, and desirable all around,
 since it means you no longer have to configure Neutron to choose a binding
 type that Nova likes and Neutron can choose different binding types
 depending on circumstances.  As a bonus, it should make inter-version
 compatibility work better.

  Secondly we suggest that some of the information that Nova and Neutron
 currently calculate independently should instead be passed from Neutron to
 Nova, simplifying the Nova code since it no longer has to take an educated
 guess at things like TAP device names.  That one is more contentious, since
 in theory Neutron could pass an evil value, but if we can find some pattern
 that works (and 'pattern' might be literally true, in that you could get
 Nova to confirm that the TAP name begins with a magic string and is not
 going to be a physical device or other interface on the box) I think that
 would simplify the code there.

  Read, digest, see what you think.  I haven't put it forward yet
 (actually I've lost track of which projects take specs at this point) but I
 would very much like to get it implemented and it's not a drastic change
 (in fact, it's a no-op until we change Neutron to respect what Nova passes).

 [1] https://etherpad.openstack.org/p/YVR-nova-neutron-binding-spec

 On 1 June 2015 at 10:37, Neil Jerram neil.jer...@metaswitch.com wrote:

 On 01/06/15 17:45, Neil Jerram wrote:

  Many thanks, John & Dan.  I'll start by drafting a summary of the work
 that I'm aware of in this area, at
 https://etherpad.openstack.org/p/liberty-nova-libvirt-vif-work.


 OK, my first draft of this is now there at [1].  Please could folk with
 VIF-related work pending check that I haven't missed or misrepresented
 them?  Especially, please could owners of the 'Infiniband SR-IOV' and
 'mlnx_direct removal' changes confirm that those are really ready for core
 review?  It would be bad to ask for core review that wasn't in fact wanted.

 Thanks,
 Neil


 [1] https://etherpad.openstack.org/p/liberty-nova-libvirt-vif-work



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] virtual machine can not get DHCP lease due packet has no checksum

2015-06-02 Thread Ian Wells
The fix should work fine.  It is technically a workaround for the way
checksums work in virtualised systems, and the unfortunate fact that some
DHCP clients check checksums on packets where the hardware has checksum
offload enabled.  (This doesn't work due to an optimisation in the way QEMU
treats packet checksums.  You'll see the problem if your machine is running
the VM on the same host as its DHCP server and the VM has a vulnerable
client.)

I haven't tried it myself but I have confidence in it and would recommend a
backport.
-- 
Ian.

On 1 June 2015 at 21:32, Kevin Benton blak...@gmail.com wrote:

 I would propose a back-port of it and then continue the discussion on the
 patch. I don't see any major blockers for back-porting it.

 On Mon, Jun 1, 2015 at 7:01 PM, Tidwell, Ryan ryan.tidw...@hp.com wrote:

 Not seeing this on Kilo, we're seeing this on Juno builds (that's
 expected).  I'm interested in a Juno backport, but mainly wanted to see
 if others had confidence in the fix.  The discussion in the bug report also
 seemed to indicate there were other alternative solutions others might be
 looking into that didn't involve an iptables rule.

 -Ryan

 -Original Message-
 From: Mark McClain [mailto:m...@mcclain.xyz]
 Sent: Monday, June 01, 2015 6:47 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Neutron] virtual machine can not get DHCP
 lease due packet has no checksum


  On Jun 1, 2015, at 7:26 PM, Tidwell, Ryan ryan.tidw...@hp.com wrote:
 
  I see a fix for https://bugs.launchpad.net/neutron/+bug/1244589 merged
 during Kilo.  I'm wondering if we think we have identified a root cause and
 have merged an appropriate long-term fix, or if
 https://review.openstack.org/148718 was merged just so there's at least
 a fix available while we investigate other alternatives.  Does anyone have
 an update to provide?
 
  -Ryan

 The fix works in environments we’ve tested in.  Are you still seeing
 problems?

 mark
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




 --
 Kevin Benton

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Progressing/tracking work on libvirt / vif drivers

2015-06-02 Thread Ian Wells
VIF plugging, but not precisely libvirt VIF plugging, so I'll tout this to
a hopefully interested audience.

At the summit, we wrote up a spec we were thinking of doing at [1].  It
actually proposes two things, which is a little naughty really, but hey.

Firstly we propose that we turn binding into a negotiation, so that Nova
can offer binding options it supports to Neutron and Neutron can pick the
one it likes most.  This is necessary if you happen to use vhostuser with
qemu, as it doesn't work for some circumstances, and desirable all around,
since it means you no longer have to configure Neutron to choose a binding
type that Nova likes and Neutron can choose different binding types
depending on circumstances.  As a bonus, it should make inter-version
compatibility work better.

Secondly we suggest that some of the information that Nova and Neutron
currently calculate independently should instead be passed from Neutron to
Nova, simplifying the Nova code since it no longer has to take an educated
guess at things like TAP device names.  That one is more contentious, since
in theory Neutron could pass an evil value, but if we can find some pattern
that works (and 'pattern' might be literally true, in that you could get
Nova to confirm that the TAP name begins with a magic string and is not
going to be a physical device or other interface on the box) I think that
would simplify the code there.
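
To illustrate the sort of check I mean - purely a sketch, the prefix and
pattern here are made up:

    import re

    # Accept only device names with the agreed prefix and charset, and never
    # one that already exists on the host (eth0, br-int, bond0, ...).
    TAP_NAME_RE = re.compile(r'^tap[0-9a-f-]{1,12}$')

    def acceptable_tap_name(name, existing_host_devices):
        return (bool(TAP_NAME_RE.match(name))
                and name not in existing_host_devices)

    assert acceptable_tap_name('tap3fa8b201', {'eth0', 'br-int'})
    assert not acceptable_tap_name('eth0', {'eth0', 'br-int'})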

Read, digest, see what you think.  I haven't put it forward yet (actually
I've lost track of which projects take specs at this point) but I would
very much like to get it implemented and it's not a drastic change (in
fact, it's a no-op until we change Neutron to respect what Nova passes).

[1] https://etherpad.openstack.org/p/YVR-nova-neutron-binding-spec

On 1 June 2015 at 10:37, Neil Jerram neil.jer...@metaswitch.com wrote:

 On 01/06/15 17:45, Neil Jerram wrote:

  Many thanks, John & Dan.  I'll start by drafting a summary of the work
 that I'm aware of in this area, at
 https://etherpad.openstack.org/p/liberty-nova-libvirt-vif-work.


 OK, my first draft of this is now there at [1].  Please could folk with
 VIF-related work pending check that I haven't missed or misrepresented
 them?  Especially, please could owners of the 'Infiniband SR-IOV' and
 'mlnx_direct removal' changes confirm that those are really ready for core
 review?  It would be bad to ask for core review that wasn't in fact wanted.

 Thanks,
 Neil


 [1] https://etherpad.openstack.org/p/liberty-nova-libvirt-vif-work


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Ian Wells
On 13 May 2015 at 10:30, Vinod Pandarinathan (vpandari) vpand...@cisco.com
wrote:

 - Traditional monitoring tools (Nagios, Zabbix, ) are necessary anyway
 for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
 databases and more) and diagnostic purposes. Adding OpenStack service
 checks is fairly easy if you already have the toolchain.

  The solution is for health-checking, which includes periodically running
 light/mid/heavy
 control and data plane tests and provide test data. The tool shall not
 have any dependency on one particular monitoring tool.
 If a monitoring tool is installed, then monitoring data shall be exposed to
 the applications in a consumable fashion.
 As I mentioned earlier, we are not replacing any monitoring solution
 available out there we are leveraging those solutions
  and provide  a clean interface so that the application/tenants and
 Operators know if the cloud is healthy.


To rephrase this:

- Zabbix and friends will monitor an operator's cloud and tell the operator
bad things are happening.  Or they can monitor an application's VMs and see
if the app is happy, and tell the app or its owner.
- Ceilometer will front cloud monitoring solutions and offer those
statistics to tenants of the cloud in ways that (ideally) make sense to the
client.  It lets tenants see stats they couldn't get for themselves.

This isn't quite what we're trying to address.  We had one specific use
case: a cloud application that needs to provide reasonably high
availability uses the Openstack APIs occasionally to try and correct
problems (VM died, app overloaded, etc.) - a pretty normal cloud
application.  If you're interested in maintaining service, you need to know
about single points of failure to work around them, and the cloud control
plane failing is a single point of failure - the APIs stop working, and the
app runs just fine until a second failure that causes them to be used, and
if you haven't done something by that point you get a meltdown.  The idea
of CloudPulse was to be able to say 'the cloud APIs are operating normally'
to applications that are interested.  If they're *not* normal then the
application can take corrective action; for instance, spinning up extra
capacity in another cloud and moving traffic over there.

As you can see, that's a cross-domain sort of monitoring similar to
Ceilometer - the tenant finding out information about the infrastructure
that they can't see directly.  That said, it's a very concise summary
('working'), and we also had in mind that you ran the tests to freshen the
results if the tests hadn't been run recently, rather than looping them
continually.  Also, the history of the results is not really relevant - my
app cares about whether the control plane works *now*, not if it
worked for 8 hours out of the last 24.

We're scratching an itch.  Absolutely the point of mailing everyone about
it was to see if anyone had better scratching tools, and if people would
like to chat about it at the summit.  What seems to have come out of it is
that yes, there are tools out there that might be usable for the purpose,
and we'd love to hear your opinions and what ideas you have about how we
should do this.  Apparently there are also a lot of people with slightly
different itches to scratch, and I hope you all take the opportunity to get
together at the summit too.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Minor changes to API

2015-04-21 Thread Ian Wells
On 20 April 2015 at 17:52, David Kranz dkr...@redhat.com wrote:

  On 04/20/2015 08:07 PM, Ian Wells wrote:

 Whatever your preference might be, I think it's best we lose the
 ambiguity.  And perhaps advertise that page a little more widely, actually
 - I hadn't come across it in my travels.  And perhaps improve its air of
 authority: rules on this subject should probably live somewhere in a repo
 so that it's clear there's consensus for changes.  Currently anyone can
 change it for any reason, and two years after the last substantive change
 it's hard to say who even knew it was being changed, let alone whether they
 agreed.

 This page has some kind of authority as it is linked to from
 https://wiki.openstack.org/wiki/Governance/Approved/APIStability. At that
 time the guidelines were a work in progress but clearly at this point it
 belongs in a more controlled repo. That said, this document has been
 referenced many times on the dev list and I am not sure that just moving it
 to a repo would increase awareness. It would also need to be more
 advertised.


Yeah - the repo was more an issue of authority, so that when it changes
it's clear that it's been changed and people checked the change.

The awareness thing is probably something that PTLs need to propagate -
reviewers need to know of it and what it says and check it when approving
API-affecting changes.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][code quality] Voting coverage job (-1 if coverage get worse after patch)

2015-04-20 Thread Ian Wells
On 20 April 2015 at 07:40, Boris Pavlovic bo...@pavlovic.me wrote:

 Dan,

 IMHO, most of the test coverage we have for nova's neutronapi is more
 than useless. It's so synthetic that it provides no regression
 protection, and often requires significantly more work than the change
 that is actually being added. It's a huge maintenance burden with very
 little value, IMHO. Good tests for that code would be very valuable of
 course, but what is there now is not.
 I think there are cases where going from 90 to 91% means adding a ton of
 extra spaghetti just to satisfy a bot, which actually adds nothing but
 bloat to maintain.


 Let's not mix the bad unit tests in Nova with the fact that code should be
 fully covered by well written unit tests.
 This big task can be split into 2 smaller tasks:
 1) Bot that will check that we are covering new code by tests and don't
 introduce regressions


http://en.wikipedia.org/wiki/Code_coverage

You appear to be talking about statement coverage, which is one of the
weaker coverage metrics.

if a:
    thing

gets 100% statement coverage if a is true, so I don't need to test when a
is false (which would be at a minimum decision coverage).
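
To labour the point with the same example: a statement-coverage bot is happy
with just the first of these tests, but you need both for decision coverage
(sketch only):

    import unittest

    def f(a):
        result = []
        if a:
            result.append('thing')
        return result

    class TestF(unittest.TestCase):
        def test_true_branch(self):
            # this alone gives 100% statement coverage
            self.assertEqual(f(True), ['thing'])

        def test_false_branch(self):
            # this is what decision coverage additionally demands
            self.assertEqual(f(False), [])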

I wonder if the focus is wrong.  Maybe helping devs is better than making
more gate jobs, for starters; and maybe overall coverage is not a great
metric when you're changing 100 lines in 100,000.  If you were thinking
instead to provide coverage *tools* that were easy for developers to use,
that would be a different question.  As a dev, I would not be terribly
interested in finding that I've improved overall test coverage from 90.1%
to 90.2%, but I might be *very* interested to know that I got 100% decision
(or even boolean) coverage on the specific lines of the feature I just
added by running just the unit tests that exercise it.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Minor changes to API

2015-04-20 Thread Ian Wells
On 20 April 2015 at 13:02, Kevin L. Mitchell kevin.mitch...@rackspace.com
wrote:

 On Mon, 2015-04-20 at 13:57 -0600, Chris Friesen wrote:
   However, minor changes like that could still possibly break clients
 that are not
   expecting them.  For example, a client that uses the json response as
 arguments
   to a method via **kwargs would start seeing TypeErrors for unexpected
 arguments.
 
  Isn't this what microversions were intended to solve?

 Yes.

  I'm relatively recent with OpenStack, so I don't have the history.  Did
 anyone
  ever consider explicitly allowing new attributes to be added to
 responses?

 The problem is advertising that this information is available.


There are some cases where that's not necessary: a call returns a JSON
dict.  If, when the dict does not contain the key, some backward compatible
behaviour is assumed, then you are in fact 100% backward compatible.
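
In code terms the client never has to care whether the attribute arrived or
not - a sketch, with 'shiny_new_attr' standing in for whatever gets added:

    def read_resource(resource):
        # an older cloud's response simply lacks the key; assume the
        # pre-existing behaviour when it's absent
        return resource.get('shiny_new_attr', 'legacy-default')

    print(read_resource({'id': 'abc'}))                          # old cloud
    print(read_resource({'id': 'abc', 'shiny_new_attr': 'on'}))  # new cloud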

There are other more ambiguous cases, such as setting an attribute that
doesn't exist in some cases and getting a failure response; there it's nice
to be able to tell in advance via a detection call what to expect.

Anyway, I've been bitten by not knowing the unwritten rules so I do agree
we could use a policy.

That's
 why, in the past, nova required a new extension even if all you were
 doing was adding an attribute, and that's why we want a new microversion
 nowadays.


Depends on your  project.  For Neutron:

- some IPv6 changes introduced new (settable) subnet attributes without a
bump in version; these were merged in and are now released in Juno
- the recent VLAN and MTU changes introduced new network attributes without
a bump in version; these were certainly argued about as a break with
backward compatibility (and eventually became extensions, though for other
reasons than simply that one)
- extensions in Neutron can be used to add attributes without changing the
core interface; extension detection APIs exist to make planning easier

It would be nice to have a consistent policy here; it would make future
decision making easier and it would make it easier to write specs if we
knew what was expected and the possible implementations weren't up for
(quite so much) debate.  For different reasons, Neutron extensions are also
not favoured, so there's no clear cut choice to make.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Minor changes to API

2015-04-20 Thread Ian Wells
On 20 April 2015 at 15:23, Matthew Treinish mtrein...@kortar.org wrote:

 On Mon, Apr 20, 2015 at 03:10:40PM -0700, Ian Wells wrote:
  It would be nice to have a consistent policy here; it would make future
  decision making easier and it would make it easier to write specs if we
  knew what was expected and the possible implementations weren't up for
  (quite so much) debate.  For different reasons, Neutron extensions are
 also
  not favoured, so there's no clear cut choice to make.

 Uhm, there is: https://wiki.openstack.org/wiki/APIChangeGuidelines
 and if you read that adding attrs without advertising it (using an
 extension, microversion, or whatever) is not an allowed change.


It is also not an unallowed change (given that there's a section that
appears to define what an unallowed attribute change is).  The page reads
very awkwardly.

Whatever your preference might be, I think it's best we lose the
ambiguity.  And perhaps advertise that page a little more widely, actually
- I hadn't come across it in my travels.  And perhaps improve its air of
authority: rules on this subject should probably live somewhere in a repo
so that it's clear there's consensus for changes.  Currently anyone can
change it for any reason, and two years after the last substantive change
it's hard to say who even knew it was being changed, let alone whether they
agreed.

Just adding
 things without a new extension or microversion makes the end user story
 terrible
 because it puts the burden completely on the user to try and figure out
 which
 version 2 (or whatever it currently is marked as) of the api the cloud
 they're
 using speaks. Think about it if it were a library, that just started adding
 things to it's interfaces without bumping any version. Even if it was a
 backwards compatible addition you would still expect the version to
 increment to
 indicate that the new stuff was there and available for use.


I appreciate your point and I'd be happy for that to be more obviously our
position.

The issue that the MTU change hit was the conflict between this general
principle and the consensus in its project.  Neutron's core team was giving
a strong 'no more extensions' vibe at the last summit, Neutron hasn't got
microversioning, and the content of that document is two years old and
apparently not very widely known by reviewers as well as me.  No choice
would have been right.

So again, how about we fix that document up and put it somewhere where it
receives a bit more control and attention?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Magnum] Containers and networking

2015-04-03 Thread Ian Wells
This puts me in mind of a previous proposal, from the Neutron side of
things. Specifically, I would look at Erik Moe's proposal for VM ports
attached to multiple networks:
https://blueprints.launchpad.net/neutron/+spec/vlan-aware-vms .

I believe that you want logical ports hiding behind a conventional port
(which that has); the logical ports attached to a variety of Neutron
networks despite coming through the same VM interface (ditto); and an encap
on the logical port with a segmentation ID (that uses exclusively VLANs,
which probably suits here, though there's no particular reason why it has
to be VLANs or why it couldn't be selectable).  The original concept didn't
require multiple ports attached to the same incoming subnetwork, but that's
a comparatively minor adaptation.
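
To give a feel for the shape of the thing (names here are invented, not the
actual vlan-aware-vms API): one conventional parent port, with logical ports
hiding behind it, each tied to its own Neutron network via an encap ID:

    trunk_port = {
        'parent_port': 'port-uuid-aaaa',
        'logical_ports': [
            {'network': 'net-uuid-1111', 'encap': 'vlan',
             'segmentation_id': 101},
            {'network': 'net-uuid-2222', 'encap': 'vlan',
             'segmentation_id': 102},
        ],
    }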
-- 
Ian.


On 2 April 2015 at 11:35, Russell Bryant rbry...@redhat.com wrote:

 On 04/02/2015 01:45 PM, Kevin Benton wrote:
  +1. I added a suggestion for a container networking suggestion to the
  etherpad for neutron. It would be sad if the container solution built
  yet another overlay on top of the Neutron networks with yet another
  network management workflow. By the time the packets are traveling
  across the wires, it would be nice not to have double encapsulation from
  completely different systems.

 Yeah, that's what I like about this proposal.  Most of the existing work
 in this space seems to result in double encapsulation.  Now we just need
 to finish building it ...

 --
 Russell Bryant

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] VLAN trunking network for NFV

2015-03-25 Thread Ian Wells
Today:

You need to ensure that your cloud is using a suitable networking config -
with ML2, use Linuxbridge and either VXLAN or GRE.  If you're using either
OVS or VLAN you won't get a trunking network.  A tenant can't tell this, so
they can't easily tell that all or any networks are VLAN trunks without
testing the network.

Tomorrow (i.e. on trunk, or when Kilo is released):

You can use the vlan_transparent flag on a network to explicitly request a
trunk.  The dataplane code hasn't changed, so the cloud will report that
the network is a trunk if you're using ML2 with Linuxbridge and GRE or
VXLAN, and will report you can't have a trunk if you use OVS or VLAN.  This
means that you are no more likely to get a trunk if you ask for one - you
still need a suitable configuration - but your application will immediately
know if it works or not (the old alternative was pretty much to start it
and see if it works, which wasn't helpful).
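
From the application side the check is then no more complicated than
something like this (a sketch - 'client' is assumed to be an authenticated
python-neutronclient Client, and 'vlan_transparent' is the attribute the
spec adds):

    def create_trunk_network(client, name):
        body = {'network': {'name': name, 'vlan_transparent': True}}
        net = client.create_network(body)['network']
        # trust what the plugin says it gave us, not what we asked for
        if not net.get('vlan_transparent'):
            raise RuntimeError('%s is not a VLAN trunk on this cloud' % name)
        return net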

ML2 now has a reference implementation of this; other plugins (to the best
of my knowledge) don't support the flag.  When they do, then any plugin or
driver can theoretically be written to behave differently if you have ask
for a trunk; for instance, in the future we can change the code to program
OVS differently if you want a trunk, or change ML2 to use a trunk-safe
VXLAN overlay even though VLAN networks are also available in a system.  No
driver does this today.
-- 
Ian.

On 24 March 2015 at 17:48, Guo, Ruijing ruijing@intel.com wrote:

  I am trying to understand how guest os use trunking network.



 If the guest OS uses a bridge like Linuxbridge or OVS, how do we launch it
 and how does libvirt support it?



 Thanks,

 -Ruijing





 *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
 *Sent:* Wednesday, March 25, 2015 2:18 AM
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Neutron] VLAN trunking network for NFV



 That spec ensures that you can tell what the plugin is doing.  You can ask
 for a VLAN transparent network, but the cloud may tell you it can't make
 one.

 The OVS driver in Openstack drops VLAN tagged packets, I'm afraid, and the
 spec you're referring to doesn't change that.  The spec does ensure that if
 you try and create a VLAN trunk on a cloud that uses the OVS driver, you'll
 be told you can't.  in the future, the OVS driver can be fixed, but that's
 how things stand at present.  Fixing the OVS driver really involves getting
 in at the OVS flow level - can be done, but we started with the basics.

 If you want to use a VLAN trunk using the current code, I recommend VXLAN
 or GRE along with the Linuxbridge driver, both of which support VLAN
 transparent networking.  If they're configured and you ask for a VLAN trunk
 you'll be told you got one.
 --

 Ian.





 On 24 March 2015 at 09:43, Daniele Casini daniele.cas...@dektech.com.au
 wrote:

 Hi all:

 in reference to the following specification about the creation of VLAN
 trunking network for NFV

 https://review.openstack.org/#/c/136554/3/specs/kilo/nfv-vlan-trunks.rst

 I would like to better understand how the tagged traffic will be realized.
 In order to explain myself, I report the following use case:

 A VNF is deployed in one VM, which has a trunk port carrying traffic for
 two VLANs over a single link able to transport more than one VLAN through a
 single integration-bridge (br-int) port. So, How does br-int manage the
 VLAN-ID? In other words, what are the action performed by the br-int when a
 VM forwards traffic to another host?
 Does it put an additional tag or replace the existing one keeping the
 match with a table or something like that?

 Thank you very much.

 Daniele


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] VLAN trunking network for NFV

2015-03-24 Thread Ian Wells
On 24 March 2015 at 11:45, Armando M. arma...@gmail.com wrote:

 This may be besides the point, but I really clash with the idea that we
 provide a reference implementation on something we don't have CI for...


Aside from the unit testing, it is going to get a test for the case we can
test - when using the standard config networking that Tempest runs with,
does it return the right answer?  That's pretty much the level of
commitment that the entire test suite gives.

Beyond that, it is about as well tested by the upstream testing as the ML2
plugin (which, in the main tests, is tested in one config only) and more
well tested than the LB driver (which isn't touched by the system tests
but is still in-tree).  I'm not out to make the test coverage any worse,
and I apologise that we can't test this when it's returning a positive
result, but the system tests do have limitations in this regard.

That said, I'd love to put a positive test in the system tests if only we
can work out how to do one - suggestions welcome...
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] VLAN trunking network for NFV

2015-03-24 Thread Ian Wells
That spec ensures that you can tell what the plugin is doing.  You can ask
for a VLAN transparent network, but the cloud may tell you it can't make
one.

The OVS driver in Openstack drops VLAN tagged packets, I'm afraid, and the
spec you're referring to doesn't change that.  The spec does ensure that if
you try and create a VLAN trunk on a cloud that uses the OVS driver, you'll
be told you can't.  in the future, the OVS driver can be fixed, but that's
how things stand at present.  Fixing the OVS driver really involves getting
in at the OVS flow level - can be done, but we started with the basics.

If you want to use a VLAN trunk using the current code, I recommend VXLAN
or GRE along with the Linuxbridge driver, both of which support VLAN
transparent networking.  If they're configured and you ask for a VLAN trunk
you'll be told you got one.
-- 
Ian.


On 24 March 2015 at 09:43, Daniele Casini daniele.cas...@dektech.com.au
wrote:

 Hi all:

 in reference to the following specification about the creation of VLAN
 trunking network for NFV

 https://review.openstack.org/#/c/136554/3/specs/kilo/nfv-vlan-trunks.rst

 I would like to better understand how the tagged traffic will be realized.
 In order to explain myself, I report the following use case:

 A VNF is deployed in one VM, which has a trunk port carrying traffic for
 two VLANs over a single link able to transport more than one VLAN through a
 single integration-bridge (br-int) port. So, How does br-int manage the
 VLAN-ID? In other words, what are the action performed by the br-int when a
 VM forwards traffic to another host?
 Does it put an additional tag or replace the existing one keeping the
 match with a table or something like that?

 Thank you very much.

 Daniele


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPAM] Uniqueness of subnets within a tenant

2015-03-22 Thread Ian Wells
On 22 March 2015 at 07:48, Jay Pipes jaypi...@gmail.com wrote:

 On 03/20/2015 05:16 PM, Kevin Benton wrote:

 To clarify a bit, we obviously divide lots of things by tenant (quotas,
 network listing, etc). The difference is that we have nothing right now
 that has to be unique within a tenant. Are there objects that are
 uniquely scoped to a tenant in Nova/Glance/etc?


 Yes. Virtually everything is :)


Everything is owned by a tenant.  Very few things are one per tenant, where
is where this feels like it's leading.

Seems to me that an address pool corresponds to a network area that you can
route across (because routing only works over a network with unique
addresses and that's what an address pool does for you).  We have those
areas and we use NAT to separate them (setting aside the occasional
isolated network area with no external connections).  But NAT doesn't
separate tenants, it separates externally connected routers: one tenant can
have many of those routers, or one router can be connected to networks in
both tenants.  We just happen to frequently use the one external router per
tenant model, which is why address pools *appear* to be one per tenant.  I
think, more accurately, an external router should be given an address pool,
and tenants have nothing to do with it.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Neutron extenstions

2015-03-20 Thread Ian Wells
On 20 March 2015 at 15:49, Salvatore Orlando sorla...@nicira.com wrote:

 The MTU issue has been a long-standing problem for neutron users. What
 this extension is doing is simply, in my opinion, enabling API control over
 an aspect users were dealing with previously through custom made scripts.


Actually, version 1 is not even doing that; it's simply telling the user
what happened, which the user has never previously been able to tell, and
configuring the network consistently.  I don't think we implemented the
'choose an MTU' API, we're simply telling you the MTU you got.

Since this is frequently smaller than you think (there are some
non-standard features that mean you frequently *can* pass larger packets
than should really work, hiding the problem at the cost of a performance
penalty for doing it) and there was previously no way of getting any idea
of what it is, this is a big step forward.

And to reiterate, because this point is often missed: different networks in
Neutron have different MTUs.  My virtual networks might be 1450.  My
external network might be 1500.  The provider network to my NFS server
might be 9000.  There is *nothing* in today's Neutron that lets you do
anything about that, and - since Neutron routers and Neutron DHCP agents
have no means of dealing with different MTU networks - really strange
things happen if you try some sort of workaround.

If a plugin does not support specifically setting the MTU parameter, I
 would raise a 500 NotImplemented error. This will probably create a
 precedent, but as I also stated in the past, I tend to believe this might
 actually be better than the hide & seek game we do with extensions.


I am totally happy with this, if we agree it's what we want to do, and it
makes plenty of sense for when you request an MTU.

The other half of the interface is when you don't request a specific MTU
but you'd like to know what MTU you got - the approach we have today is
that if the MTU can't be determined (either a plugin with no support or one
that's short on information) then the value on the network object is
unset.  I assume people are OK with that.
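
So on the consuming side the logic is trivial - a sketch, with 1450 standing
in for whatever conservative value you were already assuming:

    def usable_mtu(network):
        # absent/None means the plugin couldn't work the MTU out
        mtu = network.get('mtu')
        return mtu if mtu else 1450

    print(usable_mtu({'name': 'external', 'mtu': 1500}))  # -> 1500
    print(usable_mtu({'name': 'legacy-net'}))             # -> 1450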


 The vlan_transparent feature serves a specific purpose of a class of
 applications - NFV apps.


To be pedantic - the uses for it are few and far between but I wouldn't
reduce it to 'NFV apps'.  http://virl.cisco.com/ I wrote on Openstack a
couple of years ago and it's network simulation but not actually NFV.
People implementing resold services (...aaS) in VMs would quite like VLANs
on their virtual networks too, and this has been discussed in at least 3
summits so far.  I'm sure other people can come up with creative reasons.

It has been speculated during the review process whether this was actually
 a provider network attribute.


Which it isn't, just for reference.


 In theory it is something that characterises how the network should be
 implemented in the backend.

However it was not possible to make this an admin attribute because
 non-admins might also require a vlan_transparent network. Proper RBAC might
 allow us to expose this attribute only to a specific class of users, but
 Neutron does not yet have RBAC [1]


I think it's a little early to worry about restricting the flag.  The
default implementation pretty much returns a constant (and says if that
constant is true when you'd like it to be) - it's implemented as a call for
future expansion.

Because of its nature vlan_transparent is an attribute that probably
 several plugins will not be able to understand.


And again backward compatibility is documented, and actually pretty
horrible now I come to reread it, so if we wanted to go with a 500 as above
that's quite reasonable.


 Regardless of what the community decides regarding extensions vs
 non-extensions, the code as it is implies that this flag is present in every
 request - defaulting to False.


Which is, in fact, not correct (or at least not the way it's supposed to
be, anyway; I need to check the code).

The original idea was that if it's not present in the request then you
can't assume the network you're returned is a VLAN trunk, but you also
can't assume it isn't - as in, it's the same as the current behaviour,
where the plugin does what it does and you get to put up with the results.
The difference is that the plugin now gets to tell you what it's done.


 This can lead to a somewhat confusing situation, because users can set it to
 True, and get a 200 response. As a user, I would think that Neutron has
 prepared for me a nice network which is vlan transparent... but if Neutron
 is running any plugin which does not support this extension I would be in
 for a huge disappointment when I discover my network is not vlan
 transparent at all!


The spec has detail on how the user works this out, as I say.
Unfortunately it's not by return code.
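
For the avoidance of doubt, the flow I'd expect a user to follow is roughly
this (a sketch; the name and what a given plugin does when it can't comply
are plugin-specific, and the credentials are placeholders):

    from neutronclient.v2_0 import client

    neutron = client.Client(username='demo', password='secret',
                            tenant_name='demo',
                            auth_url='http://controller:5000/v2.0')

    # Ask for a VLAN-transparent network...
    body = {'network': {'name': 'trunk-net', 'vlan_transparent': True}}
    net = neutron.create_network(body)['network']

    # ...then check what you actually got, because a 2xx on its own does
    # not guarantee the plugin could honour the request.
    if net.get('vlan_transparent'):
        print('VLAN trunking available on %s' % net['id'])
    else:
        print('plugin did not (or could not) give us a trunk')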

I reckon that perhaps, as a short term measure, the configuration flag
 Armando mentioned might be used to obscure completely the API attribute 

Re: [openstack-dev] [Neutron] Neutron extenstions

2015-03-20 Thread Ian Wells
/nfv-vlan-trunks,n,z
 [5] https://review.openstack.org/#/c/136760/

 On 19 March 2015 at 14:56, Ian Wells ijw.ubu...@cack.org.uk wrote:

 On 19 March 2015 at 11:44, Gary Kotton gkot...@vmware.com wrote:

 Hi,
 Just the fact that we did this does not make it right. But I guess that
 we
 are starting to bend the rules. I think that we really need to be far
 more
 diligent about this kind of stuff. Having said that we decided the
 following on IRC:
 1. Mtu will be left in the core (all plugins should be aware of this and
 treat it if necessary)
 2. Vlan-transparency will be moved to an extension. Pritesh is working on
 this.


 The spec started out as an extension, and in its public review people
 requested that it not be an extension and that it should instead be core.
 I accept that we can change our minds, but I believe there should be a good
 reason for doing so.  You haven't given that reason here and you haven't
 even said who the 'we' is that decided this.  Also, as the spec author, I
 had a conversation with you all but there was no decision at the end of it
 (I presume that came afterward) and I feel that I have a reasonable right
 to be involved.  Could you at least summarise your reasoning here?

 I admit that I prefer this to be in core, but I'm not terribly choosy and
 that's not why I'm asking.  I'm more concerned that this is changing our
 mind at literally the last moment, and in turn wasting a developer's time,
 when there was a perfectly good process to debate this before coding was
 begun, and again when the code was up for review, both of which apparently
 failed.  I'd like to understand how we avoid getting here again in the
 future.  I'd also like to be certain we are not simply reversing a choice
 on a whim.
 --
 Ian.

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe:
 openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Neutron extenstions

2015-03-19 Thread Ian Wells
There are precedents for this.  For example, the attributes that currently
exist for IPv6 advertisement are very similar:

- added during the run of a stable Neutron API
- properties added on a Neutron object (MTU and VLAN affect network, but
IPv6 affects subnet - same principle though)
- settable, but with defaults so they're optional
- turn up in output when the subnet information is fetched

With the one caveat (write your code to ignore properties you don't
understand) this seems to address backward compatibility in both the IPv6
and the MTU/VLAN attribute changes - if you completely ignore the attribute,
behaviour is close enough to the way it used to be that your app won't break.
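
The caveat amounts to nothing more than defensive reading - something like
this, which behaves the same whether or not the newer attributes are present
(a plain-dict sketch; attribute names as per the specs):

    def describe_network(net):
        # Use newer attributes if present, ignore them if not; anything
        # this client doesn't recognise is simply left alone.
        desc = 'network %s' % net['id']
        if net.get('mtu'):
            desc += ', mtu %s' % net['mtu']
        if net.get('vlan_transparent'):
            desc += ', vlan transparent'
        return desc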

Now, it may be that no-one noticed the ipv6 changes as they went through,
but given the long debate about what the attributes should look like at the
time they did get reasonable attention.  Do we want to change the rules for
future API changes?
-- 
Ian.


On 19 March 2015 at 10:07, Gary Kotton gkot...@vmware.com wrote:

  Hi,
 Until now all changes to the APIs have been made in separate extensions
 and not in the base. These should actually be on the provider networks
 extension.
 First this code is not supported by any of the plugins other than the ML2
 (I am not sure if this breaks things – it certainly broke the unit tests).
 Secondly these two changes do not have open source reference
 implementations (but that is digressing from the problem).
 I really think that we need to revert these and have the extensions done
 in the standard fashion.
 Thanks
 Gary


   From: Brandon Logan brandon.lo...@rackspace.com
 Reply-To: OpenStack List openstack-dev@lists.openstack.org
 Date: Thursday, March 19, 2015 at 6:20 PM
 To: OpenStack List openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Neutron] Neutron extenstions

   ​Isn't this argument as to whether those fields should be turned
 off/on, versus just always being on?  Are there any guidelines as to what
 fields are allowed to be added in that base resource attr map?  If ML2
 needs these and other fields, should they just always be on?


  Thanks,

 Brandon
  --
 *From:* Doug Wiegley doug...@parksidesoftware.com
 *Sent:* Thursday, March 19, 2015 11:01 AM
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Neutron] Neutron extenstions

  Hi Gary,

  First I’m seeing these, but I don’t see that they’re required on input,
 unless I'm mis-reading those reviews.  Addition of new output fields to a
 json object, or adding optional inputs, is not generally considered to be
 backwards incompatible behavior in an API. Does OpenStack have a stricter
 standard on that?

  Thanks,
 doug


   On Mar 19, 2015, at 6:37 AM, Gary Kotton gkot...@vmware.com wrote:

  Hi,
 Changed the subject so that it may draw a little attention.
 There were 2 patches approved that kind of break the API (in my humble
 opinion):
 https://review.openstack.org/#/c/154921/ and
 https://review.openstack.org/#/c/158420/
 In both of these, two new fields were added to the base attributes – mtu
 and vlan_transparency
 Reverts for them are:
 https://review.openstack.org/165801 (mtu) and
 https://review.openstack.org/165776 (vlan transparency).
 In my opinion these should be added as separate extensions.
 Thanks
 Gary

   From: Gary Kotton gkot...@vmware.com
 Reply-To: OpenStack List openstack-dev@lists.openstack.org
 Date: Thursday, March 19, 2015 at 2:32 PM
 To: OpenStack List openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [Neutron] VLAN transparency support

   Hi,
 This patch has the same addition too -
 https://review.openstack.org/#/c/154921/. We should also revert that one.
 Thanks
 Gary

   From: Gary Kotton gkot...@vmware.com
 Reply-To: OpenStack List openstack-dev@lists.openstack.org
 Date: Thursday, March 19, 2015 at 1:14 PM
 To: OpenStack List openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [Neutron] VLAN transparency support

   Hi,
 It appears that https://review.openstack.org/#/c/158420/ updates the base
 attributes for the networks. Is there any reason why this was not added as
 a separate extension like all the others?
 I do not think that this is the correct way to go and we should do this as
 all other extensions have been maintained. I have posted a revert (
 https://review.openstack.org/#/c/165776/) – please feel free to nack it if
 it is invalid.
 Thanks
 Gary

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 

Re: [openstack-dev] [Neutron] VLAN transparency support

2015-03-19 Thread Ian Wells
Per the other discussion on attributes, I believe the change walks in
historical footsteps and it's a matter of project policy choice.  That
aside, you raised a couple of other issues on IRC:

- backward compatibility with plugins that haven't adapted their API - this
is addressed in the spec, which should have been implemented in the patches
(otherwise I will downvote the patch myself) - behaviour should be as
before with the additional feature that you can now tell more about what
the plugin is thinking
- whether they should be core or an extension - this is a more personal
opinion, but on the grounds that all networks are either trunks or not, and
all networks have MTUs, I think they do want to be core.  I would like to
see plugin developers strongly encouraged to consider what they can do on
both elements, whereas an extension tends to sideline functionality from
view so that plugin writers don't even know it's there for consideration.

Aside from that, I'd like to emphasise the value of these patches, so
hopefully we can find a way to get them in in some form in this cycle.  I
admit I'm interested in them because they make it easier to do NFV.  But
they also help normal cloud users and operators, who otherwise have to do
some really strange things [1].  I think it's maybe a little unfair to post
reversion patches before discussion, particularly when the patch works,
passes tests and implements an approved spec correctly.
-- 
Ian.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1138958 (admittedly first
link I found, but there's no shortage of them)

On 19 March 2015 at 05:32, Gary Kotton gkot...@vmware.com wrote:

  Hi,
 This patch has the same addition too -
 https://review.openstack.org/#/c/154921/. We should also revert that one.
 Thanks
 Gary

   From: Gary Kotton gkot...@vmware.com
 Reply-To: OpenStack List openstack-dev@lists.openstack.org
 Date: Thursday, March 19, 2015 at 1:14 PM
 To: OpenStack List openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [Neutron] VLAN transparency support

   Hi,
 It appears that https://review.openstack.org/#/c/158420/ updates the base
 attributes for the networks. Is there any reason why this was not added as
 a separate extension like all the others?
 I do not think that this is the correct way to go and we should do this as
 all other extensions have been maintained. I have posted a revert (
 https://review.openstack.org/#/c/165776/) – please feel free to nack it if
 it is invalid.
 Thanks
 Gary

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Neutron extenstions

2015-03-19 Thread Ian Wells
On 19 March 2015 at 11:44, Gary Kotton gkot...@vmware.com wrote:

 Hi,
 Just the fact that we did this does not make it right. But I guess that we
 are starting to bend the rules. I think that we really need to be far more
 diligent about this kind of stuff. Having said that we decided the
 following on IRC:
 1. Mtu will be left in the core (all plugins should be aware of this and
 treat it if necessary)
 2. Vlan-transparency will be moved to an extension. Pritesh is working on
 this.


The spec started out as an extension, and in its public review people
requested that it not be an extension and that it should instead be core.
I accept that we can change our minds, but I believe there should be a good
reason for doing so.  You haven't given that reason here and you haven't
even said who the 'we' is that decided this.  Also, as the spec author, I
had a conversation with you all but there was no decision at the end of it
(I presume that came afterward) and I feel that I have a reasonable right
to be involved.  Could you at least summarise your reasoning here?

I admit that I prefer this to be in core, but I'm not terribly choosy and
that's not why I'm asking.  I'm more concerned that this is changing our
mind at literally the last moment, and in turn wasting a developer's time,
when there was a perfectly good process to debate this before coding was
begun, and again when the code was up for review, both of which apparently
failed.  I'd like to understand how we avoid getting here again in the
future.  I'd also like to be certain we are not simply reversing a choice
on a whim.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Capability Discovery API

2015-03-18 Thread Ian Wells
On 18 March 2015 at 03:33, Duncan Thomas duncan.tho...@gmail.com wrote:

 On 17 March 2015 at 22:02, Davis, Amos (PaaS-Core) 
 amos.steven.da...@hp.com wrote:

 Ceph/Cinder:
 LVM or other?
 SCSI-backed?
 Any others?


 I'm wondering why any of the above matter to an application.


The Neutron requirements list is the same.  Everything you've listed
describes implementation details, with the exception of shared networks (which
are a core feature, and so it's actually rather unclear what you had in
mind there).

Implementation details should be hidden from cloud users - they don't care
if I'm using ovs/vlan, and they don't care that I change my cloud one day
to run ovs/vxlan, they only care that I deliver a cloud that will run their
application - and since I care that I don't break applications when I make
under the cover changes I will be thinking carefully about that too. I
think you could develop a feature list, mind, just that you've not managed
it here.

For instance: why is an LVM disk different from one on a Netapp when you're
a cloud application and you always attach a volume via a VM?  Well, it
basically isn't, unless there are features (like for instance a minimum TPS
guarantee) that are different between the drivers.  Cinder's even stranger
here, since you can have multiple backend drivers simultaneously and a
feature may not be present in all of them.

Also, in Neutron, the current MTU and VLAN work is intended to expose some
of those features to the app more than they were previously (e.g. 'can I
use a large MTU on this network?'), but there are complexities in exposing
this in advance of running the application.  The MTU size is not easy to
discover in advance (it varies depending on what sort of network you're
making), and what MTU you get for a specific network is very dependent on
the network controller (network controllers can choose to not expose it at
all, expose it with upper bounds in place, or expose it and try so hard to
implement what the user requests that it's not immediately obvious whether
a request will succeed or fail, for instance).  You could say 'you can ask
for large MTU networks' - that is a straightforward feature - but some apps
will fail to run if they ask and get declined.

This is not to say there isn't useful work that could be done here, just
that there may be some limitations on what is possible.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-13 Thread Ian Wells
On 12 March 2015 at 05:33, Fredy Neeser fredy.nee...@solnet.ch wrote:

  2.  I'm using policy routing on my hosts to steer VXLAN traffic (UDP
 dest. port 4789) to interface br-ex.12 --  all other traffic from
 192.168.1.14 is source routed from br-ex.1, presumably because br-ex.1 is a
 lower-numbered interface than br-ex.12  (?) -- interesting question whether
 I'm relying here on the order in which I created these two interfaces.


OK, I have to admit I've never used policy routing in anger so I can't
speak with much confidence here.  I wonder if anything (link down, for
instance) might cause Linux to change its preference behaviour, though, and
your to-the-world packets haven't got a policy from what you say so a
preference change would be a disaster.

3.  It's not clear to me how to setup multiple nodes with packstack if a
 node's tunnel IP does not equal its admin IP (or the OpenStack API IP in
 case of a controller node).  With packstack, I can only specify the compute
 node IPs through CONFIG_COMPUTE_HOSTS.  Presumably, these IPs are used for
 both packstack deployment (admin IP) and for configuring the VXLAN tunnel
 IPs (local_ip and remote_ip parameters).  How would I specify different IPs
 for these purposes?  (Recall that my hosts have a single NIC).


I don't think the single NIC is an issue, particularly, and even less so if
you have multiple interfaces, even VLAN interfaces, with different
addresses.  At that point you should be able to use
CONFIG_NEUTRON_OVS_TUNNEL_IF=eth1.12 , which would need to be created and
addressed by the point you run packstack, as it expects it to be there at
this point.  In fact the closed bug
https://bugs.launchpad.net/packstack/+bug/1279517 suggests that you're not
the first to try this and it does work (though since the change it refers
to isn't merged you might need to say ...=eth1_12 to keep packstack happy).

You may find that configuring a VLAN interface for eth1.12 (not in a
 bridge, with a local address suitable for communication with compute nodes,
 for VXLAN traffic) and eth1.1 (in br-ex, for external traffic to use) does
 better for you.

   Hmm, I only have one NIC (eth0).


Apparently I can't read - where I'm putting eth1 I mean eth0 in your setup,
I must have misread it early on.   I'll try and make the switch.

eth0.1 is shorthand notation for eth0 VLAN 1, and there are a bunch of
interface management commands to create interfaces of this type. It appears
to be possible to configure this in the network setup scripts -
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configure_802_1Q_VLAN_Tagging_Using_the_Command_Line.html
describes the Redhat way, though I've only done this on Ubuntu and Debian
myself.

In order to attach eth0 to br-ex, I had to configure it as an OVSPort.
 Maybe I misunderstand your alternative, but are you suggesting  to
 configure  eth0.1 as an OVSPort (connected to br-ex), and  eth0.12 as a
 standalone interface?  (Not sure a physical interface can be brain split
 in such a way.)


eth0.1 is a full on network interface and should work as an OVS port.  You
would configure the external network in Openstack as flat, rather than
containing VLAN segments, because the tagging is done outside of Openstack
with this approach (otherwise you'd end up with double tagged packets).

And yes, eth0.12 would be a standalone interface.

Note that my physical switch uses a native VLAN of 1  and is configured
 with Untag all ports for VLAN 1. Moreover, OVSPort eth0 (attached to
 br-ex) is configured for VLAN trunking with a native VLAN of 1 (vlan_mode:
 native-untagged, trunks: [1,12], tag: 1), so within bridge br-ex, native
 packets are tagged 1.


Yes, as I say, if you moved over to the eth0.1 mechanism above you'd want
the packets to be untagged at the eth0.1 OVS port, because receiving them
via eth0.1 would untag them (and sending them would tag them) and OVS
doesn't need to help you out on the VLAN front any more.

I'm still not a fan of your setup but I don't know if it's just because
it's not where my natural preference lies.  You may be inches from making
it work perfectly, and I'm not sure I would be able to tell.  That said,
policy routing seems like a workaround to a problem you're having with
packstack; I would definitely go with two addresses if there were any way
to make it configure properly.  If I were doing this there would also be
quite a lot of experimentation to verify my guesswork, I have to admit, so
it's not an easy answer.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-11 Thread Ian Wells
On 11 March 2015 at 04:27, Fredy Neeser fredy.nee...@solnet.ch wrote:

 7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
 UNKNOWN group default
 link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
 inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
valid_lft forever preferred_lft forever

 8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue
 state UNKNOWN group default
 link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
 inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
valid_lft forever preferred_lft forever


I find it hard to believe that you want the same address configured on
*both* of these interfaces - which one do you think will be sending packets?

You may find that configuring a VLAN interface for eth1.12 (not in a
bridge, with a local address suitable for communication with compute nodes,
for VXLAN traffic) and eth1.1 (in br-ex, for external traffic to use) does
better for you.

I'm also not clear what your Openstack API endpoint address or MTU is -
maybe that's why the eth1.1 interface is addressed?  I can tell you that if
you want your API to be on the same address 192.168.1.14 as the VXLAN
tunnel endpoints then it has to be one address on one interface and the two
functions will share the same MTU - almost certainly not what you're
looking for.  If you source VXLAN packets from a different IP address then
you can put it on a different interface and give it a different MTU - which
appears to fit what you want much better.
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron][nfv] is there any reason neutron.allow_duplicate_networks should not be True by default?

2015-03-11 Thread Ian Wells
On 11 March 2015 at 10:56, Matt Riedemann mrie...@linux.vnet.ibm.com
wrote:

 While looking at some other problems yesterday [1][2] I stumbled across
 this feature change in Juno [3] which adds a config option
 allow_duplicate_networks to the [neutron] group in nova. The default
 value is False, but according to the spec [4] neutron allows this and it's
 just exposing a feature available in neutron via nova when creating an
 instance (create the instance with 2 ports from the same network).

 My question then is why do we have a config option to toggle a feature
 that is already supported in neutron and is really just turning a failure
 case into a success case, which is generally considered OK by our API
 change guidelines [5].

 I'm wondering if there is anything about this use case that breaks other
 NFV use cases, maybe something with SR-IOV / PCI?  If not, I plan on
 pushing a change to deprecate the option in Kilo and remove it in Liberty
 with the default being to allow the operation.


This was all down to backward compatibility.

Nova didn't allow two interfaces on the same Neutron network.  We tried to
change this by filing a bug, and the patches got rejected because the
original behaviour was claimed to be intentional and desirable.  (It's not
clear that it was intentional behaviour because it was never documented,
but the same lack of documented intent meant it's also not clear it was a
bug, so the situation was ambiguous.)

Eventually it was fixed as new functionality using a spec [1] so that the
change and reasoning could be clearly described, and because of the
previous concerns, Racha, who implemented the spec, additionally chose to
use a config item to preserve the original behaviour unless the new one was
explicitly requested.
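
For reference, the operation the option gates is just this (a sketch with
python-novaclient; the credentials and UUIDs are placeholders):

    from novaclient import client as nova_client

    # Placeholder credentials - substitute your own.
    nova = nova_client.Client('2', 'demo', 'secret', 'demo',
                              'http://controller:5000/v2.0')

    # Two ports on the *same* Neutron network; with the option set to
    # False Nova refuses this, with it set to True the boot goes ahead.
    nova.servers.create('two-nics-one-net',
                        image='IMAGE_UUID', flavor='FLAVOR_ID',
                        nics=[{'net-id': 'NET_UUID'},
                              {'net-id': 'NET_UUID'}])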
-- 
Ian.

[1]
https://review.openstack.org/#/c/97716/5/specs/juno/nfv-multiple-if-1-net.rst
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Question about bug 1314614

2015-03-08 Thread Ian Wells
On 6 March 2015 at 13:16, Sławek Kapłoński sla...@kaplonski.pl wrote:

 Hello,

 Today I found bug https://bugs.launchpad.net/neutron/+bug/1314614 because
 I have this problem on my infra.


(For reference, if you delete a port that Nova is using - Neutron just goes
ahead and deletes the port and leaves the VIF in an odd state,
disconnected and referring to a port that no longer exists.)

I saw that the bug is In Progress but the change was abandoned quite a long
 time ago. I was wondering, is it possible for neutron to send a notification
 to nova that such a port was deleted in neutron? I know that in Juno neutron
 sends notifications to nova when a port is UP on the compute node, so maybe
 the same mechanism can be used to notify nova that the port no longer exists
 and nova should delete it?


What behaviour are you looking for?

The patch associated with the bug attempts to stop deletion of in-use
ports.  It falls far short of implementing consistent behaviour, which
would have to take into account everything that uses ports (including DHCP,
L3, network services, etc.), it would probably need to add an 'in-use' flag
to the port itself, and it changes the current API behaviour rather
markedly.  We could go there but there's much more code to be written.

Someone on the bug suggests removing the VIF from the instance if the port
is deleted, but I don't think that's terribly practical - for some instance
containers it would not be possible.

The current behaviour does seem to be consistent and logical, if perhaps
unexpected and a bit rough around the edges.  I'm not sure orphaning and
isolating a VIF is actually a bad thing if you know it's going to happen,
though it needs to be clear from the non-Neutron side that the VIF is no
longer bound to a port, which is where things seem to fall down right now.

I've also found no documentation about when delete should work and when it
shouldn't, or what happens if the port is bound (the API and CLI document
say that the operation 'deletes a port' and not much else).
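
For anyone hitting this today, the pragmatic thing is to check on the client
side before deleting - something like this (a sketch with python-neutronclient;
the credentials and UUID are placeholders):

    from neutronclient.v2_0 import client

    neutron = client.Client(username='demo', password='secret',
                            tenant_name='demo',
                            auth_url='http://controller:5000/v2.0')

    port = neutron.show_port('PORT_UUID')['port']
    if port['device_owner'].startswith('compute:') and port['device_id']:
        # Bound to an instance - detach it via Nova rather than deleting
        # it out from under the VM and orphaning the VIF.
        print('port is in use by instance %s' % port['device_id'])
    else:
        neutron.delete_port(port['id'])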
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [horizon] Do No Evil

2015-03-07 Thread Ian Wells
With apologies for derailing the question, but would you care to tell us
what evil you're planning on doing?  I find it's always best to be informed
about these things.
-- 
Ian.

(Why yes, it *is* a Saturday morning.)

On 6 March 2015 at 12:23, Michael Krotscheck krotsch...@gmail.com wrote:

 Heya!

 So, a while ago Horizon pulled in JSHint to do javascript linting, which
 is awesome, but has a rather obnoxious Do no evil licence in the
 codebase: https://github.com/jshint/jshint/blob/master/src/jshint.js

 StoryBoard had the same issue, and I've recently replaced JSHint with
 ESlint for just that reason, but I'm not certain it matters as far as
 OpenStack license compatibility. I'm personally of the opinion that tools
 used != code shipped, but I am neither a lawyer nor a liable party should
 my opinion be wrong. Is this something worth revisiting?

 Michael

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] problems with huge pages and libvirt

2015-02-02 Thread Ian Wells
On 2 February 2015 at 09:49, Chris Friesen chris.frie...@windriver.com
wrote:

 On 02/02/2015 10:51 AM, Jay Pipes wrote:

 This is a bug that I discovered when fixing some of the NUMA related nova
 objects. I have a patch that should fix it up shortly.


 Any chance you could point me at it or send it to me?

  This is what happens when we don't have any functional testing of stuff
 that is
 merged into master...


 Indeed.  Does tempest support hugepages/NUMA/pinning?


This is a running discussion, but largely no - because this is tied to the
capabilities of the host, there's no guarantee for a given scenario what
result you would get (because Tempest will run on any hardware).

If you have test cases that should pass or fail on a NUMA-capable node, can
you write them up?  We're working on NUMA-specific testing right now
(though I'm not sure who, specifically, is working on the test case side of
that).
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] Thoughts on the nova-neutron interface

2015-01-28 Thread Ian Wells
On 28 January 2015 at 17:32, Robert Collins robe...@robertcollins.net
wrote:

 E.g. its a call (not cast) out to Neutron, and Neutron returns when
 the VIF(s) are ready to use, at which point Nova brings the VM up. If
 the call times out, we error.


I don't think this model really works with distributed systems, and it
really doesn't work when you have a limited number of threads to play with
- because they get consumed by anything that has to wait a long time for a
thing to happen, and eventually you can't service requests any more.  Also,
it's entirely opposite to what Nova does.  Does it return when the VM is
running?  No, it returns when the VM is requested, saying 'I note your
request and will act on it in my own time'.

What does Neutron have to do to complete a call?  That's entirely dependent
on the driver, but it could be talking to one, ten or a thousand devices,
any of which might be slow to respond: there is no upper bound on how long
it takes to bind a port, for instance.  So any REST call to Neutron should
change its DB and return, and leave an asynchronous process to deal with
making the network state change.  Neutron should notify Nova when it has
changed, and Nova can go on with its life doing other things till the
notification comes in.
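
In code terms the shape I'm describing is nothing exotic - roughly this
(a sketch, not Neutron's actual structure; the db, notifier and
bind_on_backend objects are invented for illustration):

    import threading

    def update_port_binding(db, notifier, bind_on_backend, port_id, host):
        # 1. Record the request and return at once; the REST call never
        #    blocks on the backend.
        db.set_binding_state(port_id, host, state='BINDING')

        def _bind():
            # 2. The slow, driver-specific work happens out of band and
            #    may involve one, ten or a thousand devices.
            state = bind_on_backend(port_id, host)
            db.set_binding_state(port_id, host, state=state)
            # 3. Notify the caller (Nova) when the state has changed.
            notifier.port_binding_changed(port_id, state)

        threading.Thread(target=_bind).start()
        return {'port_id': port_id, 'state': 'BINDING'}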

Right now we have this mix of synchronous and async code, and it's
 causing us to overlook things and have bugs. I'd be equally happy if
 we went all in with an async event driven approach, but we should
 decide if we're fish or fowl, not pick bits of both and hope reviewers
 can remember every little detail.


On this much we agree, I just happen to like fowl.

  One other problem, not yet raised,  is that Nova doesn't express its
needs

  when it asks for a port to be bound [...]


+1, OTOH I don't think this is a structural problem - it doesn't
 matter what protocol or calling style we use, this is just the
 parameters in the call :).


Agreed.  We just need to make it a proper negotiation, and that's it done.
No-one seems to have a problem with this, so I'll have a play with the idea
(out of tree for now, given the time of the cycle).


 I think your desire and Salvatore's are compatible: an interface that
 is excellent for Nova can also be excellent for other users.


Agreed.  But if there's one interface for everything it doesn't really need
to be a plugin.  The question is whether one interface is enough.


 Notifications aren't a complete solution to the orphaning issue unless
 the notification system is guaranteed non-lossy. Something like Kafka
 would be an excellent substrate for such a system, or we could look at
 per-service journalling (on either side of the integration point).


I prefer lossy notification systems.  RabbitMQ is non-lossy, and that means
it will sit on messages for days and then deliver them long past the point
at which they're useful, plus its queue depth is unbounded.  It's not a
great way to run an eventually consistent system, in my opinion.

The pattern I like is where you are notified (via an unreliable channel)
when your operation has completed, but you must also have a background
checking task that goes to see if the notification has gone missing by
checking the datamodel.  The task doesn't have to trigger very often, and
in fact you could hold it off indefinitely with a heartbeat providing the
communications channel remains functioning - but it does have to exist.
You don't have the problem of having to provide an infinite queue, and it's
not a crisis when your messaging system loses a message.
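
As a sketch (invented names, nothing OpenStack-specific), the consuming side
of that pattern is just:

    import time

    def wait_for_completion(check_datamodel, get_notification,
                            timeout=300, poll=30):
        # Trust notifications for speed, but treat the datamodel as the
        # source of truth in case a notification is lost.
        deadline = time.time() + timeout
        while time.time() < deadline:
            # Fast path: block briefly for a (lossy) notification.
            get_notification(timeout=poll)
            # Either way, the periodic check against the datamodel is
            # what decides whether the operation really completed.
            if check_datamodel():
                return True
        return False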
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] Thoughts on the nova-neutron interface

2015-01-25 Thread Ian Wells
Lots of open questions in here, because I think we need a long conversation
on the subject.

On 23 January 2015 at 15:51, Kevin Benton blak...@gmail.com wrote:

 It seems like a change to using internal RPC interfaces would be pretty
 unstable at this point.



 Can we start by identifying the shortcomings of the HTTP interface and see
 if we can address them before making the jump to using an interface which
 has been internal to Neutron so far?


I think the protocol being used is a distraction from the actual
shortcomings.

Firstly, you'd have to explain to me why HTTP is so much slower than RPC.
If HTTP is incredibly slow, can it be sped up?  If RPC is moving the data
around using the same calls, what changes?  Secondly, the problem seems
more that we make too many roundtrips - which would be the same over RPC -
and if that's true, perhaps we should be doing bulk operations - which is
not transport-specific.


I absolutely do agree that Neutron should be doing more of the work, and
Nova less, when it comes to port binding.  (And, in fact, I'd like that we
stopped considering it 'Nova-Neutron' port binding, since in theory another
service attaching stuff to the network could request a port be bound; it
just happens at the moment that it's always Nova.)

One other problem, not yet raised,  is that Nova doesn't express its needs
when it asks for a port to be bound, and this is actually becoming a
problem for me right now.  At the moment, Neutron knows, almost
psychically, what binding type Nova will accept, and hands it over; Nova
then deals with whatever binding type it receives (optimistically
expecting it's one it will support, and getting shirty if it isn't).  The
problem I'm seeing at the moment, and other people have mentioned, is that
certain forwarders can only bind a vhostuser port to a VM if the VM itself
has hugepages enabled.  They could fall back to another binding type but at
the moment that isn't an option: Nova doesn't tell Neutron anything about
what it supports, so there's no data on which to choose.  It should be
saying 'I will take these binding types in this preference order'.  I
think, in fact, that asking Neutron for bindings of a certain preference
type order, would give us much more flexibility - like, for instance, not
having to know exactly which binding type to deliver to which compute node
in multi-hypervisor environments, where at the moment the choice is made in
Neutron.
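
To illustrate what I mean by a negotiation - and to be clear, this is *not*
today's API, just a sketch of the sort of request I'd like Nova to be able
to make and how Neutron might answer it (field names are hypothetical):

    # What Nova could send when asking Neutron to bind a port: the binding
    # types it can consume, in preference order, plus the host facts that
    # constrain the choice.
    binding_request = {
        'host': 'compute-12',
        'vif_types_accepted': ['vhostuser', 'ovs', 'bridge'],
        'host_capabilities': {'hugepages': False},
    }

    def choose_vif_type(request, backend_vif_types):
        # Neutron-side sketch: the first acceptable type the backend can
        # actually deliver for this host wins.
        for vif_type in request['vif_types_accepted']:
            if vif_type not in backend_vif_types:
                continue
            # e.g. a vhostuser forwarder needs hugepages in the guest.
            if (vif_type == 'vhostuser'
                    and not request['host_capabilities'].get('hugepages')):
                continue
            return vif_type
        raise ValueError('no mutually acceptable VIF type')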

I scanned through the etherpad and I really like Salvatore's idea of adding
 a service plugin to Neutron that is designed specifically for interacting
 with Nova. All of the Nova notification interactions can be handled there
 and we can add new API components designed for Nova's use (e.g. syncing
 data, etc). Does anyone have any objections to that approach?


I think we should be leaning the other way, actually - working out what a
generic service - think a container management service, or an edge network
service - would want to ask when it wanted to connect to a virtual network,
and making an Neutron interface that supports that properly *without* being
tailored to Nova.  The requirements are similar in all cases, so it's not
clear that a generic interface would be any more complex.

Notifications on data changes in Neutron to prevent orphaning is another
example of a repeating pattern.  It's probably the same for any service
that binds to Neutron, but right now Neutron has Nova-specific code in it.
Broadening the scope, it's also likely the same in Cinder, and in fact it's
also pretty similar to the problem you get when you delete a project in
Keystone and all your resources get orphaned.  Is a Nova-Neutron specific
solution the right thing to do?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][neutron]VIF_VHOSTUSER

2015-01-09 Thread Ian Wells
Once more, I'd like to revisit the VIF_VHOSTUSER discussion [1].  I still
think this is worth getting into Nova's libvirt driver - specifically
because there's actually no way to distribute this as an extension; since
we removed the plugin mechanism for VIF drivers, it absolutely requires a
code change in the libvirt driver.  This means that there's no graceful way
of distributing an aftermarket VHOSTUSER driver for libvirt.

The standing counterargument to adding it is that nothing in the upstream
or 3rd party CI would currently test the VIF_VHOSTUSER code.  I'm not sure
that's a showstopper, given the code is zero risk to anyone when it's not
being used, and clearly is going to be experimental when it's enabled.  So,
Nova cores, would it be possible to incorporate this without a
corresponding driver in base Neutron?

Cheers,
-- 
Ian.

[1] https://review.openstack.org/#/c/96140/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][L2-Gateway] Meetings announcement

2015-01-03 Thread Ian Wells
Sukhdev,

Since the term is quite broad and has meant many things in the past, can
you define what you're thinking of when you say 'L2 gateway'?

Cheers,
-- 
Ian.

On 2 January 2015 at 18:28, Sukhdev Kapur sukhdevka...@gmail.com wrote:

 Hi all,

 HAPPY NEW YEAR.

 Starting Monday (Jan 5th, 2015) we will be kicking off bi-weekly meetings
 for L2 Gateway discussions.

 We are hoping to come up with an initial version of L2 Gateway API in Kilo
 cycle. The intent of these bi-weekly meetings is to discuss issues related
 to L2 Gateway API.

 Anybody interested in this topic is invited to join us in these meetings
 and share your wisdom with the similar minded members.

 Here are the details of these meetings:

 https://wiki.openstack.org/wiki/Meetings#Networking_L2_Gateway_meeting

 I have put together a wiki for this project. Next week is the initial
 meeting and the agenda is pretty much open. We will give introduction of
 the members of the team as well the progress made so far on this topic. If
 you would like to add anything to the agenda, feel free to update the
 agenda at the following wiki:

 https://wiki.openstack.org/wiki/Meetings/L2Gateway

 Look forward to seeing you on IRC.

 -Sukhdev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] bridge name generator for vif plugging

2014-12-15 Thread Ian Wells
Hey Ryota,

A better way of describing it would be that the bridge name is, at present,
generated in *both* Nova *and* Neutron, and the VIF type semantics define
how it's calculated.  I think you're right that in both cases it would make
more sense for Neutron to tell Nova what the connection endpoint was going
to be rather than have Nova calculate it independently.  I'm not sure that
that necessarily requires two blueprints, and you don't have a spec there
at the moment, which is a problem because the Neutron spec deadline is upon
us, but the idea's a good one.  (You might get away without a Neutron spec,
since the change to Neutron to add the information should be small and
backward compatible, but that's not something I can make judgement on.)

If we changed this, then your options are to make new plugging types where
the name is exchanged rather than calculated, or to keep the old plugging types
and have Neutron provide the name and Nova use it if provided.  You'd
need to think carefully about upgrade scenarios to make sure that changing
version on either side is going to work.

VIF_TYPE_TAP, while somewhat different in its focus, is also moving in the
same direction of having a more logical interface between Nova and
Neutron.  That plus this suggests that we should have VIF_TYPE_TAP handing
over the TAP device name to use, and similarly create a VIF_TYPE_BRIDGE
(passing bridge name) and slightly modify VIF_TYPE_VHOSTUSER before it gets
established (to add the socket name).

Did you have any thoughts on how the metadata should be stored on the port?
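
One obvious shape for it - and this is exactly the open question, so treat
the field names below as hypothetical - would be to hang the names off the
port's binding details and let Nova just consume what it's told:

    # Hypothetical: Neutron chooses the names and hands them over as part
    # of the port binding, instead of both sides computing them.
    port = {
        'id': 'PORT_UUID',
        'binding:vif_type': 'bridge',
        'binding:vif_details': {
            'bridge_name': 'brq1234abcd',
            # For VIF_TYPE_TAP or VIF_TYPE_VHOSTUSER the equivalent would
            # be a tap device name or a socket path respectively, e.g.
            # 'tap_name': 'tap1234abcd-56',
            # 'vhostuser_socket': '/var/run/vhost/PORT_UUID.sock',
        },
    }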
-- 
Ian.


On 15 December 2014 at 10:01, Ryota Mibu r-m...@cq.jp.nec.com wrote:

 Hi all,


 We are proposing a change to move the bridge name generator (creating the
 bridge name from the net-id or reading the integration bridge name from
 nova.conf) from Nova to Neutron. The following are the BPs in Nova and Neutron.

 https://blueprints.launchpad.net/nova/+spec/neutron-vif-bridge-details
 https://blueprints.launchpad.net/neutron/+spec/vif-plugging-metadata

 I'd like to get your comments on whether this change is the right
 direction. I found a related comment in the Nova code [3] and guess this
 discussion happened in the context of vif-plugging and port-binding, but I'm
 not sure there was consensus about the bridge name.


 https://github.com/openstack/nova/blob/2014.2/nova/network/neutronv2/api.py#L1298-1299


 Thanks,
 Ryota


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] bridge name generator for vif plugging

2014-12-15 Thread Ian Wells
Let me write a spec and see what you both think.  I have a couple of things
we could address here and while it's a bit late it wouldn't be a dramatic
thing to fix and it might be acceptable.

On 15 December 2014 at 11:28, Daniel P. Berrange berra...@redhat.com
wrote:

 On Mon, Dec 15, 2014 at 11:15:56AM +0100, Ian Wells wrote:
  Hey Ryota,
 
  A better way of describing it would be that the bridge name is, at
 present,
  generated in *both* Nova *and* Neutron, and the VIF type semantics define
  how it's calculated.  I think you're right that in both cases it would
 make
  more sense for Neutron to tell Nova what the connection endpoint was
 going
  to be rather than have Nova calculate it independently.  I'm not sure
 that
  that necessarily requires two blueprints, and you don't have a spec there
  at the moment, which is a problem because the Neutron spec deadline is
 upon
  us, but the idea's a good one.  (You might get away without a Neutron
 spec,
  since the change to Neutron to add the information should be small and
  backward compatible, but that's not something I can make judgement on.)

 Yep, the fact that both Nova & Neutron calculate the bridge name is a
 historical accident. Originally Nova did it, because nova-network was
 the only solution. Then Neutron did it too, so it matched what Nova
 was doing. Clearly if we had Neutron right from the start, then it
 would have been Neutron's responsibility to do this. Nothing in Nova
 cares what the names are from a functional POV - it just needs to
 be told what to use.

 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
 :|
 |: http://libvirt.org  -o- http://virt-manager.org
 :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
 :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
 :|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] out-of-tree plugin for Mech driver/L2 and vif_driver

2014-12-10 Thread Ian Wells
On 10 December 2014 at 01:31, Daniel P. Berrange berra...@redhat.com
wrote:


 So the problem of Nova review bandwidth is a constant problem across all
 areas of the code. We need to solve this problem for the team as a whole
 in a much broader fashion than just for people writing VIF drivers. The
 VIF drivers are really small pieces of code that should be straightforward
 to review & get merged in any release cycle in which they are proposed.
 I think we need to make sure that we focus our energy on doing this and
 not ignoring the problem by breaking stuff off out of tree.


The problem is that we effectively prevent running an out of tree Neutron
driver (which *is* perfectly legitimate) if it uses a VIF plugging
mechanism that isn't in Nova, as we can't use out of tree code and we won't
accept in code ones for out of tree drivers.  This will get more confusing
as *all* of the Neutron drivers and plugins move out of the tree, as that
constraint becomes essentially arbitrary.

Your issue is one of testing.  Is there any way we could set up a better
testing framework for VIF drivers where Nova interacts with something to
test the plugging mechanism actually passes traffic?  I don't believe
there's any specific limitation on it being *Neutron* that uses the
plugging interaction.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-05 Thread Ian Wells
I have no problem with standardising the API, and I would suggest that a
service that provided nothing but endpoints could be begun as the next
phase of 'advanced services' broken out projects to standardise that API.
I just don't want it in Neutron itself.

On 5 December 2014 at 00:33, Erik Moe erik@ericsson.com wrote:



 One reason for trying to get an more complete API into Neutron is to have
 a standardized API. So users know what to expect and for providers to have
 something to comply to. Do you suggest we bring this standardization work
 to some other forum, OPNFV for example? Neutron provides low level hooks
 and the rest is defined elsewhere. Maybe this could work, but there would
 probably be other issues if the actual implementation is not on the edge or
 outside Neutron.



 /Erik





 *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
 *Sent:* den 4 december 2014 20:19
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id



 On 1 December 2014 at 21:26, Mohammad Hanif mha...@brocade.com wrote:

   I hope we all understand how edge VPN works and what interactions are
 introduced as part of this spec.  I see references to neutron-network
 mapping to the tunnel, which is not the case at all, and the edge-VPN spec
 doesn’t propose it.  At a very high level, there are two main concepts:

1. Creation of a per tenant VPN “service” on a PE (physical router)
which has a connectivity to other PEs using some tunnel (not known to
tenant or tenant-facing).  An attachment circuit for this VPN service is
    also created which carries a “list of tenant networks” (the list is
    initially empty).
2. Tenant “updates” the list of tenant networks in the attachment
circuit which essentially allows the VPN “service” to add or remove the
network from being part of that VPN.

  A service plugin implements what is described in (1) and provides an API
 which is called by what is described in (2).  The Neutron driver only
 “updates” the attachment circuit using an API (attachment circuit is also
 part of the service plugin’s data model).  I don’t see where we are
 introducing large data model changes to Neutron?



 Well, you have attachment types, tunnels, and so on - these are all
 objects with data models, and your spec is on Neutron so I'm assuming you
 plan on putting them into the Neutron database - where they are, for ever
 more, a Neutron maintenance overhead both on the dev side and also on the
 ops side, specifically at upgrade.



    How else does one introduce a network service in OpenStack if it is not
  through a service plugin?



 Again, I've missed something here, so can you define 'service plugin' for
 me?  How similar is it to a Neutron extension - which we agreed at the
 summit we should take pains to avoid, per Salvatore's session?

 And the answer to that is to stop talking about plugins or trying to
 integrate this into the Neutron API or the Neutron DB, and make it an
 independent service with a small and well defined interaction with Neutron,
 which is what the edge-id proposal suggests.  If we do incorporate it into
 Neutron then there are probably 90% of Openstack users and developers who
 don't want or need it but care a great deal if it breaks the tests.  If it
 isn't in Neutron they simply don't install it.



    As we can see, the tenant needs to communicate (explicitly or otherwise) to
 add/remove its networks to/from the VPN.  There has to be a channel and the
 APIs to achieve this.



 Agreed.  I'm suggesting it should be a separate service endpoint.
 --

 Ian.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-04 Thread Ian Wells
On 1 December 2014 at 21:26, Mohammad Hanif mha...@brocade.com wrote:

  I hope we all understand how edge VPN works and what interactions are
 introduced as part of this spec.  I see references to neutron-network
 mapping to the tunnel, which is not the case at all, and the edge-VPN spec
 doesn’t propose it.  At a very high level, there are two main concepts:

1. Creation of a per tenant VPN “service” on a PE (physical router)
which has a connectivity to other PEs using some tunnel (not known to
tenant or tenant-facing).  An attachment circuit for this VPN service is
    also created which carries a “list of tenant networks” (the list is
    initially empty).
2. Tenant “updates” the list of tenant networks in the attachment
circuit which essentially allows the VPN “service” to add or remove the
network from being part of that VPN.

 A service plugin implements what is described in (1) and provides an API
 which is called by what is described in (2).  The Neutron driver only
 “updates” the attachment circuit using an API (attachment circuit is also
 part of the service plugin’s data model).  I don’t see where we are
 introducing large data model changes to Neutron?


Well, you have attachment types, tunnels, and so on - these are all objects
with data models, and your spec is on Neutron so I'm assuming you plan on
putting them into the Neutron database - where they are, for ever more, a
Neutron maintenance overhead both on the dev side and also on the ops side,
specifically at upgrade.

How else does one introduce a network service in OpenStack if it is not through
 a service plugin?


Again, I've missed something here, so can you define 'service plugin' for
me?  How similar is it to a Neutron extension - which we agreed at the
summit we should take pains to avoid, per Salvatore's session?

And the answer to that is to stop talking about plugins or trying to
integrate this into the Neutron API or the Neutron DB, and make it an
independent service with a small and well defined interaction with Neutron,
which is what the edge-id proposal suggests.  If we do incorporate it into
Neutron then there are probably 90% of Openstack users and developers who
don't want or need it but care a great deal if it breaks the tests.  If it
isn't in Neutron they simply don't install it.


 As we can see, the tenant needs to communicate (explicitly or otherwise) to
 add/remove its networks to/from the VPN.  There has to be a channel and the
 APIs to achieve this.


Agreed.  I'm suggesting it should be a separate service endpoint.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] Boundary between Nova and Neutron involvement in network setup?

2014-12-04 Thread Ian Wells
On 4 December 2014 at 08:00, Neil Jerram neil.jer...@metaswitch.com wrote:

 Kevin Benton blak...@gmail.com writes:
 I was actually floating a slightly more radical option than that: the
 idea that there is a VIF type (VIF_TYPE_NOOP) for which Nova does
 absolutely _nothing_, not even create the TAP device.


Nova always does something, and that something amounts to 'attaches the VM
to where it believes the endpoint to be'.  Effectively you should view the
VIF type as the form that's decided on during negotiation between Neutron
and Nova - Neutron says 'I will do this much and you have to take it from
there'.  (In fact, I would prefer that it was *more* of a negotiation, in
the sense that the hypervisor driver had a say to Neutron of what VIF types
it supported and preferred, and Neutron could choose from a selection, but
I don't think it adds much value at the moment and I didn't want to propose
a change just for the sake of it.)  I think you're just proposing that the
hypervisor driver should do less of the grunt work of connection.

Also, libvirt is not the only hypervisor driver and I've found it
interesting to nose through the others for background reading, even if
you're not using them much.

For example, suppose someone came along and wanted to implement a new
 OVS-like networking infrastructure?  In principle could they do that
 without having to enhance the Nova VIF driver code?  I think at the
 moment they couldn't, but that they would be able to if VIF_TYPE_NOOP
 (or possibly VIF_TYPE_TAP) was already in place.  In principle I think
 it would then be possible for the new implementation to specify
 VIF_TYPE_NOOP to Nova, and to provide a Neutron agent that does the kind
 of configuration and vSwitch plugging that you've described above.


At the moment, the rule is that *if* you create a new type of
infrastructure then *at that point* you create your new VIF plugging type
to support it - vhostuser being a fine example, having been rejected on the
grounds that it was, at the end of Juno, speculative.  I'm not sure I
particularly like this approach but that's how things are at the moment -
largely down to not wanting to add code that isn't used and therefore
tested.

None of this is criticism of your proposal, which sounds reasonable; I was
just trying to provide a bit of context.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-01 Thread Ian Wells
On 1 December 2014 at 04:43, Mathieu Rohon mathieu.ro...@gmail.com wrote:

 This is not entirely true, as soon as a reference implementation,
 based on existing Neutron components (L2agent/L3agent...) can exist.


The specific thing I was saying is that that's harder with an edge-id
mechanism than one incorporated into Neutron, because the point of the
edge-id proposal is to make tunnelling explicitly *not* a responsibility of
Neutron.  So how do you get the agents to terminate tunnels when Neutron
doesn't know anything about tunnels and the agents are a part of Neutron?
Conversely, you can add a mechanism to the OVS subsystem so that you can
tap an L2 bridge into a network, which would probably be more
straightforward.
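
For what it's worth, the OVS-level mechanism I have in mind is not much more
than a pair of patch ports.  A rough sketch of the sort of thing an agent or
external controller could do (bridge and port names are invented, and
Neutron itself knows nothing about this today):

# Tap an external L2 bridge into the bridge carrying a tenant network by
# creating a patch-port pair between them.  Purely illustrative.
import subprocess

def patch_bridges(bridge_a, port_a, bridge_b, port_b):
    for bridge, port, peer in ((bridge_a, port_a, port_b),
                               (bridge_b, port_b, port_a)):
        subprocess.check_call([
            "ovs-vsctl", "--may-exist", "add-port", bridge, port,
            "--", "set", "Interface", port,
            "type=patch", "options:peer=%s" % peer,
        ])

patch_bridges("br-net1", "patch-to-edge", "br-edge", "patch-to-net1")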

But even if it were true, this could at least give a standardized API
 to Operators that want to connect their Neutron networks to external
 VPNs, without coupling their cloud solution with whatever SDN
 controller. And to me, this is the main issue that we want to solve by
 proposing some neutron specs.


So the issue I worry about here is that if we start down the path of adding
the MPLS datamodels to Neutron we have to add Kevin's switch control work.
And the L2VPN descriptions for GRE, L2TPv3, VxLAN, and EVPN.  And whatever
else comes along.  And we get back to 'that's a lot of big changes that
aren't interesting to 90% of Neutron users' - difficult to get in and a lot
of overhead to maintain for the majority of Neutron developers who don't
want or need it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-11-29 Thread Ian Wells
On 27 November 2014 at 12:11, Mohammad Hanif mha...@brocade.com wrote:

  Folks,

  Recently, as part of the L2 gateway thread, there was some discussion on
 BGP/MPLS/Edge VPN and how to bridge any overlay networks to the neutron
 network.  Just to update everyone in the community, Ian and I have
 separately submitted specs which make an attempt to address the cloud edge
 connectivity.  Below are the links describing it:

  Edge-Id: https://review.openstack.org/#/c/136555/
 Edge-VPN: https://review.openstack.org/#/c/136929 .  This is a resubmit
 of https://review.openstack.org/#/c/101043/ for the kilo release under
 the “Edge VPN” title.  “Inter-datacenter connectivity orchestration” was
 just too long and just too generic of a title to continue discussing about
 :-(


Per the summit discussions, the difference is one of approach.

The Edge-VPN case addresses MPLS attachments via a set of APIs to be added
to the core of Neutron.  Those APIs are all new objects and don't really
change the existing API so much as extend it.  There's talk of making it a
'service plugin' but if it were me I would simply argue for a new service
endpoint.  Keystone's good at service discovery, endpoints are pretty easy
to create, and I don't see why you need to fold it in.

The edge-id case says 'Neutron doesn't really care about what happens
outside of the cloud at this point in time, there are loads of different
edge termination types, and so the best solution would be one where the
description of the actual edge datamodel does not make its way into core
Neutron'.  This avoids us folding in the information about edges in the
same way that we folded in the information about services and later
regretted it.  The notable downside is that this method would work with an
external network controller such as ODL, but probably will never make its
way into the inbuilt OVS/ML2 network controller if it's implemented as
described (explicitly *because* it's designed in such a way as to keep the
functionality out of core Neutron).  Basically, it's not completely
incompatible with the datamodel that the Edge-VPN change describes, but
pushes that datamodel out to an independent service which would have its
own service endpoint to avoid complicating the Neutron API with information
that Neutron itself could probably only ever validate, store and pass on to
an external controller.

Also, the Edge-VPN case is specified for only MPLS VPNs, and doesn't
consider other edge cases such as Kevin's switch-based edges in
https://review.openstack.org/#/c/87825/ .  The edge-ID one is agnostic of
termination types (since it absolves Neutron of all of that responsibility)
and would leave the edge type description to the determination of an
external service.

Obviously, I'm biased, having written the competing spec; but I prefer the
simple change that pushes complexity out of the core to the larger but
comprehensive change that keeps it as a part of Neutron.  And in fact if
you look at the two specs with that in mind, they do go together; the
Edge-VPN model is almost precisely what you need to describe an endpoint
that you could then associate with an Edge-ID to attach it to Neutron.
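
As a sketch of how the two might compose (all of the names and fields below
are invented for illustration, not taken from either spec): the external
service owns the full VPN description, Neutron holds nothing but an opaque
edge identifier on the network, and the association between the two lives in
the external service.

# Owned by the hypothetical external edge-VPN service; Neutron never sees it.
edge_vpn = {
    "id": "vpn-1",
    "type": "mpls-l2vpn",            # could equally be l2tpv3, vxlan, evpn...
    "provider_edge": "pe1.example.net",
    "route_targets": ["65000:100"],
}

# Owned by Neutron: the network carries only an opaque edge id that an
# external controller can resolve against the service above.
neutron_network = {
    "id": "9d2c...",
    "name": "tenant-net",
    "edge_id": "edge-42",
}

# The attachment itself is a record in the external service, not in Neutron.
attachment = {"edge_id": "edge-42", "vpn_id": "vpn-1"}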
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] L2 gateway as a service

2014-11-20 Thread Ian Wells
On 19 November 2014 17:19, Sukhdev Kapur sukhdevka...@gmail.com wrote:

 Folks,

 Like Ian, I am jumping in this very late as well - as I decided to travel
 Europe after the summit, just returned back and  catching up :-):-)

 I have noticed that this thread has gotten fairly convoluted and painful
 to read.

 I think Armando summed it up well in the beginning of the thread. There
 are basically three written proposals (listed in Armando's email - I pasted
 them again here).

 [1] https://review.openstack.org/#/c/134179/
 [2] https://review.openstack.org/#/c/100278/
 [3] https://review.openstack.org/#/c/93613/

 On this thread I see that the authors of first two proposals have already
 agreed to consolidate and work together. This leaves with two proposals.
 Both Ian and I were involved with the third proposal [3] and have
 reasonable idea about it. IMO, the use cases addressed by the third
 proposal are very similar to use cases addressed by proposal [1] and [2]. I
 can volunteer to  follow up with Racha and Stephen from Ericsson to see if
 their use case will be covered with the new combined proposal. If yes, we
 have one converged proposal. If no, then we modify the proposal to
 accommodate their use case as well. Regardless, I will ask them to review
 and post their comments on [1].

 Having said that, this covers what we discussed during the morning session
 on Friday in Paris. Now, comes the second part which Ian brought up in the
 afternoon session on Friday.
 My initial reaction was, when heard his use case, that this new
 proposal/API should cover that use case as well (I am being bit optimistic
 here :-)). If not, rather than going into the nitty gritty details of the
 use case, let's see what modification is required to the proposed API to
 accommodate Ian's use case and adjust it accordingly.


As far as I can see, marking a network as 'edge' and therefore bridged to
something you don't know about (my proposal), and attaching a block to it
that, behind the scenes, bridges to something you don't know about
(Maruti's, if you take out all of the details of *what* is being attached to
from the API), are basically as good as each other.

My API parallels the way that provider networks are used, because that's
what I had in mind at the time; Maruti's uses a block rather than marking
the network, and the only real differences are that (a) you can attach many
networks to one block (which doesn't really seem to bring anything special)
and (b) it uses a port to connect to the network (which is not massively
helpful because there's nothing sensible you can put on the port; there may
be many things behind the gateway).  At this point it becomes a completely
religious argument about which is better.  I still prefer mine, from gut
feel, but they are almost exactly equivalent.
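
Concretely, the two shapes look something like this (request bodies invented
purely for illustration; neither is the literal API from the specs):

# Option 1 (mine): mark the network itself as an edge network, much as a
# provider network is marked with provider:* attributes.
mark_network_request = {
    "network": {
        "name": "tenant-net",
        "edge:edge_id": "edge-42",      # invented attribute name
    }
}

# Option 2 (Maruti's, with the switch-port details taken out): create a
# gateway block and connect it to the network through a port.
gateway_connection_request = {
    "l2_gateway_connection": {          # invented resource name
        "gateway_id": "gw-7",
        "network_id": "9d2c...",
        "port_id": "3fae...",           # nothing sensible to put here, as above
    }
}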

If we take your statement above of 'let's take out the switch port stuff',
then Maruti's use case would need to explain where that data goes.  The
point I made is that it becomes a Sisyphean task (endless and not useful) to
introduce a data model and API for that information into Neutron, and that's
what I didn't want to do.  Can we address that question?

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

