Re: [Openstack-operators] Neutron getting stuck creating namespaces

2015-11-24 Thread Saverio Proto
Hello Xav,

What version of OpenStack are you running?

Thank you

Saverio


2015-11-23 20:04 GMT+01:00 Xav Paice :
> Hi,
>
> Over the last few months we've had a few incidents where the process that
> creates network namespaces (Neutron, OVS) on the network nodes gets 'stuck'
> and blocks not only the router it's trying to create, but all further
> namespace operations too.
>
> This has usually ended with either us rebooting the node shortly
> afterwards, or the node rebooting itself.
>
> It looks very much like we're affected by
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 but the notes
> say it's fixed in the kernel we're running.  I've asked the clever person
> who checked it to make some extra notes in the bug report.
>
> It seems more likely to trigger when the box is under heavy load, so I was
> wondering whether other ops have a maximum ratio of routers per network
> node. I would have thought our current maximum of 150 routers per node was
> fairly light, but with the DHCP namespaces as well that's ~450 namespaces
> on a box, and maybe that's an issue?
>
> Thanks
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
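To check how close a node is to the counts discussed here (150 routers, ~450 namespaces in total), the namespaces can be tallied by type. A minimal sketch, assuming the usual Neutron prefixes qrouter- for L3 agent namespaces and qdhcp- for DHCP agent namespaces; the function name count_neutron_namespaces is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally Neutron namespaces by type.
# Input is `ip netns list` output on stdin, one namespace per line.
# Newer iproute2 releases may append " (id: N)"; the prefix match ignores it.
count_neutron_namespaces() {
    awk '
        /^qrouter-/ { routers++ }
        /^qdhcp-/   { dhcp++ }
        NF > 0      { total++ }
        END { printf "qrouter=%d qdhcp=%d total=%d\n", routers, dhcp, total }
    '
}

# Usage on a network node (needs root):
#   ip netns list | count_neutron_namespaces
```

Running it periodically would show whether the total is creeping up over time rather than tracking the number of routers and networks.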



Re: [Openstack-operators] Neutron getting stuck creating namespaces

2015-11-24 Thread Saverio Proto
Hello Xav,

We also had problems with namespaces in Juno, though maybe a little different
from what you describe.

We are running about 250 namespaces on our network node. When we
reboot the network node, we observe that some namespaces are missing their
qr-* and qg-* interfaces.

We believe that is because the control plane in Neutron Juno performs
very badly. This is probably fixed in Kilo.
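One way to find the affected routers after a reboot is to list the interfaces inside each qrouter- namespace and flag any that lack a qr-* or qg-* device. A minimal sketch; missing_router_ifaces is a hypothetical name. Note that a router without an external gateway legitimately has no qg-* interface, and one without attached subnets has no qr-* interface, so any hits should be checked against the router's configuration.

```shell
#!/bin/sh
# Hypothetical helper: read interface names on stdin (one per line) and
# print each expected router prefix (qr-, qg-) for which no interface exists.
missing_router_ifaces() {
    awk '
        /^qr-/ { qr = 1 }
        /^qg-/ { qg = 1 }
        END {
            if (!qr) print "qr-"
            if (!qg) print "qg-"
        }
    '
}

# Usage sketch (needs root): check every router namespace on the node.
#   for ns in $(ip netns list | awk '/^qrouter-/ {print $1}'); do
#       missing=$(ip netns exec "$ns" ip -o link show \
#           | awk -F': ' '{ sub(/@.*/, "", $2); print $2 }' \
#           | missing_router_ifaces)
#       [ -n "$missing" ] && echo "$ns is missing:" $missing
#   done
```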

To work around it, once the network node is up and running, we reset the
routers whose namespaces have missing interfaces:

  neutron router-update  --admin-state-up false
  sleep 5
  neutron router-update  --admin-state-up true
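When several routers are affected, the same toggle can be wrapped in a loop. A minimal sketch that takes router names or IDs as arguments and runs the same two router-update calls as above, with the 5-second pause between them; bounce_routers, NEUTRON and BOUNCE_DELAY are hypothetical names, and setting NEUTRON=echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper for the workaround above: set each router
# administratively down, wait, then set it back up.
# NEUTRON overrides the client command (e.g. NEUTRON=echo for a dry run);
# BOUNCE_DELAY overrides the pause in seconds (default 5, as above).
bounce_routers() {
    for router in "$@"; do
        ${NEUTRON:-neutron} router-update "$router" --admin-state-up false || return 1
        sleep "${BOUNCE_DELAY:-5}"
        ${NEUTRON:-neutron} router-update "$router" --admin-state-up true || return 1
    done
}

# Usage (admin credentials loaded in the environment):
#   bounce_routers <router-a> <router-b>
```

Stopping at the first failed call avoids leaving further routers administratively down when, for example, the API is unreachable.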

Saverio





2015-11-24 9:51 GMT+01:00 Xav Paice :
> Neutron is Juno, on Trusty boxes with the 3.19 LTS kernel.  We're in the
> process of updating to Kilo, and onwards to Liberty.



Re: [Openstack-operators] Neutron getting stuck creating namespaces

2015-11-24 Thread Bajin, Joseph
We haven’t seen the broken-namespace issue, but we have experienced a problem
where our node eventually started showing soft lockups like this:

kernel: BUG: soft lockup - CPU#0 stuck for 22s!
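To see how often such lockups occur, and on which CPU, the matching lines can be pulled out of the kernel log. A minimal sketch, assuming the message format shown above; soft_lockups is a hypothetical name, and /var/log/kern.log is the usual kernel log on Ubuntu.

```shell
#!/bin/sh
# Hypothetical filter: from kernel log lines on stdin, print "<cpu> <seconds>"
# for every "soft lockup - CPU#N stuck for Ms!" message; other lines are dropped.
soft_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s!.*/\1 \2/p'
}

# Usage:
#   dmesg | soft_lockups
#   grep -h 'soft lockup' /var/log/kern.log | soft_lockups
```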

We noticed it once we reached a large number of namespaces. It was definitely
over 400, because we hadn’t realized that the option to delete namespaces had
been reverted from true to false a few releases ago. After we cleaned up the
namespaces, those errors stopped showing up, but over time the number of
namespaces rose to a high level again. This time we were lucky that the soft
lockup hit the kernel scheduler rather than the neutron process: that is what
triggered our reboot, as the system realized it was dead and restarted itself.
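Since the namespace count crept back up after a cleanup, it may be worth confirming what the agents are actually configured to do. A minimal sketch that prints an option's value from an agent config file, skipping commented-out lines. It assumes the Juno-era option names router_delete_namespaces (l3_agent.ini) and dhcp_delete_namespaces (dhcp_agent.ini) and the usual /etc/neutron paths, so check these against your release. ini_value is a hypothetical name, and "unset" means the agent falls back to its built-in default.

```shell
#!/bin/sh
# Hypothetical helper: print the value of option $1 in INI file $2,
# ignoring comment lines; prints "unset" if the option is not set.
# Sections are not tracked, so the last matching line in any section wins.
ini_value() {
    awk -F'=' -v key="$1" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1
            gsub(/[ \t]/, "", k)
            if (k == key) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v
            }
        }
        END { print (val == "" ? "unset" : val) }
    ' "$2"
}

# Usage (paths assumed):
#   ini_value router_delete_namespaces /etc/neutron/l3_agent.ini
#   ini_value dhcp_delete_namespaces   /etc/neutron/dhcp_agent.ini
```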





