Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Pedro Sousa
Hi,

that could be a problem with neutron metadata service, check the logs.

Have you considered that the outage might have corrupted your databases,
neutron, nova, etc?

BR

On Thu, Jul 5, 2018 at 9:07 PM Torin Woltjer 
wrote:

> Are IP addresses set by cloud-init on boot? I noticed that cloud-init
> isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image
> to test with; the hostname was not set to the name of the instance and I
> could not log in as the users I had specified in the configuration.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com  *
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 12:57 PM
> *To*: torin.wolt...@granddial.com
> *Cc*: "openst...@lists.openstack.org" , "
> openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> *Subject*: Re: [Openstack] Recovering from full outage
> You should tcpdump inside the qdhcp namespace to see if the requests make
> it there, and also check iptables rules on the compute nodes for the return
> traffic.
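>
> A rough sketch of the kind of capture I mean (the network UUID and port ID
> are placeholders):
>
>   ip netns list | grep qdhcp
>   ip netns exec qdhcp-<network-uuid> tcpdump -ne -i any port 67 or port 68
>
> and on the compute node something like "iptables-save | grep <port-id-prefix>"
> to see the rules generated for the instance port.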
>
>
> On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> Yes, I've done this. The VMs hang for a while waiting for DHCP and
>> eventually come up with no addresses. neutron-dhcp-agent has been restarted
>> on both controllers. The qdhcp netns's were all present; I stopped the
>> service, removed the qdhcp netns's, noted the dhcp agents show offline by
>> `neutron agent-list`, restarted all neutron services, noted the qdhcp
>> netns's were recreated, restarted a VM again and it still fails to pull an
>> IP address.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *  www.granddial.com
>>  *
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/5/18 10:38 AM
>> *To*: torin.wolt...@granddial.com
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Did you restart the neutron-dhcp-agent and reboot the VMs?
>>
>> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
>> torin.wolt...@granddial.com> wrote:
>>
>>> The qrouter netns appears once the lock_path is specified, and the neutron
>>> router is pingable as well. However, instances are not pingable. If I log
>>> in via console, the instances have not been given IP addresses; if I
>>> manually give them an address and a route they are pingable and seem to work.
>>> So the router is working correctly but DHCP is not.
>>>
>>> No errors in any of the neutron or nova logs on controllers or compute
>>> nodes.
>>>
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *  
>>>  www.granddial.com
>>>  *
>>>
>>> --
>>> *From*: "Torin Woltjer" 
>>> *Sent*: 7/5/18 8:53 AM
>>> *To*: 
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>> There is no lock path set in my neutron configuration. Does it
>>> ultimately matter what it is set to as long as it is consistent? Does it
>>> need to be set on compute nodes as well as controllers?
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *  
>>>  
>>>  www.granddial.com
>>>  *
>>>
>>> --
>>> *From*: George Mihaiescu 
>>> *Sent*: 7/3/18 7:47 PM
>>> *To*: torin.wolt...@granddial.com
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>>
>>> Did you set a lock_path in neutron's config?
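>>>
>>> A minimal example of what I mean (the path is just a common choice):
>>>
>>>   [oslo_concurrency]
>>>   lock_path = /var/lib/neutron/tmp
>>>
>>> (on older releases the option may live under [DEFAULT] instead)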
>>>
>>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>>> wrote:
>>>
>>> The following errors appear in the neutron-linuxbridge-agent.log on both
>>> controllers: http://paste.openstack.org/show/724930/
>>>
>>> No such errors are on the compute nodes themselves.
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *  

Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Pedro Sousa
Hi,

I have SR-IOV in production at some customers with the maximum number of VFs
and didn't notice any performance issues.

My understanding is that you will of course see a performance penalty if you
consume all those VFs, because you're dividing the bandwidth across them; but
if they're just sitting there doing nothing you won't notice anything.
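
For what it's worth, this is roughly how I check and set the VF count on a
port (the device name is just an example):

  cat /sys/class/net/ens2f0/device/sriov_totalvfs
  echo 32 > /sys/class/net/ens2f0/device/sriov_numvfs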

But I'm just talking from my experience :)

Regards,
Pedro Sousa

On Mon, Jan 22, 2018 at 11:47 PM, Maciej Kucia <mac...@kucia.net> wrote:

> Thank you for the reply. I am interested in SR-IOV and pci whitelisting is
> certainly involved.
> I suspect that OpenStack itself can handle those numbers of devices,
> especially in telco applications where not much scheduling is being done.
> The feedback I am getting is from sysadmins who work on network
> virtualization but I think this is just a rumor without any proof.
>
> The question is if performance penalty from SR-IOV drivers or PCI itself
> is negligible. Should cloud admin configure maximum number of VFs for
> flexibility or should it be manually managed and balanced depending on
> application?
>
> Regards,
> Maciej
>
>
>>
>> 2018-01-22 18:38 GMT+01:00 Jay Pipes <jaypi...@gmail.com>:
>>
>>> On 01/22/2018 11:36 AM, Maciej Kucia wrote:
>>>
>>>> Hi!
>>>>
>>>> Is there any noticeable performance penalty when using multiple virtual
>>>> functions?
>>>>
>>>> For simplicity I am enabling all available virtual functions in my NICs.
>>>>
>>>
>>> I presume by the above you are referring to setting your
>>> pci_passthrough_whitelist on your compute nodes to whitelist all VFs on a
>>> particular PF's PCI address domain/bus?
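>>>
>>> For example, something along these lines in nova.conf on the computes
>>> (address and physnet are placeholders):
>>>
>>>   pci_passthrough_whitelist = {"address": "0000:05:00.*", "physical_network": "physnet2"}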
>>>
>>>> Sometimes the application uses only a few of them. I am using Intel and
>>>> Mellanox.
>>>>
>>>> I do not see any performance drop but I am getting feedback that this
>>>> might not be the best approach.
>>>>
>>>
>>> Who is giving you this feedback?
>>>
>>> The only issue with enabling (potentially 254 or more) VFs for each PF
>>> is that each VF will end up as a record in the pci_devices table in the
>>> Nova cell database. Multiply 254 or more times the number of PFs times the
>>> number of compute nodes in your deployment and you can get a large number
>>> of records that need to be stored. That said, the pci_devices table is well
>>> indexed and even if you had 1M or more records in the table, the access of
>>> a few hundred of those records when the resource tracker does a
>>> PciDeviceList.get_by_compute_node() [1] will still be quite fast.
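>>>
>>> As a rough illustration: 254 VFs x 2 PFs x 500 compute nodes is already
>>> ~254,000 rows, and even that is small for an indexed query scoped to a
>>> single compute node.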
>>>
>>> Best,
>>> -jay
>>>
>>> [1] https://github.com/openstack/nova/blob/stable/pike/nova/comp
>>> ute/resource_tracker.py#L572 and then
>>> https://github.com/openstack/nova/blob/stable/pike/nova/pci/
>>> manager.py#L71
>>>
>>> Any recommendations?
>>>>
>>>> Thanks,
>>>> Maciej
>>>>
>>>>
>>>> ___
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators@lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>
>>>>
>>> ___
>>> OpenStack-operators mailing list
>>> OpenStack-operators@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>
>>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-operators][Ceilometer vs Monasca] Alarms: Ceilometer vs Monasca

2017-08-16 Thread Pedro Sousa
Hi,

I use Aodh + Gnocchi for autoscaling. I also use Mistral + Zaqar for
auto-healing. See the example below, hope it helps.


Main template:

(...)
mongocluster:
type: OS::Heat::AutoScalingGroup
properties:
  cooldown: 60
  desired_capacity: 2
  max_size: 3
  min_size: 1
  resource:
type: ./mongocluster.yaml
properties:
  network: { get_attr: [ voicis_network, be_om_net ] }
  flavor: { get_param: flavor }
  image: { get_param: image }
  key_name: { get_param: key_name }
  base_mgmt_security_group: { get_attr: [ security_groups,
base_mgmt ] }
  mongodb_security_group: { get_attr: [ security_groups, mongodb ] }
  root_stack_id: {get_param: "OS::stack_id"}
  metadata: {"metering.server_group": {get_param: "OS::stack_id"}}


mongodb_scaleup_policy:
type: OS::Heat::ScalingPolicy
properties:
  adjustment_type: change_in_capacity
  auto_scaling_group_id: {get_resource: mongocluster}
  cooldown: 60
  scaling_adjustment: 1

  mongodb_scaledown_policy:
type: OS::Heat::ScalingPolicy
properties:
  adjustment_type: change_in_capacity
  auto_scaling_group_id: {get_resource: mongocluster}
  cooldown: 60
  scaling_adjustment: -1

cpu_alarm_high:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
  description: Scale-up if the average CPU > 95% for 1 minute
  metric: cpu_util
  aggregation_method: mean
  granularity: 300
  evaluation_periods: 1
  threshold: 80
  resource_type: instance
  comparison_operator: gt
  alarm_actions:
- str_replace:
template: trust+url
params:
  url: {get_attr: [mongodb_scaleup_policy, signal_url]}
  query:
str_replace:
  template: '{"=": {"server_group": "stack_id"}}'
  params:
stack_id: {get_param: "OS::stack_id"}

  cpu_alarm_low:
type: OS::Aodh::GnocchiAggregationByResourcesAlarm
properties:
  metric: cpu_util
  aggregation_method: mean
  granularity: 300
  evaluation_periods: 1
  threshold: 5
  resource_type: instance
  comparison_operator: lt
  alarm_actions:
- str_replace:
template: trust+url
params:
  url: {get_attr: [mongodb_scaledown_policy, signal_url]}
  query:
str_replace:
  template: '{"=": {"server_group": "stack_id"}}'
  params:
stack_id: {get_param: "OS::stack_id"}

outputs:
  mongo_stack_id:
description: UUID of the cluster nested stack
value: {get_resource: mongocluster}
  scale_up_url:
description: >
  This URL is the webhook to scale up the autoscaling group.  You
  can invoke the scale-up operation by doing an HTTP POST to this
  URL; no body nor extra headers are needed.
value: {get_attr: [mongodb_scaleup_policy, alarm_url]}
  scale_dn_url:
description: >
  This URL is the webhook to scale down the autoscaling group.
  You can invoke the scale-down operation by doing an HTTP POST to
  this URL; no body nor extra headers are needed.
value: {get_attr: [mongodb_scaledown_policy, alarm_url]}
  ceilometer_query:
value:
  str_replace:
template: >
  ceilometer statistics -m cpu_util
  -q metadata.user_metadata.stack=stackval -p 60 -a avg
params:
  stackval: { get_param: "OS::stack_id" }
description: >
  This is a Ceilometer query for statistics on the cpu_util meter
  Samples about OS::Nova::Server instances in this stack.  The -q
  parameter selects Samples according to the subject's metadata.
  When a VM's metadata includes an item of the form metering.X=Y,
  the corresponding Ceilometer resource has a metadata item of the
  form user_metadata.X=Y and samples about resources so tagged can
  be queried with a Ceilometer query term of the form
  metadata.user_metadata.X=Y.  In this case the nested stacks give
  their VMs metadata that is passed as a nested stack parameter,
  and this stack passes a metadata of the form metering.stack=Y,
  where Y is this stack's ID.




mongocluster.yaml

heat_template_version: ocata

description: >
  MongoDB cluster node


metadata:
type: json

  root_stack_id:
type: string
default: ""

conditions:
is_standalone: {equals: [{get_param: root_stack_id}, ""]}


resources:

mongodbserver:
type: OS::Nova::Server
properties:
  name: { str_replace: { params: { random_string: { get_resource:
random_str }, __zone__: { get_param: zone } }, template:
mongodb-random_string.__zone__ } }
  image: { get_param: image }
  flavor: { get_param: flavor }
  metadata: {get_param: metadata}
  key_name: { get_param: key_name }
  networks:
- port: { get_resource: om_port }
  user_data_format: SOFTWARE_CONFIG
  user_data: { get_resource: server_clu_init 

Re: [Openstack-operators] Blazar? (Reservations and/or scheduled termination)

2016-08-04 Thread Pedro Sousa
Take a look at ManageIQ.

On 03/08/2016 17:50, "Jonathan D. Proulx"  wrote:

> Hi All,
>
> As a private cloud operator who doesn't charge internal users, I'd
> really like a way to force users to set an expiration time on their
> instances so that if they forget about them they go away.
>
> I'd thought Blazar was the thing to look at, and Chameleoncloud.org
> seems to be using it (any of you around here?), but it also doesn't
> look like it's seen substantive work in a long time.
>
> Anyone have operational experience with Blazar to share, or other
> solutions?
>
> -Jon
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Who's using TripleO in production?

2016-08-02 Thread Pedro Sousa
Hi,

I'm using it with CentOS. I've installed mitaka from CentOS Sig Repos and
followed Redhat Documentation:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/director-installation-and-usage/director-installation-and-usage

Let me know if you have more questions.

Regards


On Tue, Aug 2, 2016 at 5:57 PM, Curtis  wrote:

> Hi,
>
> I'm just curious who, if anyone, is using TripleO in production?
>
> I'm having a hard time finding anywhere to ask end-user type
> questions. #tripleo irc seems to be just a dev channel. Not sure if
> there is anywhere for end users to ask questions. A quick look at
> stackalytics shows it's mostly RedHat contributions, though again, it
> was a quick look.
>
> If there were other users it would be cool to perhaps try to have a
> session on it at the upcoming ops midcycle.
>
> Thanks,
> Curtis.
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [NFV] [Tacker] Problem installing Tacker on Mitaka when creating DB

2016-06-29 Thread Pedro Sousa
ib64/python2.7/site-packages/sqlalchemy/engine/default.py",
line 450, in do_execute
cursor.execute(statement, parameters)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line
174, in execute
self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py",
line 36, in defaulterrorhandler
raise errorclass, errorvalue
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError)
(1832, "Cannot change column 'device_id': used in a foreign key
constraint 'deviceattributes_ibfk_1'") [SQL: u'ALTER TABLE
deviceattributes MODIFY device_id VARCHAR(255) NOT NULL']


Any hint?

Regards,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Issue running Controllers as virtual machines on Vmware hosts

2016-06-15 Thread Pedro Sousa
Hi all,

I'm trying to virtualize some controllers on VMware hosts; however, I have
an issue with networking.

When TripleO enables promiscuous mode on the interfaces inside the VM
operating system, I lose connectivity to the network. I have already
permitted promiscuous mode on the VMware vswitch.

Has anyone had an issue like this before?

I use

CentOS 7.2 / Mitaka
Kernel: 3.10.0-327.18.2.el7.x86_64

Regards,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Failed Overcloud Update Mitaka

2016-06-14 Thread Pedro Sousa
Personally I run the undercloud on a VM (KVM) and snapshot it before messing
with the Heat stack :)
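
Roughly (the domain name is whatever your undercloud VM is called):

  virsh snapshot-create-as undercloud pre-stack-update
  # and if the update goes wrong:
  virsh snapshot-revert undercloud pre-stack-update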

Regards

On Tue, Jun 14, 2016 at 6:16 PM, Charles Short  wrote:

> Well I just tested this
>
> Tried to create a snapshot of the heat stack overcloud (from a new clean
> state).
> The snapshot is stuck IN PROGRESS (for over an hour). I cannot remove it.
> Perhaps this is not such a good/reliable method.
> I will revert to my CloneZilla bare metal imaging to restore back.
>
> Any other suggestions as to how to cope with a stack update failure
> without deleting and recreating the stack?
>
> Thanks
>
> Charles
>
>
> On 14/06/2016 16:57, Charles Short wrote:
>
>> Hi,
>>
>> TripleO stable Mitaka
>>
>> I am testing expanding my stack by adding more compute nodes. The first
>> update failed, leaving the overcloud stack in a failed state.
>> Is it best practice to create a snapshot of the overcloud heat template
>> before updating the stack?
>> You could then roll back and try the update again.
>>
>> heat stack-snapshot overcloud
>> heat stack-restore [snapshot-id]
>>
>> Regards
>>
>> Charles
>>
>>
> --
> Charles Short
> Cloud Engineer
> Virtualization and Cloud Team
> European Bioinformatics Institute (EMBL-EBI)
> Tel: +44 (0)1223 494205
>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Neutron Multiple gateways instances source routing

2015-11-30 Thread Pedro Sousa
Hi Salvatore,

thank you for your reply. I'm aware that it works with static routes; I
would like to know if it can do source routing
(http://www.linuxhorizon.ro/iproute2.html).

Anyway, I solved it using a script inside the instance.
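
The script is essentially the usual iproute2 policy-routing recipe from that
page, roughly (addresses and table number are examples):

  ip route add 10.0.1.0/24 dev eth1 src 10.0.1.15 table 100
  ip route add default via 10.0.1.1 dev eth1 table 100
  ip rule add from 10.0.1.15/32 table 100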

Regards,
Pedro Sousa





On Sun, Nov 29, 2015 at 7:29 AM, Salvatore Orlando <salv.orla...@gmail.com>
wrote:

> Hello Pedro,
>
> Neutron has some (limited) capabilities for injecting static routes into
> instances.
> You can try whether the subnet's host_routes attribute [1] can satisfy
> your requirement.
> Routes can however be specified only in the form destination CIDR/next hop.
> Note: the host_routes option leverages DHCP option 121 in the reference
> implementation and therefore requires DHCP on network interfaces.
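>
> As a sketch, something like (CIDR and next hop are examples):
>
>   neutron subnet-update <subnet-id> --host-routes type=dict list=true destination=10.20.0.0/24,nexthop=10.0.0.254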
>
> Salvatore
>
> [1]
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/api/v2/attributes.py#n808
> (sorry for the link to the code, the API spec does not render anymore
> correctly the subnet page)
>
> On 18 November 2015 at 12:23, Pedro Sousa <pgso...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a couple of Linux instances with multiple interfaces (different
>> networks and gateways) and I would like to set up source routing, meaning
>> for example that a packet that enters interface eth0 should be routed via
>> its corresponding gateway and not the default gateway.
>>
>> My question is if this can be done with neutron or do I need to configure
>> it inside the instances?
>>
>> Thanks,
>> Pedro Sousa
>>
>> ___
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Neutron Multiple gateways instances source routing

2015-11-18 Thread Pedro Sousa
Hi all,

I have a couple of Linux instances with multiple interfaces (different
networks and gateways) and I would like to set up source routing, meaning
for example that a packet that enters interface eth0 should be routed via
its corresponding gateway and not the default gateway.

My question is if this can be done with neutron or do I need to configure
it inside the instances?

Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Kilo NFV Performance Optimization advices on compute nodes

2015-10-12 Thread Pedro Sousa
Hi Jian,

thank you for your reply. I'm using SR-IOV, so I don't use virtio for
networking; however, I've tried to configure RPS and it still doesn't seem to
work.
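
What I tried was along these lines (interface name and CPU mask are
examples):

  # spread receive packet processing for ens6 queue 0 across CPUs 0-3
  echo f > /sys/class/net/ens6/queues/rx-0/rps_cpus
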
Any hint?

Regards,
Pedro Sousa


On Sat, Oct 10, 2015 at 6:47 AM, Jian Wen <wenjia...@gmail.com> wrote:

 > You can try either of the following methods to distribute the load of
 > processing received packets and spread it across multiple CPUs.
>
> Receive Packet Steering:
> Optimizing EC2 Network Throughput On Multi-Core Servers
> <https://engineering.pinterest.com/blog/building-pinterest-cloud>
>
> Multi-Queue virtio-net:
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Networking-Techniques.html
>
> On Tue, Sep 29, 2015 at 5:49 AM, Pedro Sousa <pgso...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm trying to deploy some nfv apps on my Openstack deployment, however
>> I'm having some performance issues in my VMs, that start to lose UDP
>> packets at a specific packet transmission rate.
>>
>>  Here's what I've tried and found so far:
>>
>> - VMs Centos 7.1 with 10GBe Neutron SR-IOV nics
>> - Configured Memory Hugepages:
>> http://redhatstackblog.redhat.com/tag/huge-pages/
>> - Configured CPU Pinning and NUMA Topology:
>> - Increased memory network buffers in kernel
>> - Running "egrep ens6 /proc/interrupts" I see network Interrupts are not
>> balanced evenly inside my guest across CPU cores, always hitting the same
>> CPU.
>>
>> Concerning this last issue, does anybody have some good advice on how to
>> tackle this: how can I share the network load across the vCPUs inside the
>> guest, or am I looking in the wrong direction?
>>
>> Some pointers would be appreciated.
>>
>> Regards,
>> Pedro Sousa
>>
>>
>>
>> ___
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>
>
> --
> Best,
>
> Jian
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Neutron LBaaS HA in KIlo?

2015-07-15 Thread Pedro Sousa
Hi all,

can anybody clarify if Neutron LBaaS Agent has HA support in Kilo?

Regards,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] ha queues Juno periodic rabbitmq errors

2015-05-14 Thread Pedro Sousa
Hi all,

I'm using Juno and occasionally see this kind of error when I reboot one of
my rabbit nodes:

*MessagingTimeout: Timed out waiting for a reply to message ID
e95d4245da064c779be2648afca8cdc0*

I use ha queues in my openstack services:


*rabbit_hosts=192.168.113.206:5672,192.168.113.207:5672,192.168.113.208:5672*

*rabbit_ha_queues=True*

Has anyone experienced these issues? Is this an oslo bug or something related?

Regards,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] ha queues Juno periodic rabbitmq errors

2015-05-14 Thread Pedro Sousa
Hi Kevin,

thank you for your reply. I'm using: rabbitmqctl set_policy HA '^(?!amq\.).*'
'{"ha-mode": "all"}'

I will test with ha-sync-mode:automatic and net.ipv4.tcp_retries2=5.
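
(i.e. "sysctl -w net.ipv4.tcp_retries2=5" on the nodes, plus the same line in
/etc/sysctl.conf to make it persistent)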

Regards,
Pedro Sousa






On Thu, May 14, 2015 at 4:29 PM, Kevin Bringard (kevinbri) 
kevin...@cisco.com wrote:

 If you're using Rabbit 3.x you need to enable HA queues via policy on the
 rabbit server side.

 Something like this:

 rabbitmqctl set_policy ha-all 
 '{ha-mode:all,ha-sync-mode:automatic}'


 Obviously, tailor it to your own needs :-)

 We've also seen issues with TCP_RETRIES2 needing to be turned way down
 because when rebooting the rabbit node, it takes quite some time for the
 remote host to realize it's gone and tear down the connections.

 On 5/14/15, 9:23 AM, Pedro Sousa pgso...@gmail.com wrote:

 Hi all,
 
 
 I'm using Juno and ocasionally see this kind of errors when I reboot one
 of my rabbit nodes:
 
 
 MessagingTimeout: Timed out waiting for a reply to message ID
 e95d4245da064c779be2648afca8cdc0
 
 
 I use ha queues in my openstack services:
 
 
 rabbit_hosts=192.168.113.206:5672
 http://192.168.113.206:5672,192.168.113.207:5672
 http://192.168.113.207:5672,192.168.113.208:5672
 http://192.168.113.208:5672
 
 rabbit_ha_queues=True
 
 
 
 As anyone experienced this issues? is this a oslo bug or related?
 
 
 Regards,
 Pedro Sousa
 
 
 
 
 
 
 


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Periodic packet loss neutron l3-agent HA Juno

2015-05-13 Thread Pedro Sousa
Hi,

as observed by Assaf Muller, I need to apply these patches:

https://review.openstack.org/154609
https://review.openstack.org/#/c/154589/

Waiting for openstack-neutron-2014.2.3 rpm from rdo repos to fix it.

Regards,
Pedro Sousa


On Mon, May 11, 2015 at 4:55 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi all,

 I'm using l3-agent in HA mode in Juno and I'm observing periodic routing
 packet loss in different tenant networks. I've started observing this when
 I switched from VXLAN tunnels to VLANS. I use openvswitch.

 At L2 level, within the same tenant network I don't see this behavior.

 Anyone has observed this and what's best way to debug it?

 openstack-neutron-ml2-2014.2.2-1.el7.noarch
 openstack-neutron-2014.2.2-1.el7.noarch
 python-neutronclient-2.3.9-1.el7.centos.noarch
 openstack-neutron-openvswitch-2014.2.2-1.el7.noarch
 openstack-neutron-metering-agent-2014.2.2-1.el7.noarch
 python-neutron-2014.2.2-1.el7.noarch
 openvswitch-2.3.1-2.el7.x86_64
 kernel-3.10.0-123.13.1.el7.x86_64
 keepalived-1.2.13-6.el7.x86_64


 Thanks,
 Pedro Sousa


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Periodic packet loss neutron l3-agent HA Juno

2015-05-11 Thread Pedro Sousa
Hi all,

I'm using l3-agent in HA mode in Juno and I'm observing periodic routing
packet loss in different tenant networks. I've started observing this when
I switched from VXLAN tunnels to VLANS. I use openvswitch.

At L2 level, within the same tenant network I don't see this behavior.

Anyone has observed this and what's best way to debug it?

openstack-neutron-ml2-2014.2.2-1.el7.noarch
openstack-neutron-2014.2.2-1.el7.noarch
python-neutronclient-2.3.9-1.el7.centos.noarch
openstack-neutron-openvswitch-2014.2.2-1.el7.noarch
openstack-neutron-metering-agent-2014.2.2-1.el7.noarch
python-neutron-2014.2.2-1.el7.noarch
openvswitch-2.3.1-2.el7.x86_64
kernel-3.10.0-123.13.1.el7.x86_64
keepalived-1.2.13-6.el7.x86_64


Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] error applying iptables rules openvswitch

2015-05-08 Thread Pedro Sousa
Hi all,

I'm trying to apply floating IPs to my instances but I cannot connect to
them; I can, however, ping my router 192.168.100.1. Looking at the rules I
see that the floating IP rules are being applied only for my router, while I
should have NAT rules for the remaining IPs; see below.

[root@compute03 ~]# ip netns exec
qrouter-7660497d-ecad-41d0-b6a9-2e8e268b8b05 iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N neutron-l3-agent-OUTPUT
-N neutron-l3-agent-POSTROUTING
-N neutron-l3-agent-PREROUTING
-N neutron-l3-agent-float-snat
-N neutron-l3-agent-snat
-N neutron-postrouting-bottom
-A PREROUTING -j neutron-l3-agent-PREROUTING
-A OUTPUT -j neutron-l3-agent-OUTPUT
-A POSTROUTING -j neutron-l3-agent-POSTROUTING
-A POSTROUTING -j neutron-postrouting-bottom
-A neutron-l3-agent-POSTROUTING ! -i qg-f8ca9462-58 ! -o qg-f8ca9462-58 -m
conntrack ! --ctstate DNAT -j ACCEPT
-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport
80 -j REDIRECT --to-ports 9697
-A neutron-l3-agent-snat -j neutron-l3-agent-float-snat
-A neutron-l3-agent-snat -s 10.0.20.0/24 -j SNAT --to-source 192.168.100.1
-A neutron-postrouting-bottom -j neutron-l3-agent-snat


Looking at openvswitch logs I see this:


2015-05-08 18:49:40.702 4576 ERROR neutron.agent.linux.utils
[req-39e10a37-f8f9-44b3-8625-9ef80427f4c8 None]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf',
'iptables-restore', '-c']
Exit code: 1
Stdout: ''
Stderr: 'iptables-restore: line 37 failed\n'
2015-05-08 18:49:40.703 4576 ERROR neutron.agent.linux.iptables_manager
[req-39e10a37-f8f9-44b3-8625-9ef80427f4c8 None] IPTablesManager.apply
failed to apply the following set of iptables rules:
 33. :INPUT ACCEPT [1857:623264]
 34. :FORWARD ACCEPT [279:20488]
 35. :OUTPUT ACCEPT [2040:428982]
 36. COMMIT
 37. :neutron-filter-top - [0:0]
 38. :neutron-openvswi-FORWARD - [0:0]
 39. :neutron-openvswi-INPUT - [0:0]
 40. :neutron-openvswi-OUTPUT - [0:0]
 41. :neutron-openvswi-i09e357b7-2 - [0:0]
 42. :neutron-openvswi-i21466de5-1 - [0:0]

Can anybody help me figure out this issue? Is it a bug or something?

I use CentOS 7, Juno with Neutron HA.

Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Neutron][Nova] No Valid Host when booting new VM with Public IP

2015-03-18 Thread Pedro Sousa
Hi Adam

For the external network you should use floating IPs to access your
instances externally, if I understood correctly.

Regards
On 16/03/2015 20:56, Adam Lawson alaw...@aqorn.com wrote:

 Got a strange error and I'm really hoping to get some help with it since
 it has me scratching my head.

 When I create a VM within Horizon and select the PRIVATE network, it boots
 up great.
 When I attempt to create a VM within Horizon and include the PUBLIC
 network (either by itself or with the private network), it fails with a No
 valid host found error.

 I looked at the nova-api and the nova-scheduler logs on the controller and
 the most I've found are errors/warnings binding VIF's but I'm not 100%
 certain it's the root cause although I believe it's related.

 I didn't find any WARNINGS or ERRORS in the compute or network node.

 Setup:

- 1 physical host running 4 KVM domains/guests
- 1x Controller
   - 1x Network
   - 1x Volume
   - 1x Compute


 *Controller Node:*
 nova.conf (http://pastebin.com/q3e9cntH)

- neutron.conf (http://pastebin.com/ukEVzBbN)
- ml2_conf.ini (http://pastebin.com/w10jBGZC)
- nova-api.log (http://pastebin.com/My99Mg2z)
- nova-scheduler (http://pastebin.com/Nb75Z6yH)
- neutron-server.log (http://pastebin.com/EQVQPVDF)


 *Network Node:*

- l3_agent.ini (http://pastebin.com/DBaD1F5x)
- neutron.conf (http://pastebin.com/Bb3qkNi7)
- ml2_conf.ini (http://pastebin.com/xEC1Bs9L)


 *Compute Node:*

- nova.conf (http://pastebin.com/K6SiE9Pw)
- nova-compute.conf (http://pastebin.com/9Mz30b4v)
- neutron.conf (http://pastebin.com/Le4wYRr4)
- ml2_conf.ini (http://pastebin.com/nnyhC8mV)


 *Back-end:*
 Physical switch

 Any thoughts on what could be causing this?

 *Adam Lawson*

 AQORN, Inc.
 427 North Tatnall Street
 Ste. 58461
 Wilmington, Delaware 19801-2230
 Toll-free: (844) 4-AQORN-NOW ext. 101
 International: +1 302-387-4660
 Direct: +1 916-246-2072


 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Problem creating volumes from images with ceph storage

2015-02-12 Thread Pedro Sousa
Hi all,

I have an icehouse deployment with 2 OSD ceph nodes for image and volumes.
I can create volumes, however I cannot create volumes from my images:

*#cinder upload-to-image --disk-format qcow2 --container-format bare
b4ca5b64-e00e-4d54-930a-d1beb8ee027a centos-6.6*

*ERROR: The server has either erred or is incapable of performing the
requested operation. (HTTP 500) (Request-ID:
req-0a309a5a-6fee-4fa7-a8f1-09a684670c7b)*

Any hint?

Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Openstack deployment tool advice

2015-02-09 Thread Pedro Sousa
Hi all,

I'm looking into some options to deploy OpenStack. I use the RDO-based Juno
distro and had been using the Packstack tool (based on Puppet) to deploy it,
but it's not flexible enough, as it lacks some features such as HA.

I've been looking at Foreman and the TripleO project as alternatives, but
would appreciate some input from the community, based on its own experience
and pain :), on good options to achieve this, with HA and cloud monitoring in
mind. It would also be a plus to address things like rolling upgrades,
although I know this is not an easy task. :)

Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] improve perfomance Neutron VXLAN

2015-01-23 Thread Pedro Sousa
Hi Slawek,

I've tried with 8950/9000 but I had problems communicating  with external
hosts from the VM.

Regards,
Pedro Sousa




On Thu, Jan 22, 2015 at 9:36 PM, Sławek Kapłoński sla...@kaplonski.pl
wrote:

 As I wrote earlier, for me it is best to have 9000 on hosts and 8950 on
 instances. Then I have full speed between instances. With lower mtu on
 instances I have about 2-2.5 Gbps and I saw that vhost-net process on host
 is using 100 of 1 cpu core. I'm using libvirt with kvm - maybe You are
 using something else and it will be different on Your hosts.

 Slawek Kaplonski


 W dniu 22.01.2015 o 20:45, Pedro Sousa pisze:

 Hi Slawek,

 I've tried several options but that one that seems to work better is MTU
 1450 on VM and MTU 1600 on the host. With MTU 1400 on the VM I would get
 freezes and timeouts.

 Still I get about 2.2Gbit/Sec while in the host I get 9 Gbit/Sec, do you
 think is normal?

 Thanks,
 Pedro Sousa




 On Thu, Jan 22, 2015 at 7:32 PM, Sławek Kapłoński sla...@kaplonski.pl
 mailto:sla...@kaplonski.pl wrote:

 Hello,

 In dnsmasq file in neutron will be ok. It will then force option 26
 on vm.
 You can also manually change it on vms to tests.

 Slawek Kaplonski

 W dniu 22.01.2015 o 17:06, Pedro Sousa pisze:

 Hi Slawek,

 I'll test this, did you change the mtu on dnsmasq file in
 /etc/neutron/?
 Or do you need to change on other places too?

 Thanks,
 Pedro Sousa

 On Wed, Jan 21, 2015 at 4:26 PM, Sławek Kapłoński
 sla...@kaplonski.pl mailto:sla...@kaplonski.pl
 mailto:sla...@kaplonski.pl mailto:sla...@kaplonski.pl wrote:

  I have similar and I also got something like 2-2,5Gbps
 between vms.
  When I
  change it to 8950 on vms (so in neutron conf) (50 less then
 on
  hosts) then it
  is much better.
  You can check that probably when You make test between vms
 on host
  there is
  process called vhost-net (or something like that) and it
 uses 100%
  of one cpu
  core and that is imho bottleneck

  Slawek Kaplonski

  On Wed, Jan 21, 2015 at 04:12:02PM +, Pedro Sousa wrote:
Hi Slawek,
   
I have dhcp-option-force=26,1400 in neutron-dnsmasq.conf
 and
  MTU=9000 on
network-interfaces in the operating system.
   
Do I need to change somewhere else?
   
Thanks,
Pedro Sousa
   
On Wed, Jan 21, 2015 at 4:07 PM, Sławek Kapłoński
  sla...@kaplonski.pl mailto:sla...@kaplonski.pl
 mailto:sla...@kaplonski.pl mailto:sla...@kaplonski.pl

wrote:
   
 Hello,

 Try to set bigger jumbo framse on hosts and vms. For
 example on
  hosts You
 can
 set 9000 and then 8950 and check then. It helps me
 with similar
  problem.

 Slawek Kaplonski

 On Wed, Jan 21, 2015 at 03:22:50PM +, Pedro Sousa
 wrote:
  Hi all,
 
  is there a way to improve network performance on my
 instances
  with
 VXLAN? I
  changed the MTU on physical interfaces to 1600, still
  performance it's
  lower than in baremetal hosts:
 
  *On Instance:*
 
  [root@vms6-149a71e8-1f2a-4d6e-__bba4-e70dfa42b289
 ~]# iperf3 -s
 
 --__-
  Server listening on 5201
 
 --__-

  Accepted connection from 10.0.66.35, port 42900
  [  5] local 10.0.66.38 port 5201 connected to
 10.0.66.35 port
  42901
  [ ID] Interval   Transfer Bandwidth
  [  5]   0.00-1.00   sec   189 MBytes  1.59 Gbits/sec
  [  5]   1.00-2.00   sec   245 MBytes  2.06 Gbits/sec
  [  5]   2.00-3.00   sec   213 MBytes  1.78 Gbits/sec
  [  5]   3.00-4.00   sec   227 MBytes  1.91 Gbits/sec
  [  5]   4.00-5.00   sec   235 MBytes  1.97 Gbits/sec
  [  5]   5.00-6.00   sec   235 MBytes  1.97 Gbits/sec
  [  5]   6.00-7.00   sec   234 MBytes  1.96 Gbits/sec
  [  5]   7.00-8.00   sec   235 MBytes  1.97 Gbits/sec
  [  5]   8.00-9.00   sec   244 MBytes  2.05 Gbits/sec
  [  5]   9.00-10.00  sec   234 MBytes  1.97 Gbits/sec

Re: [Openstack-operators] improve perfomance Neutron VXLAN

2015-01-23 Thread Pedro Sousa
Hi,

I think you're right; the NIC is not offloading, because when testing between
the instances I see a ksoftirqd/0 process with high CPU on the compute hosts.

Doing iperf on bare metal I don't see this process with high CPU.

Is my assumption right?
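
A quick way to check whether the NIC can actually offload VXLAN encapsulation
(interface name is an example; the feature name can vary per driver):

  ethtool -k eth0 | grep tnl-segmentation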

Thanks

On Wed, Jan 21, 2015 at 3:59 PM, Robert van Leeuwen 
robert.vanleeu...@spilgames.com wrote:

  is there a way to improve network performance on my instances with
 VXLAN?
 I changed the MTU on physical interfaces to 1600, still performance it's
 lower than in baremetal hosts:

 Do you have VXLAN hardware offloading on the NIC?
 I think you are hitting the maximum speed you can do encapsulation in
 software.
 You will need network cards that can do VXLAN offloading to go to similar
 speed as hardware (10Gbit).

 Cheers,
 Robert van Leeuwen



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Import Windows 2008 Vmware VMDK Image to KVM Openstack.

2015-01-22 Thread Pedro Sousa
Hi all,

does anybody have a working procedure/howto to convert Windows based VMDK
images to KVM?

I tried to convert using qemu-img convert command but I always get a blue
screen when I launch the instance.
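
For reference, the conversion itself was just along the lines of (filenames
are examples):

  qemu-img convert -f vmdk -O qcow2 windows2008.vmdk windows2008.qcow2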

Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] improve perfomance Neutron VXLAN

2015-01-21 Thread Pedro Sousa
Hi all,

is there a way to improve network performance on my instances with VXLAN? I
changed the MTU on the physical interfaces to 1600, but performance is still
lower than on the bare-metal hosts:

*On Instance:*

[root@vms6-149a71e8-1f2a-4d6e-bba4-e70dfa42b289 ~]# iperf3 -s
---
Server listening on 5201
---
Accepted connection from 10.0.66.35, port 42900
[  5] local 10.0.66.38 port 5201 connected to 10.0.66.35 port 42901
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-1.00   sec   189 MBytes  1.59 Gbits/sec
[  5]   1.00-2.00   sec   245 MBytes  2.06 Gbits/sec
[  5]   2.00-3.00   sec   213 MBytes  1.78 Gbits/sec
[  5]   3.00-4.00   sec   227 MBytes  1.91 Gbits/sec
[  5]   4.00-5.00   sec   235 MBytes  1.97 Gbits/sec
[  5]   5.00-6.00   sec   235 MBytes  1.97 Gbits/sec
[  5]   6.00-7.00   sec   234 MBytes  1.96 Gbits/sec
[  5]   7.00-8.00   sec   235 MBytes  1.97 Gbits/sec
[  5]   8.00-9.00   sec   244 MBytes  2.05 Gbits/sec
[  5]   9.00-10.00  sec   234 MBytes  1.97 Gbits/sec
[  5]  10.00-10.04  sec  9.30 MBytes  1.97 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  5]   0.00-10.04  sec  2.25 GBytes  1.92 Gbits/sec   43 sender
[  5]   0.00-10.04  sec  2.25 GBytes  1.92 Gbits/sec
 receiver


*On baremetal:*
iperf3 -s
warning: this system does not seem to support IPv6 - trying IPv4
---
Server listening on 5201
---
Accepted connection from 172.16.21.4, port 51408
[  5] local 172.16.21.5 port 5201 connected to 172.16.21.4 port 51409
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-1.00   sec  1.02 GBytes  8.76 Gbits/sec
[  5]   1.00-2.00   sec  1.07 GBytes  9.23 Gbits/sec
[  5]   2.00-3.00   sec  1.08 GBytes  9.29 Gbits/sec
[  5]   3.00-4.00   sec  1.08 GBytes  9.27 Gbits/sec
[  5]   4.00-5.00   sec  1.08 GBytes  9.27 Gbits/sec
[  5]   5.00-6.00   sec  1.08 GBytes  9.28 Gbits/sec
[  5]   6.00-7.00   sec  1.08 GBytes  9.28 Gbits/sec
[  5]   7.00-8.00   sec  1.08 GBytes  9.29 Gbits/sec
[  5]   8.00-9.00   sec  1.08 GBytes  9.28 Gbits/sec
[  5]   9.00-10.00  sec  1.08 GBytes  9.29 Gbits/sec
[  5]  10.00-10.04  sec  42.8 MBytes  9.31 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  5]   0.00-10.04  sec  10.8 GBytes  9.23 Gbits/sec   95 sender
[  5]   0.00-10.04  sec  10.8 GBytes  9.22 Gbits/sec
 receiver


Thanks,
Pedro Sousa
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Way to check compute - rabbitmq connectivity

2015-01-16 Thread Pedro Sousa
Hi

I had some similar issues in the past with Havana. Icehouse and Juno work
fine with the latest version of RabbitMQ. I use rabbit_hosts and ha_queues
enabled.

Regards
Pedro Sousa
On 15/01/2015 15:38, Gustavo Randich gustavo.rand...@gmail.com wrote:

 Hi,

 I'm experiencing some issues with nova-compute services not responding to
 rabbitmq messages, despite the service reporting OK state via periodic
 tasks. Apparently the TCP connection is open but in a stale or unresponsive
 state. This happens sporadically when there is some not yet understood
 network problem. Restarting nova-compute solves the problem.

 Is there any way, preferably via openstack API, to probe service
 responsiveness, i.e., that it consumes messages, so we can program an alert?

 Thanks in advance!


 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Fwd: HAPROXY 504 errors in HA conf

2015-01-15 Thread Pedro Sousa
False alarm; after more tests the issue persisted, so I switched to backup
mode on the other haproxy nodes and now everything works as expected.

Thanks
On 15/01/2015 12:13, Pedro Sousa pgso...@gmail.com wrote:

 Hi all,

 the culprit was haproxy: I had "option httpchk" enabled, and when I disabled
 it I stopped having timeouts when rebooting the servers.

 Thank you all.


 On Wed, Jan 14, 2015 at 5:29 PM, John Dewey j...@dewey.ws wrote:

  I would verify that the VIP failover is occurring.

 Your master should have the IP address.  If you shut down keepalived the
 VIP should move to one of the others.   I generally set the state to MASTER
 on all systems, and have one with a higher priority than the others (e.g.
 100 vs 150 on others).
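
 A partial keepalived.conf sketch of what I mean (priorities are illustrative):

   vrrp_instance VI_1 {
       state MASTER
       priority 150    # e.g. 100 on the other nodes
       ...
   }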

 On Tuesday, January 13, 2015 at 12:18 PM, Pedro Sousa wrote:

 As expected If I reboot the Keepalived MASTER node, I get timeouts again,
 so my understanding is that this happens when the VIP fails over to another
 node. Anyone has explanation for this?

 Thanks

 On Tue, Jan 13, 2015 at 8:08 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi,

 I think I found out the issue, as I have all the 3 nodes running
 Keepalived as MASTER, when I reboot one of the servers, one of the VIPS
 failsover to it, causing the timeout issues. So I left only one server as
 MASTER and the other 2 as BACKUP, and If I reboot the BACKUP servers
 everything will work fine.

 As a note aside, I don't know if this is some ARP issue, because I have a
 similar problem with Neutron L3 running in HA mode. If I reboot the server
 that is running as MASTER I lose connection to my floating IPs because the
 switch doesn't yet know that the MAC address has changed. For everything to
 start working again I have to ping an outside host like Google from an instance.

 Maybe someone could share some experience on this,

 Thank you for your help.




 On Tue, Jan 13, 2015 at 7:18 PM, Pedro Sousa pgso...@gmail.com wrote:

 Jesse,

 I see a lot of these messages in glance-api:

 2015-01-13 19:16:29.084 29269 DEBUG
 glance.api.middleware.version_negotiation
 [29d94a9a-135b-4bf2-a97b-f23b0704ee15 eb7ff2b5f0f34f51ac9ea0f75b60065d
 2524b02b63994749ad1fed6f3a825c15 - - -] Unknown version. Returning version
 choices. process_request
 /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:64

 While running openstack-status (glance image-list)

 == Glance images ==
 Error finding address for
 http://172.16.21.20:9292/v1/images/detail?sort_key=namesort_dir=asclimit=20:
 HTTPConnectionPool(host='172.16.21.20', port=9292): Max retries exceeded
 with url: /v1/images/detail?sort_key=namesort_dir=asclimit=20 (Caused by
 class 'httplib.BadStatusLine': '')


 Thanks


 On Tue, Jan 13, 2015 at 6:52 PM, Jesse Keating j...@bluebox.net wrote:

 On 1/13/15 10:42 AM, Pedro Sousa wrote:

 Hi


 I've changed some haproxy confs, now I'm getting a different error:

 *== Nova networks ==*
 *ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
 port=8774): Max retries exceeded with url:
 /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by class
 'httplib.BadStatusLine': '')*
 *== Nova instance flavors ==*

 If I restart my openstack services everything will start working.

 I'm attaching my new haproxy conf.


 Thanks


 Sounds like your services are losing access to something, like rabbit or
 the database. What do your service logs show prior to restart? Are they
 throwing any errors?


 --
 -jlk


 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators







___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Fwd: HAPROXY 504 errors in HA conf

2015-01-13 Thread Pedro Sousa
Hi


 I've changed some haproxy confs, now I'm getting a different error:

 *== Nova networks ==*
 *ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
 port=8774): Max retries exceeded with url:
 /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by class
 'httplib.BadStatusLine': '')*
 *== Nova instance flavors ==*

 If I restart my openstack services everything will start working.


 I'm attaching my new haproxy conf.


Thanks


haproxy.cfg
Description: Binary data
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Fwd: HAPROXY 504 errors in HA conf

2015-01-13 Thread Pedro Sousa
As expected, if I reboot the keepalived MASTER node I get timeouts again, so
my understanding is that this happens when the VIP fails over to another
node. Does anyone have an explanation for this?

Thanks

On Tue, Jan 13, 2015 at 8:08 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi,

 I think I found out the issue, as I have all the 3 nodes running
 Keepalived as MASTER, when I reboot one of the servers, one of the VIPS
 failsover to it, causing the timeout issues. So I left only one server as
 MASTER and the other 2 as BACKUP, and If I reboot the BACKUP servers
 everything will work fine.

 As a note aside, I don't know if this is some ARP issue because I have a
 similar problem with Neutron L3 running in HA Mode. If I reboot the server
 that is running as MASTER I loose connection to my floating IPS because the
 switch doesn't know yet that the Mac Addr has changed. To everything start
 working I have to ping an outside host  like google from an instance.

 Maybe someone could share some experience on this,

 Thank you for your help.




 On Tue, Jan 13, 2015 at 7:18 PM, Pedro Sousa pgso...@gmail.com wrote:

 Jesse,

 I see a lot of these messages in glance-api:

 2015-01-13 19:16:29.084 29269 DEBUG
 glance.api.middleware.version_negotiation
 [29d94a9a-135b-4bf2-a97b-f23b0704ee15 eb7ff2b5f0f34f51ac9ea0f75b60065d
 2524b02b63994749ad1fed6f3a825c15 - - -] Unknown version. Returning version
 choices. process_request
 /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:64

 While running openstack-status (glance image-list)

 == Glance images ==
 Error finding address for
 http://172.16.21.20:9292/v1/images/detail?sort_key=namesort_dir=asclimit=20:
 HTTPConnectionPool(host='172.16.21.20', port=9292): Max retries exceeded
 with url: /v1/images/detail?sort_key=namesort_dir=asclimit=20 (Caused by
 class 'httplib.BadStatusLine': '')


 Thanks


 On Tue, Jan 13, 2015 at 6:52 PM, Jesse Keating j...@bluebox.net wrote:

 On 1/13/15 10:42 AM, Pedro Sousa wrote:

 Hi


 I've changed some haproxy confs, now I'm getting a different error:

 *== Nova networks ==*
 *ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
 port=8774): Max retries exceeded with url:
 /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by class
 'httplib.BadStatusLine': '')*
 *== Nova instance flavors ==*

 If I restart my openstack services everything will start working.

 I'm attaching my new haproxy conf.


 Thanks


 Sounds like your services are losing access to something, like rabbit or
 the database. What do your service logs show prior to restart? Are they
 throwing any errors?


 --
 -jlk


 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Neutron DVR HA

2015-01-07 Thread Pedro Sousa
Hi all,

after some more tests it seems to be a gratuitous ARP issue, because if I
start a new connection (ping) from an inside instance to an external host
like Google it will work.

This means that the instance advertises to the switch that something has in
fact changed and that it should update its ARP table.
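
A quick way to test that theory is to send a gratuitous ARP for the floating
IP by hand from the node that currently owns it, something like:

  arping -A -c 3 -I <external-interface> <floating-ip>

and see whether connectivity to the floating IP comes back.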

Has anyone seen this behavior?

Thanks



On Tue, Dec 30, 2014 at 6:56 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi,

 as I stated, if I ping from an openstack instance the request appears on
 *Compute01* and is *OK*:

 *18:50:38.721115 IP 10.0.30.23  172.16.28.32 http://172.16.28.32: ICMP
 echo request, id 29956, seq 36, length 64*
 *18:50:38.721304 IP 172.16.28.32  10.0.30.23 http://10.0.30.23: ICMP
 echo reply, id 29956, seq 36, length 64   *


 if I ping outside from my outside network the request appears on *Compute02
 *and is *NOT OK*:

 *18:50:40.104025 ethertype IPv4, IP 192.168.8.4  172.16.28.32
 http://172.16.28.32: ICMP echo request, id 13981, seq 425, length 64*
 *18:50:40.104025 ethertype IPv4, IP 192.168.8.4  172.16.28.32
 http://172.16.28.32: ICMP echo request, id 13981, seq 425, length 64*


 I appreciate if someone can help me with this.

 Thanks.







 On Tue, Dec 30, 2014 at 3:17 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi Assaf,

 another update, if I ping the floating ip from my instance it works. If I
 ping from outside/provider network, from my pc, it doesn't.

 Thanks

 On Tue, Dec 30, 2014 at 11:35 AM, Pedro Sousa pgso...@gmail.com wrote:

 Hi Assaf,

 According your instructions I can confirm that I have l2pop disabled.

 Meanwhile, I've made another test: yesterday when I left the office this
 wasn't working, but when I arrived this morning it was pinging again, and I
 hadn't changed or touched anything. So my interpretation is that this is some
 sort of timeout issue.

 Thanks






 On Tue, Dec 30, 2014 at 11:27 AM, Assaf Muller amul...@redhat.com
 wrote:

 Sorry I can't open zip files on this email. You need l2pop to not exist
 in the ML2 mechanism drivers list in neutron.conf where the Neutron
 server
 is, and you need l2population = False in each OVS agent.

 - Original Message -
 
  [Text File:warning1.txt]
 
  Hi Asaf,
 
  I think I disabled it, but maybe you can check my conf files? I've
 attached
  the zip.
 
  Thanks
 
  On Tue, Dec 30, 2014 at 8:27 AM, Assaf Muller  amul...@redhat.com 
 wrote:
 
 
 
 
  - Original Message -
   Hi Britt,
  
   some update on this after running tcpdump:
  
   I have keepalived master running on controller01, If I reboot this
 server
   it
   failovers to controller02 which now becomes Keepalived Master, then
 I see
   ping packets arriving to controller02, this is good.
  
   However when the controller01 comes online I see that ping requests
 stop
   being forwarded to controller02 and start being sent to
 controller01 that
   is
   now in Backup State, so it stops working.
  
 
  If traffic is being forwarded to a backup node, that sounds like
 L2pop is on.
  Is that true by chance?
 
   Any hint for this?
  
   Thanks
  
  
  
   On Mon, Dec 29, 2014 at 11:06 AM, Pedro Sousa  pgso...@gmail.com
  wrote:
  
  
  
   Yes,
  
   I was using l2pop, disabled it, but the issue remains.
  
   I also stopped bogus VRRP messages configuring a user/password for
   keepalived, but when I reboot the servers, I see keepalived process
 running
   on them but I cannot ping the virtual router ip address anymore.
  
   So I rebooted the node that is running Keepalived as Master, starts
 pinging
   again, but when that node comes online, everything stops working.
 Anyone
   experienced this?
  
   Thanks
  
  
   On Tue, Dec 23, 2014 at 5:03 PM, David Martin  dmart...@gmail.com
  wrote:
  
  
  
   Are you using l2pop? Until
 https://bugs.launchpad.net/neutron/+bug/1365476
   is
   fixed it's pretty broken.
  
   On Tue, Dec 23, 2014 at 10:48 AM, Britt Houser (bhouser) 
   bhou...@cisco.com
wrote:
  
  
  
   Unfortunately I've not had a chance yet to play with neutron router
 HA, so
   no
   hints from me. =( Can you give a little more details about it stops
   working? I.e. You see packets dropped while controller 1 is down?
 Do
   packets begin flowing before controller1 comes back online? Does
   controller1
   come back online successfully? Do packets begin to flow after
 controller1
   comes back online? Perhaps that will help.
  
   Thx,
   britt
  
   From: Pedro Sousa  pgso...@gmail.com 
   Date: Tuesday, December 23, 2014 at 11:14 AM
   To: Britt Houser  bhou...@cisco.com 
   Cc:  OpenStack-operators@lists.openstack.org  
   OpenStack-operators@lists.openstack.org 
   Subject: Re: [Openstack-operators] Neutron DVR HA
  
   I understand Britt, thanks.
  
   So I disabled DVR and tried to test L3_HA, but it's not working
 properly,
   it
   seems a keepalived issue. I see that it's running on 3 nodes:
  
   [root@controller01 keepalived]# neutron
 l3-agent-list-hosting-router
   harouter

Re: [Openstack-operators] Neutron DVR HA

2014-12-30 Thread Pedro Sousa
Hi,

as I stated, if I ping from an openstack instance the request appears on
*Compute01* and is *OK*:

*18:50:38.721115 IP 10.0.30.23  172.16.28.32 http://172.16.28.32: ICMP
echo request, id 29956, seq 36, length 64*
*18:50:38.721304 IP 172.16.28.32  10.0.30.23 http://10.0.30.23: ICMP
echo reply, id 29956, seq 36, length 64   *


if I ping outside from my outside network the request appears on *Compute02
*and is *NOT OK*:

*18:50:40.104025 ethertype IPv4, IP 192.168.8.4  172.16.28.32
http://172.16.28.32: ICMP echo request, id 13981, seq 425, length 64*
*18:50:40.104025 ethertype IPv4, IP 192.168.8.4  172.16.28.32
http://172.16.28.32: ICMP echo request, id 13981, seq 425, length 64*


I appreciate if someone can help me with this.

Thanks.







On Tue, Dec 30, 2014 at 3:17 PM, Pedro Sousa pgso...@gmail.com wrote:

 Hi Assaf,

 another update, if I ping the floating ip from my instance it works. If I
 ping from outside/provider network, from my pc, it doesn't.

 Thanks

 On Tue, Dec 30, 2014 at 11:35 AM, Pedro Sousa pgso...@gmail.com wrote:

 Hi Assaf,

 According to your instructions, I can confirm that I have l2pop disabled.

 Meanwhile, I've made another test: yesterday when I left the office this
 wasn't working, but when I arrived this morning it was pinging again, and I
 hadn't changed or touched anything. So my interpretation is that this is some
 sort of timeout issue.

 Thanks






 On Tue, Dec 30, 2014 at 11:27 AM, Assaf Muller amul...@redhat.com
 wrote:

 Sorry, I can't open zip files in this email. You need l2pop removed from the
 ML2 mechanism drivers list in neutron.conf where the Neutron server is, and
 you need l2population = False in each OVS agent.
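
 (A minimal sketch of what that looks like on disk, with file names and
 sections assumed from a typical ML2/OVS layout; the agent option is usually
 spelled l2_population:)

 # ml2_conf.ini (or the plugin config loaded by the Neutron server)
 [ml2]
 mechanism_drivers = openvswitch

 # OVS agent config on every compute and network node
 [agent]
 l2_population = False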

 - Original Message -
 
  [Text File:warning1.txt]
 
  Hi Assaf,
 
  I think I disabled it, but maybe you can check my conf files? I've
 attached
  the zip.
 
  Thanks
 
  On Tue, Dec 30, 2014 at 8:27 AM, Assaf Muller  amul...@redhat.com 
 wrote:
 
 
 
 
  - Original Message -
   Hi Britt,
  
   some update on this after running tcpdump:
  
   I have the keepalived master running on controller01. If I reboot this
   server, it fails over to controller02, which now becomes the keepalived
   Master, and then I see ping packets arriving at controller02; this is good.
  
   However, when controller01 comes back online, I see that ping requests
   stop being forwarded to controller02 and start being sent to
   controller01, which is now in Backup state, so it stops working.
  
 
  If traffic is being forwarded to a backup node, that sounds like L2pop
 is on.
  Is that true by chance?
 
   Any hint for this?
  
   Thanks
  
  
  
   On Mon, Dec 29, 2014 at 11:06 AM, Pedro Sousa  pgso...@gmail.com 
 wrote:
  
  
  
   Yes,
  
   I was using l2pop, disabled it, but the issue remains.
  
   I also stopped the bogus VRRP messages by configuring a user/password for
   keepalived, but when I reboot the servers, I see the keepalived process
   running on them but I cannot ping the virtual router IP address anymore.
  
   So I rebooted the node that is running keepalived as Master; it starts
   pinging again, but when that node comes back online, everything stops
   working. Has anyone experienced this?
  
   Thanks
  
  
   On Tue, Dec 23, 2014 at 5:03 PM, David Martin  dmart...@gmail.com
  wrote:
  
  
  
   Are you using l2pop? Until
 https://bugs.launchpad.net/neutron/+bug/1365476
   is
   fixed it's pretty broken.
  
   On Tue, Dec 23, 2014 at 10:48 AM, Britt Houser (bhouser) 
   bhou...@cisco.com
wrote:
  
  
  
   Unfortunately I've not had a chance yet to play with neutron router HA, so
   no hints from me. =( Can you give a little more detail about how it stops
   working? I.e., do you see packets dropped while controller1 is down? Do
   packets begin flowing before controller1 comes back online? Does
   controller1 come back online successfully? Do packets begin to flow after
   controller1 comes back online? Perhaps that will help.
  
   Thx,
   britt
  
   From: Pedro Sousa  pgso...@gmail.com 
   Date: Tuesday, December 23, 2014 at 11:14 AM
   To: Britt Houser  bhou...@cisco.com 
   Cc:  OpenStack-operators@lists.openstack.org  
   OpenStack-operators@lists.openstack.org 
   Subject: Re: [Openstack-operators] Neutron DVR HA
  
   I understand Britt, thanks.
  
   So I disabled DVR and tried to test L3_HA, but it's not working properly;
   it seems to be a keepalived issue. I see that it's running on 3 nodes:
  
   [root@controller01 keepalived]# neutron l3-agent-list-hosting-router harouter
   +--------------------------------------+--------------+----------------+-------+
   | id                                   | host         | admin_state_up | alive |
   +--------------------------------------+--------------+----------------+-------+
   | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
   | 58ff7c42-7e71-4750-9f05-61ad5fbc5776 | compute03    | True           | :-)   |
   | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True

[Openstack-operators] Neutron DVR HA

2014-12-23 Thread Pedro Sousa
Hi all,

I've been trying Neutron DVR with 2 controllers + 2 computes. When I
create a router I can see that it is running on all the servers:

[root@controller01 ~]# neutron l3-agent-list-hosting-router router
+--------------------------------------+--------------+----------------+-------+
| id                                   | host         | admin_state_up | alive |
+--------------------------------------+--------------+----------------+-------+
| 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
| 0ca01d56-b6dd-483d-9c49-cc7209da2a5a | controller02 | True           | :-)   |
| 52379f0f-9046-4b73-9d87-bab7f96be5e7 | compute01    | True           | :-)   |
| 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True           | :-)   |
+--------------------------------------+--------------+----------------+-------+

However, if the controller01 server dies I cannot ping the external gateway IP
anymore. Is this the expected behavior? Shouldn't it fail over to
another controller node?
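
(For context, a minimal sketch of the DVR-related settings involved, with
option names assumed from the Neutron DVR documentation of that era; in DVR,
SNAT for instances without floating IPs still runs centrally on the node whose
L3 agent is in dvr_snat mode, so losing that node can break external gateway
connectivity:)

# neutron.conf on the Neutron server
[DEFAULT]
router_distributed = True

# l3_agent.ini on the network/controller nodes (centralized SNAT)
[DEFAULT]
agent_mode = dvr_snat

# l3_agent.ini on the compute nodes
[DEFAULT]
agent_mode = dvr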

Thanks


Re: [Openstack-operators] Neutron DVR HA

2014-12-23 Thread Pedro Sousa
I understand Britt, thanks.

So I disabled DVR and tried to test L3_HA, but it's not working properly;
it seems to be a keepalived issue. I see that it's running on 3 nodes:

[root@controller01 keepalived]# neutron l3-agent-list-hosting-router harouter
+--------------------------------------+--------------+----------------+-------+
| id                                   | host         | admin_state_up | alive |
+--------------------------------------+--------------+----------------+-------+
| 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
| 58ff7c42-7e71-4750-9f05-61ad5fbc5776 | compute03    | True           | :-)   |
| 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True           | :-)   |
+--------------------------------------+--------------+----------------+-------+

However if I reboot one of the l3-agent nodes it stops working. I see this
in the logs:

*Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: ip address associated
with VRID not present in received packet : 172.16.28.20*
*Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: one or more VIP
associated with VRID mismatch actual MASTER advert*
*Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: bogus VRRP packet
received on ha-a509de81-1c !!!*
*Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: VRRP_Instance(VR_1)
ignoring received advertisment...*

*Dec 23 16:13:10 Compute03 Keepalived_vrrp[12501]: VRRP_Instance(VR_1)
ignoring received advertisment...*
*Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: ip address associated
with VRID not present in received packet : 172.16.28.20*
*Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: one or more VIP
associated with VRID mismatch actual MASTER advert*
*Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: bogus VRRP packet
received on ha-d5718741-ef !!!*
*Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: VRRP_Instance(VR_1)
ignoring received advertisment...*
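
(Elsewhere in this thread these bogus-packet warnings are silenced by
configuring a VRRP user/password for keepalived; a minimal sketch of the
neutron.conf options involved, with option names assumed and worth verifying
against your release:)

# neutron.conf on the Neutron server
[DEFAULT]
l3_ha = True
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = <shared-secret>
ha_vrrp_advert_int = 2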

Any hint?

Thanks




On Tue, Dec 23, 2014 at 3:17 PM, Britt Houser (bhouser) bhou...@cisco.com
wrote:

  Currently HA and DVR are mutually exclusive features.

   From: Pedro Sousa pgso...@gmail.com
 Date: Tuesday, December 23, 2014 at 9:42 AM
 To: OpenStack-operators@lists.openstack.org 
 OpenStack-operators@lists.openstack.org
 Subject: [Openstack-operators] Neutron DVR HA

   Hi all,

  I've been trying Neutron DVR with 2 controllers + 2 computes. When I
 create a router I can see that it is running on all the servers:

  [root@controller01 ~]# neutron l3-agent-list-hosting-router router

 +--------------------------------------+--------------+----------------+-------+
 | id                                   | host         | admin_state_up | alive |
 +--------------------------------------+--------------+----------------+-------+
 | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
 | 0ca01d56-b6dd-483d-9c49-cc7209da2a5a | controller02 | True           | :-)   |
 | 52379f0f-9046-4b73-9d87-bab7f96be5e7 | compute01    | True           | :-)   |
 | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True           | :-)   |
 +--------------------------------------+--------------+----------------+-------+

  However, if the controller01 server dies I cannot ping the external gateway
 IP anymore. Is this the expected behavior? Shouldn't it fail over to
 another controller node?

  Thanks



[Openstack-operators] automatically evacuate an instance when a compute node dies

2014-12-09 Thread Pedro Sousa
Hi all,

is there a working solution in nova to automatically restart an instance
on a healthy node when a compute node dies?

I've heard about Pacemaker; is there a good howto to help with this?
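
(For reference, a minimal sketch of the manual step such automation would
drive, using the nova CLI of that era; the host names are placeholders and
shared storage is assumed:)

# check which compute services are down and what was running there
nova service-list --binary nova-compute
nova list --host compute01 --all-tenants

# rebuild the instance on a surviving host, reusing its disk on shared storage
nova evacuate <instance-uuid> compute02 --on-shared-storage

Pacemaker (or any other monitor) would essentially fence the dead node and run
the same evacuate call for each affected instance.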

Regards