Re: [Openstack-operators] [openstack-dev][openstack-operators][neutron[dhcp][dnsmask]: duplicate entries in addn_hosts causing no IP allocation
Two further thoughts on this:

1. Another DHCP agent problem that my team noticed is that call_driver('reload_allocations') takes a bit of time (to regenerate the Dnsmasq config files, and to spawn a shell that sends a HUP signal) - enough so that if there is a fast, steady rate of port-create and port-delete notifications coming from the Neutron server, these can build up in DHCPAgent's RPC queue, and then they still only get dispatched one at a time. So the queue and the time delay become longer and longer.

   I have a fix pending for this, which uses an extra thread to read those notifications off the RPC queue onto an internal queue, and then batches the call_driver('reload_allocations') processing when there is a contiguous sequence of such notifications - i.e. only does the config regeneration and HUP once, instead of lots of times.

   I don't think this is directly related to what you are seeing - but perhaps there actually is some link that I am missing.

2. There is an interesting and vaguely similar thread currently being discussed about the L3 agent (subject "L3 agent rescheduling issue") - about possible RPC/threading issues between the agent and the Neutron server. You might like to review that thread and see if it describes any problems analogous to your DHCP one.

Regards,
Neil

On 08/06/15 17:53, Neil Jerram wrote:
[...]

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
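The batching idea in point 1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the pending fix itself: `drain`, `process_pending` and the notification dict shape are hypothetical names, and the extra thread that copies notifications from the RPC queue onto the internal queue is left out.

```python
import queue


def drain(internal_queue):
    """Return every notification currently waiting, without blocking."""
    pending = []
    while True:
        try:
            pending.append(internal_queue.get_nowait())
        except queue.Empty:
            return pending


def process_pending(internal_queue, reload_allocations):
    """Reload each affected network once, however many events arrived.

    `reload_allocations` stands in for call_driver('reload_allocations', ...);
    the real call signature is not reproduced here.
    """
    pending = drain(internal_queue)
    # Keep first-seen order while dropping repeated network IDs.
    network_ids = list(dict.fromkeys(n["network_id"] for n in pending))
    for network_id in network_ids:
        reload_allocations(network_id)
    return len(pending), len(network_ids)


if __name__ == "__main__":
    q = queue.Queue()
    for event in ("port_create_end", "port_delete_end", "port_create_end"):
        q.put({"event": event, "network_id": "net-a"})
    reloads = []
    print(process_pending(q, reloads.append), reloads)
```

The point of the design is that a burst of notifications costs one config regeneration and one HUP per network rather than one per notification.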
Re: [Openstack-operators] [openstack-dev][openstack-operators][neutron[dhcp][dnsmask]: duplicate entries in addn_hosts causing no IP allocation
My team has seen a problem that could be related: in a churn test where VMs are created and terminated at a constant rate - but so that the number of active VMs should remain roughly constant - the size of the host and addn_hosts files keeps increasing. In other words, it appears that the config for VMs that have actually been terminated is not being removed from the config files.

Clearly, if you have a limited pool of IP addresses, this can eventually lead to the problem that you have described.

For your case - i.e. with Icehouse - the problem might be https://bugs.launchpad.net/neutron/+bug/1192381. I'm not sure whether the fix for that problem - i.e. sending port-create and port-delete notifications to DHCP agents even when the server thinks they are down - was merged before the Icehouse release. But there must be at least one other cause as well, because my team was seeing this with Juno-level code.

Therefore I, too, would be interested in any other insights about this problem.

Regards,
Neil

On 08/06/15 16:26, Daniel Comnea wrote:
Any help, ideas please?

Thx,
Dani

On Mon, Jun 8, 2015 at 9:25 AM, Daniel Comnea comnea.d...@gmail.com wrote:
[...]
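One way to check for the leftover config described above is to compare the host file against the set of addresses that should still be live. The sketch below is an assumption-laden illustration: it assumes each host file line has the form `MAC,hostname,IP` (IPv4 only), and `stale_entries` and `active_ips` are hypothetical names; how the active address list is obtained is left open.

```python
def stale_entries(host_file_text, active_ips):
    """Return host file lines whose IP is not in `active_ips`.

    Assumes comma-separated lines ending in an IPv4 address; shorter
    lines are skipped rather than guessed at.
    """
    stale = []
    for raw in host_file_text.splitlines():
        line = raw.strip()
        fields = line.split(",")
        if len(fields) < 3:
            continue
        if fields[-1] not in active_ips:
            stale.append(line)
    return stale


if __name__ == "__main__":
    sample = (
        "fa:16:3e:00:00:01,host-192-168-110-5.openstacklocal,192.168.110.5\n"
        "fa:16:3e:00:00:02,host-192-168-110-6.openstacklocal,192.168.110.6\n"
    )
    for entry in stale_entries(sample, {"192.168.110.5"}):
        print(entry)
```

Running a check like this periodically during a churn test would show whether the number of stale lines grows over time.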
Re: [Openstack-operators] Gentoo image availability
Nice to hear. You're doing a great job!

A few things to make Gentoo a 'first class citizen' as an OpenStack guest:

1. Check that you support all eth interfaces, not only eth0. If an instance boots with two or more interfaces, it should be able to get all its addresses.
2. Add a Gentoo 'element' to diskimage-builder (https://github.com/openstack/diskimage-builder).
3. Ship the image with a proper cloud-init cloud.cfg.

On 06/08/2015 06:26 PM, Matthew Thode wrote:
[...]
Re: [Openstack-operators] Gentoo image availability
On 06/08/2015 09:17 PM, George Shuklin wrote:
[...]

Ya, not sure how to do multi-interface yet. I'd love it if the cloud-init static IP support would work with it (a hash with MACs as the keys and a list of IPs as the value for each interface). Then DHCP can go away (I tend to much prefer config-drive).

The disk-image-builder support is on my todo list already :D

I just updated the cloud-init ebuild with a better cloud.cfg; it could probably use more love, but it works.

I am working on getting Gentoo in as a first class citizen in openstack-ansible as well, which depends on the disk-image-builder work. So much work still to do :D

--
Matthew Thode (prometheanfire)
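The "hash with MACs as the keys and a list of IPs as the value" could look something like the sketch below. It is only an illustration of the data structure, not cloud-init's actual format: `addresses_by_mac` is a hypothetical helper, and the input dicts are assumed to carry `mac_address` and `fixed_ips` fields in the style of a Neutron port.

```python
def addresses_by_mac(ports):
    """Build {mac: [ip, ...]} from port-like dicts.

    MACs are lower-cased so lookups from the guest side do not depend
    on how the address was originally written.
    """
    mapping = {}
    for port in ports:
        ips = mapping.setdefault(port["mac_address"].lower(), [])
        for fixed_ip in port["fixed_ips"]:
            ips.append(fixed_ip["ip_address"])
    return mapping


if __name__ == "__main__":
    ports = [
        {"mac_address": "FA:16:3E:00:00:01",
         "fixed_ips": [{"ip_address": "192.168.110.5"}]},
        {"mac_address": "fa:16:3e:00:00:02",
         "fixed_ips": [{"ip_address": "192.168.111.5"}]},
    ]
    print(addresses_by_mac(ports))
```

Keying on the MAC rather than on the interface name means the guest does not have to assume that the first port is eth0.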
Re: [Openstack-operators] [openstack-dev][openstack-operators][neutron[dhcp][dnsmask]: duplicate entries in addn_hosts causing no IP allocation
Hi Daniel,

I'm concerned that we are encountering out-of-order port events on the DHCP agent side, so that the delete message is processed before the create message. Would you be willing to apply a small patch to your DHCP agent to see if it fixes the issue?

If it does fix the issue, you should see occasional warnings in the DHCP agent log that show "Received message for port that was already deleted". If it doesn't fix the issue, we may be losing the delete event entirely. If that's the case, it would be great if you could enable debugging on the agent and upload a log of a run when it happens.

Cheers,
Kevin Benton

Here is the patch:

diff --git a/neutron/agent/dhcp_agent.py b/neutron/agent/dhcp_agent.py
index 71c9709..9b9b637 100644
--- a/neutron/agent/dhcp_agent.py
+++ b/neutron/agent/dhcp_agent.py
@@ -71,6 +71,7 @@ class DhcpAgent(manager.Manager):
         self.needs_resync = False
         self.conf = cfg.CONF
         self.cache = NetworkCache()
+        self.deleted_ports = set()
         self.root_helper = config.get_root_helper(self.conf)
         self.dhcp_driver_cls = importutils.import_class(self.conf.dhcp_driver)
         ctx = context.get_admin_context_without_session()
@@ -151,6 +152,7 @@ class DhcpAgent(manager.Manager):
         LOG.info(_('Synchronizing state'))
         pool = eventlet.GreenPool(cfg.CONF.num_sync_threads)
         known_network_ids = set(self.cache.get_network_ids())
+        self.deleted_ports = set()
 
         try:
             active_networks = self.plugin_rpc.get_active_networks_info()
@@ -302,6 +304,10 @@ class DhcpAgent(manager.Manager):
     @utils.synchronized('dhcp-agent')
     def port_update_end(self, context, payload):
         """Handle the port.update.end notification event."""
+        if payload['port']['id'] in self.deleted_ports:
+            LOG.warning(_("Received message for port that was "
+                          "already deleted: %s"), payload['port']['id'])
+            return
         updated_port = dhcp.DictModel(payload['port'])
         network = self.cache.get_network_by_id(updated_port.network_id)
         if network:
@@ -315,6 +321,7 @@ class DhcpAgent(manager.Manager):
     def port_delete_end(self, context, payload):
         """Handle the port.delete.end notification event."""
         port = self.cache.get_port_by_id(payload['port_id'])
+        self.deleted_ports.add(payload['port_id'])
         if port:
             network = self.cache.get_network_by_id(port.network_id)
             self.cache.remove_port(port)

On Mon, Jun 8, 2015 at 8:26 AM, Daniel Comnea comnea.d...@gmail.com wrote:
[...]

--
Kevin Benton
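To see why remembering deleted port IDs helps with out-of-order events, here is a small stand-alone model of the idea in the patch. It is a toy, not the agent: `PortEventHandler` and its attributes are hypothetical, and the real agent's cache, locking and driver calls are left out.

```python
class PortEventHandler:
    """Toy model of an agent that tolerates delete-before-create ordering."""

    def __init__(self):
        self.ports = {}             # port id -> port dict (stands in for the cache)
        self.deleted_ports = set()  # IDs already deleted since the last resync
        self.ignored = []           # IDs of late messages that were dropped

    def port_update_end(self, port):
        # A create/update arriving after the delete must not resurrect the port.
        if port["id"] in self.deleted_ports:
            self.ignored.append(port["id"])
            return
        self.ports[port["id"]] = port

    def port_delete_end(self, port_id):
        self.deleted_ports.add(port_id)
        self.ports.pop(port_id, None)

    def resync(self):
        # A full resync rebuilds state from the server, so the set can be reset.
        self.deleted_ports = set()


if __name__ == "__main__":
    handler = PortEventHandler()
    handler.port_delete_end("port-1")             # delete arrives first
    handler.port_update_end({"id": "port-1"})     # late create is ignored
    print(handler.ports, handler.ignored)
```

Without the `deleted_ports` check, the late create would leave an entry behind for a port that no longer exists, which matches the symptom of config that is never cleaned up.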
Re: [Openstack-operators] [openstack-dev][openstack-operators][neutron[dhcp][dnsmask]: duplicate entries in addn_hosts causing no IP allocation
+ Operators

Much thanks in advance,
Dani

On Sun, Jun 7, 2015 at 6:31 PM, Daniel Comnea comnea.d...@gmail.com wrote:

Hi all,

I'm running Icehouse (built using Fuel 5.1.1) on Ubuntu, where the dnsmasq version is 2.59-4.

I have a very basic network layout where I have a private net which has 2 subnets:

| 2fb7de9d-d6df-481f-acca-2f7860cffa60 | private-net | e79c3477-d3e5-471c-a728-8d881cf31bee 192.168.110.0/24 |
|                                      |             | f48c3223-8507-455c-9c13-8b727ea5f441 192.168.111.0/24 |

and I'm creating VMs via Heat.

What is happening is that sometimes I get duplicated entries in [1], and because of that the VM which was spun up doesn't get an IP. The dnsmasq processes are running okay [2] and I can't see anything special or wrong in them.

Any idea why this is happening? Or are you aware of any bugs around this area? Do you see a problem with having 2 subnets mapped to 1 private-net?

Thanks,
Dani

[1] /var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts

[2] nobody 5664 1 0 Jun02 ? 00:00:08 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tapc9164734-0c --except-interface=lo --pid-file=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/host --addn-hosts=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/2fb7de9d-d6df-481f-acca-2f7860cffa60/opts --leasefile-ro --dhcp-authoritative --dhcp-range=set:tag0,192.168.110.0,static,86400s --dhcp-range=set:tag1,192.168.111.0,static,86400s --dhcp-lease-max=512 --conf-file= --server=10.0.0.31 --server=10.0.0.32 --domain=openstacklocal
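A quick way to confirm duplicated entries in a file like [1] is to scan it for addresses or names that occur on more than one line. The sketch below assumes the usual hosts-file layout (an IP followed by one or more names per line); `find_duplicates` is a hypothetical helper and the sample lines are made up for illustration.

```python
from collections import defaultdict


def find_duplicates(addn_hosts_text):
    """Map each IP or name seen on more than one line to its line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(addn_hosts_text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # drop hosts-file comments
        # set() so a token repeated within one line is counted once.
        for token in set(line.split()):
            seen[token].append(lineno)
    return {token: lines for token, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = (
        "192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5\n"
        "192.168.110.6\thost-192-168-110-6.openstacklocal host-192-168-110-6\n"
        "192.168.110.5\thost-192-168-110-5.openstacklocal host-192-168-110-5\n"
    )
    for token, lines in sorted(find_duplicates(sample).items()):
        print(token, lines)
```

To use it against a live agent, read the addn_hosts path from [1] and pass its contents to `find_duplicates`; an empty result means no address or name is repeated.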
[Openstack-operators] [Fuel][Oslo][RabbitMQ][Shovel] Deprecate mirrored queues from HA AMQP cluster scenario
Hello, stackers.

I'd like to open a poll about deprecating RabbitMQ mirrored queues for the HA layout and replacing AMQP clustering with shovel [0], [1]. I guess federation would not be a good option, but let's consider it as well.

Why must this be done? The rabbit cluster cannot detect and survive micro-outages well: it ends up with some queues stuck and, as a result, the rabbitmqctl control plane hangs, completely unresponsive (until the rabbit node is erased and recovers its cluster membership). These outages could be caused either by the network *or* by CPU load spikes. See, for example, this bug in the Fuel project [2] and this mail thread [3].

So, please let's vote and discuss. But the questions also are:

a) Would changes be required in Oslo.messaging as well, in order to support the underlying AMQP layer architecture changes?
b) Are there any volunteers to do this research for the Oslo.messaging AMQP rabbit driver?

PS. Note, I'm not mentioning RabbitMQ versions here, as the issue seems unresolved in all existing ones. This seems to be a generic Erlang Mnesia clustering issue rather than something that could simply be fixed in RabbitMQ, unless Mnesia-based clustering were dropped completely ;)

[0] https://www.rabbitmq.com/shovel-dynamic.html
[1] https://www.rabbitmq.com/shovel.html
[2] https://bugs.launchpad.net/fuel/+bug/1460762
[3] https://groups.google.com/forum/#!topic/rabbitmq-users/iZWokxvhlaU

--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando
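For readers who have not used shovel, a dynamic shovel [0] is defined as a runtime parameter rather than through cluster membership. The sketch below only builds the command text for such a definition; the shovel name, URIs and queue names are invented for illustration, `shovel_command` is a hypothetical helper, and nothing here says how OpenStack queues would actually be mapped onto shovels.

```python
import json
import shlex


def shovel_command(name, src_uri, src_queue, dest_uri, dest_queue):
    """Return a rabbitmqctl command line that declares a dynamic shovel."""
    definition = {
        "src-uri": src_uri,
        "src-queue": src_queue,
        "dest-uri": dest_uri,
        "dest-queue": dest_queue,
    }
    # shlex.quote keeps the JSON intact when the command is pasted into a shell.
    return "rabbitmqctl set_parameter shovel {} {}".format(
        shlex.quote(name), shlex.quote(json.dumps(definition))
    )


if __name__ == "__main__":
    print(shovel_command(
        "notifications-shovel",          # hypothetical shovel name
        "amqp://node-1", "notifications.info",
        "amqp://node-2", "notifications.info",
    ))
```

Each shovel moves messages from one queue on one broker to a queue on another, so the brokers stay independent and do not share Mnesia state.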
[Openstack-operators] Gentoo image availability
Hi,

I'm the packager of OpenStack on Gentoo and have just started generating Gentoo OpenStack images. Right now it is just a basic amd64 image, but I plan on adding nomultilib and hardened variants (for a total of at least 4 images). I plan on generating these images at least weekly.

These images are not yet sanctioned by our infra team, but I plan on remedying that (being a member of said team should help).

I am currently using the scripts at https://github.com/prometheanfire/gentoo-cloud-prep to generate the images (based on a heavily modified version of Matt Vandermeulen's scripts). If you have any issues, please submit bugs there or contact me on IRC (prometheanfire on freenode).

Here's the link to the images. I'm currently GPG-signing them with the same key I use to sign this email (offline master key smartcard setup, for security-minded folk).

http://23.253.251.73/

Let me know if you have questions,

--
Matthew Thode (prometheanfire)
Re: [Openstack-operators] Gentoo image availability
Since these are distro packages, you might be interested in converting the prep scripts into a DIB (diskimage-builder) element. We can presently build images for Debian/Ubuntu/CentOS/Fedora/RHEL/openSUSE... I'm sure someone out there would appreciate having Gentoo, too.

On Jun 8, 2015, at 8:26 AM, Matthew Thode prometheanf...@gentoo.org wrote:
[...]