Re: VM orchestration, updating Best practices
Hi Lisa,

Erik Weber terbol...@gmail.com wrote:

> One way is to let puppet or whatever decide based on hostname, and pass the role that way. Or you could look at userdata, but that is hard to change later.
>
> Erik
>
> On 26 March 2014 at 18:47, X. S. nordlicht1...@hotmail.de wrote:
>
>> Hey! I have several choices to make regarding orchestration of VMs:
>>
>> - when and where should I assign a role to a template/VM?
>> - Should I have a Database template, a Webserver template etc? Or should I just have one basic ubuntu template with chef/puppet installed and pass the role somehow differently to the VM (how?) so all the rest of the installation is taken care of by those tools?

we use the following combination of tools/strategies:

- match host names by regular expressions in puppet

  with this, every host with name www... has the role web-server

      node /^www.*\.example\.com$/ inherits 'web-server-node' { ... }

  you can also use an 'if' or 'case' statement inside definitions/classes

- specify a specific version of the package in puppet

      package { 'tomcat7': ensure => '7.0.26-1ubuntu1.2' }

- a proxy repository for OS packages

  A caching proxy for the OS packages is a good way to control which packages are available for installation in your VMs. Even if the upstream repositories remove certain packages, your cache still keeps them. We use apt-cacher on Ubuntu 12.04.

>> - Should I turn on automatic updates in Ubuntu and how often should I create a new, up to date template?
>> - is puppet/chef really worth having to change the recipes on every minor new version and coming up with a recipe every time I want to install something new? Is there a way of installing security patches etc. automatically but handle new versions manually via chef or puppet?

It depends on what you want to achieve. From your questions above, I have the impression that strict control of package versions is your goal. With puppet, you can be strict for certain packages and lenient for others, as you can also specify that a package should just be present without giving a specific version:

    package { 'tomcat7': ensure => 'present' }

or tell puppet to always upgrade to the latest version with 'ensure => latest'.

cf. http://docs.puppetlabs.com/references/latest/type.html#package-attribute-ensure

>> - I guess the best way for updates would be to start a new VM with the new software and one by one move the workload to the updated VMs. On the other hand this seems not very feasible for the daily updates on the OS level!?

The way we do it is to create a template from a running VM, start that template, change the versions of the relevant packages in the puppet configuration to 'latest', and test the functionality. If everything is OK, the versions which have been tested are written into the puppet configuration and 'frozen' from that moment on until the next round of updates.

>> I have been researching this for a few weeks. Maybe you can share a thing or two before my head explodes... Thank you! Lisa

HTH

Kambiz
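
The 'freeze' step described in this message could be scripted roughly as follows. This is only a sketch under assumptions, not the tooling actually used: it assumes the tested versions are available as plain "name version" lines (for example as printed by `dpkg-query -W -f='${Package} ${Version}\n'` on the test VM), and the function names `parse_installed` and `frozen_resources` are made up for illustration.

```python
def parse_installed(listing):
    """Parse 'name version' lines into a dict of package name -> version."""
    versions = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            versions[parts[0]] = parts[1]
    return versions


def frozen_resources(versions):
    """Render one Puppet package resource per package, pinned to the tested version."""
    lines = []
    for name in sorted(versions):
        lines.append("package { '%s': ensure => '%s' }" % (name, versions[name]))
    return "\n".join(lines)


# Sample input: the tomcat7 version is the one quoted in the thread.
listing = "tomcat7 7.0.26-1ubuntu1.2\n"
tested = parse_installed(listing)
manifest = frozen_resources(tested)
print(manifest)
```

The generated lines would replace the 'latest' entries in the puppet configuration once the test round has passed.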
Re: Virtual Router doesn't start
Hi,

I hope I have understood what you wrote and created the following query correctly:

select uip.vm_id, uip.network_id, uip.public_ip_address,
       n.state as nic_state, n.removed as nic_removed,
       vm.state as vm_state, vm.removed as vm_removed
from user_ip_address uip
join nics n on uip.vm_id = n.instance_id
join vm_instance vm on uip.vm_id = vm.id
where uip.id in (Select ip_address_id from firewall_rules fr where fr.network_id=205);

+-------+------------+-------------------+--------------+---------------------+-----------+------------+
| vm_id | network_id | public_ip_address | nic_state    | nic_removed         | vm_state  | vm_removed |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+
|     6 |        205 | 10.193.17.169     | Allocated    | NULL                | Stopped   | NULL       |
|    10 |        205 | 10.193.17.136     | Allocated    | NULL                | Stopped   | NULL       |
|    12 |        205 | 10.193.17.140     | Allocated    | NULL                | Stopped   | NULL       |
|    13 |        205 | 10.193.17.141     | Allocated    | NULL                | Stopped   | NULL       |
|    14 |        205 | 10.193.17.142     | Allocated    | NULL                | Stopped   | NULL       |
|    15 |        205 | 10.193.17.174     | Deallocating | 2014-03-18 23:00:53 | Expunging | NULL       |
|    16 |        205 | 10.193.17.103     | Allocated    | NULL                | Stopped   | NULL       |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+

Is VM id 15 what you are looking for?

Thank you

Kambiz

Alena Prokharchyk alena.prokharc...@citrix.com wrote:

> Kambiz, can you please try one more thing.
>
> 1) Locate all the firewall rules for your guest network (205, right?)
>
>    Select id, ip_address_id from firewall_rules where network_id=205;
>
> 2) Now get all static nat enabled ip addresses for those rules:
>
>    Select vm_id, network_id from user_ip_address where id in (Select ip_address_id from firewall_rules where network_id=205);
>
> For each vmId/networkId combo, check if there is a non-removed nic and a non-expunged vm. There might be some incorrect static nat ip/vm reference referring to a vm that is removed already. If you find any, let me know and I will tell you how to clean it up.
>
> -Alena.
>
> On 3/22/14, 5:41 AM, Kambiz Darabi dar...@m-creations.com wrote:
>
>> Hi Alena,
>>
>> thank you for your help. The query returns no rows, i.e. nics.removed was not null, but I removed the row anyway to see what happens: a new virtual router was created which also couldn't be started due to the same NPE. I reverted the change by restoring from the dump.
>>
>> I have to mention that prior to the restart, r-7-VM was the router which was used by my instances. I deleted the router using the UI after the first occurrence of the NPE, because a post with a similar problem suggested that the deleted router would be recreated again (and this procedure solved the problem).
>>
>> Below I have attached the state of the two tables. Anything else I can try?
>>
>> Thank you
>>
>> Kambiz
>>
>> mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type, n.reserver_name, n.network_id, i.id as instance_id, i.name, i.state, i.type from vm_instance i join nics n on n.instance_id = i.id where i.type = 'DomainRouter';
>> +----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+--------+-----------+--------------+
>> | id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name   | state     | type         |
>> +----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+--------+-----------+--------------+
>> |  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM | Expunging | DomainRouter |
>> | 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM | Expunging | DomainRouter |
>> | 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM | Expunging | DomainRouter |
>> | 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM | Expunging | DomainRouter |
>> | 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM | Expunging | DomainRouter |
>> | 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM | Expunging | DomainRouter |
>> | 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0
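
The consistency check Alena asks for in this message (static NAT addresses whose VM is expunged or whose NIC is already removed) can be tried out on a toy copy of the tables. The following is only a sketch under assumptions: it uses an in-memory SQLite database with heavily simplified versions of the `firewall_rules`, `user_ip_address`, `nics` and `vm_instance` tables rather than the real CloudStack schema, the row ids 1, 2, 100 and 101 are invented, and `stale_static_nat` is a made-up helper name. Only VMs 6 and 15 from the result above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the CloudStack tables; only the columns used here.
conn.executescript("""
CREATE TABLE firewall_rules (id INTEGER PRIMARY KEY, ip_address_id INTEGER, network_id INTEGER);
CREATE TABLE user_ip_address (id INTEGER PRIMARY KEY, vm_id INTEGER, network_id INTEGER,
                              public_ip_address TEXT);
CREATE TABLE nics (id INTEGER PRIMARY KEY, instance_id INTEGER, network_id INTEGER, removed TEXT);
CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, state TEXT, removed TEXT);

INSERT INTO vm_instance VALUES (6, 'Stopped', NULL), (15, 'Expunging', NULL);
INSERT INTO nics VALUES (1, 6, 205, NULL), (2, 15, 205, '2014-03-18 23:00:53');
INSERT INTO user_ip_address VALUES (100, 6, 205, '10.193.17.169'),
                                   (101, 15, 205, '10.193.17.174');
INSERT INTO firewall_rules VALUES (1, 100, 205), (2, 101, 205);
""")


def stale_static_nat(conn, network_id):
    """Return (vm_id, public_ip) pairs whose VM is expunging/removed
    or has no non-removed NIC left in the IP's network."""
    query = """
        SELECT uip.vm_id, uip.public_ip_address
        FROM user_ip_address uip
        JOIN vm_instance vm ON vm.id = uip.vm_id
        WHERE uip.id IN (SELECT ip_address_id FROM firewall_rules WHERE network_id = ?)
          AND (vm.state = 'Expunging'
               OR vm.removed IS NOT NULL
               OR NOT EXISTS (SELECT 1 FROM nics n
                              WHERE n.instance_id = uip.vm_id
                                AND n.network_id = uip.network_id
                                AND n.removed IS NULL))
        ORDER BY uip.vm_id
    """
    return conn.execute(query, (network_id,)).fetchall()


stale = stale_static_nat(conn, 205)
print(stale)
```

With this sample data the only row reported is the one for VM 15, which matches the suspicious row in the query result above.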
Re: Virtual Router doesn't start
Hi Alena,

thank you for your help. The query returns no rows, i.e. nics.removed was not null, but I removed the row anyway to see what happens: a new virtual router was created which also couldn't be started due to the same NPE. I reverted the change by restoring from the dump.

I have to mention that prior to the restart, r-7-VM was the router which was used by my instances. I deleted the router using the UI after the first occurrence of the NPE, because a post with a similar problem suggested that the deleted router would be recreated again (and this procedure solved the problem).

Below I have attached the state of the two tables. Anything else I can try?

Thank you

Kambiz

mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type, n.reserver_name, n.network_id, i.id as instance_id, i.name, i.state, i.type from vm_instance i join nics n on n.instance_id = i.id where i.type = 'DomainRouter';
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
| id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name    | state     | type         |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
|  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM  | Expunging | DomainRouter |
| 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM  | Expunging | DomainRouter |
| 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM  | Expunging | DomainRouter |
| 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM  | Expunging | DomainRouter |
| 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM  | Expunging | DomainRouter |
| 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM  | Expunging | DomainRouter |
| 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          18 | r-18-VM | Expunging | DomainRouter |
| 27 | 2014-03-18 08:11:16 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          18 | r-18-VM | Expunging | DomainRouter |
| 28 | 2014-03-18 08:11:16 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          18 | r-18-VM | Expunging | DomainRouter |
| 29 | NULL                | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          19 | r-19-VM | Stopped   | DomainRouter |
| 30 | NULL                | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          19 | r-19-VM | Stopped   | DomainRouter |
| 31 | NULL                | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          19 | r-19-VM | Stopped   | DomainRouter |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+

mysql> select * from router_network_ref;
+----+-----------+------------+------------+
| id | router_id | network_id | guest_type |
+----+-----------+------------+------------+
|  1 |         4 |        204 | Isolated   |
|  2 |         7 |        205 | Isolated   |
|  3 |        18 |        205 | Isolated   |
|  4 |        19 |        205 | Isolated   |
+----+-----------+------------+------------+

Alena Prokharchyk alena.prokharc...@citrix.com wrote:

> The error happens not because the IP is null, but because the nic in a certain network can't be found. It looks like there is a bug in the VPC nic plug/unplug process for Guest networks.
>
> Kambiz, please do the following to fix it:
>
> 1) Stop the MS
> 2) Take a DB dump of the cloud db in case you have to revert.
> 3) Run the query:
>
>    select * from router_network_ref where router_id=<id of your VR> and network_id not in (select network_id from nics where instance_id=<ID of your VR> and removed is null);
>
> It will give you the list of network refs that somehow weren't cleaned up during the nic detach. Remove the entries returned from the router_network_ref table.
>
> Let me know how it works.
>
> -Alena.
>
> On 3/21/14, 3:36 PM, Kambiz Darabi dar...@m-creations.com
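
Step 3 of Alena's instructions can be rehearsed against the two tables posted in this message before touching the real database. The following is only a sketch under assumptions: it loads a reduced copy of `nics` and `router_network_ref` (only the columns the query needs) into an in-memory SQLite database instead of the MySQL `cloud` db, and `stale_network_refs` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced copies of the two tables from this message.
conn.executescript("""
CREATE TABLE nics (id INTEGER PRIMARY KEY, removed TEXT, network_id INTEGER, instance_id INTEGER);
CREATE TABLE router_network_ref (id INTEGER PRIMARY KEY, router_id INTEGER,
                                 network_id INTEGER, guest_type TEXT);

INSERT INTO nics VALUES
  (26, '2014-03-18 08:11:16', 205, 18),
  (27, '2014-03-18 08:11:16', 202, 18),
  (28, '2014-03-18 08:11:16', 200, 18),
  (29, NULL, 205, 19),
  (30, NULL, 202, 19),
  (31, NULL, 200, 19);
INSERT INTO router_network_ref VALUES
  (1, 4, 204, 'Isolated'), (2, 7, 205, 'Isolated'),
  (3, 18, 205, 'Isolated'), (4, 19, 205, 'Isolated');
""")


def stale_network_refs(conn, router_id):
    """Return router_network_ref rows of a router that point at a network
    in which the router has no non-removed NIC."""
    query = """
        SELECT * FROM router_network_ref
        WHERE router_id = ?
          AND network_id NOT IN (SELECT network_id FROM nics
                                 WHERE instance_id = ? AND removed IS NULL)
    """
    return conn.execute(query, (router_id, router_id)).fetchall()


refs_19 = stale_network_refs(conn, 19)  # current router r-19-VM
refs_18 = stale_network_refs(conn, 18)  # expunging router r-18-VM
print(refs_19, refs_18)
```

For r-19-VM the sketch finds nothing, which agrees with "the query returns no rows" above; for the expunging r-18-VM it reports the ref to network 205, since all of that router's NICs are removed.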
Virtual Router doesn't start
Hello,

as this is my first post to the list, I would like to thank all contributors for Cloudstack, which I have been using since last fall without any problems.

I run 4.1.1 with KVM and advanced networking. After a restart of the management server (stopping and starting the java process), the virtual domain router doesn't start and management-server.log shows a NullPointerException in NetworkModelImpl.getIpInNetwork (cf. stack trace below).

By putting the server in debug mode and remote debugging, I found out that the reason is a row in the table nics which has NULL in ip (cf. row with id 30 in the result of the select statement below).

What can I do to quickly solve this problem? Any pointers or suggestions are appreciated as the system is currently unusable.

Thank you for your help

Kambiz

management-server.log:

2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with any VPC
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network net1
2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-1:job-176) Service SecurityGroup is not supported in the network id=205
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of network implement
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Network id=202 is already implemented
2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is released for network id 202 as a part of network implement
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated with any VPC
2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network null
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for VM[DomainRouter|r-19-VM]
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176) No need to recreate the volume: Vol[24|vm=19|ROOT], since it already has a pool assigned: 200, adding disk to VM
2014-03-18 10:03:27,224 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) Boot Args for VM[DomainRouter|r-19-VM]: template=domP name=r-19-VM eth2ip=10.193.17.190 eth2mask=255.255.255.0 gateway=10.193.17.1 eth0ip=10.124.99.1 eth0mask=255.255.255.0 domain=cs6cloud.internal dhcprange=10.124.99.1 eth0ip=169.254.3.99 eth0mask=255.255.0.0 type=router disable_rp_filter=true dns1=10.193.17.1
2014-03-18 10:03:27,343 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) Found 8 ip(s) to apply as a part of domR VM[DomainRouter|r-19-VM] start.
2014-03-18 10:03:27,415 DEBUG