Re: VM orchestration, updating Best practices

2014-04-04 Thread Kambiz Darabi
Hi Lisa,

Erik Weber terbol...@gmail.com wrote:
 
 One way is to let puppet or whatever decide based on hostname, and pass the
 role that way. Or you could look at userdata, but that is hard to change
 later.

 Erik
 On 26 March 2014 18:47, X. S. nordlicht1...@hotmail.de wrote:

 Hey!

 I have several choices to make regarding orchestration of VMs:

 - when and where should I assign a role to a template/VM?

 - Should I have a Database template, a Webserver template, etc.? Or should I
 just have one basic Ubuntu template with Chef/Puppet installed and pass the
 role to the VM some other way (how?), so that the rest of the
 installation is taken care of by those tools?

We use the following combination of tools/strategies:

- Match host names against regular expressions in Puppet

With this, every host whose name starts with www and ends in .example.com
gets the web-server role:

node /^www.*\.example\.com$/ inherits 'web-server-node' {
...
}

You can also use 'if' or 'case' statements inside definitions/classes.

- Pin a specific version of a package in Puppet

package { 'tomcat7':
  ensure => '7.0.26-1ubuntu1.2',
}

- A caching proxy repository for OS packages

A caching proxy for the OS packages is a good way to control which
packages are available for installation in your VMs: even if the
upstream repositories remove certain packages, your cache still keeps
them. We use apt-cacher on Ubuntu 12.04.
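The host-name matching from the first point can be illustrated outside Puppet with a small Python sketch. Only the `www` pattern comes from the node definition above; the role table, the `db` pattern and the `base` fallback are hypothetical. The sketch returns the first matching role; in Puppet itself it is safest to keep node regexes from overlapping.

```python
import re

# Hypothetical role table, checked in order. Only the www pattern is taken
# from the node definition above; the others are examples.
ROLE_PATTERNS = [
    (re.compile(r"^www.*\.example\.com$"), "web-server"),
    (re.compile(r"^db.*\.example\.com$"), "database"),
]


def role_for(hostname, default="base"):
    """Return the role of the first pattern that matches the host name."""
    for pattern, role in ROLE_PATTERNS:
        if pattern.match(hostname):
            return role
    return default


for host in ("www1.example.com", "db2.example.com", "mail.example.com"):
    print(host, "->", role_for(host))
```

The appeal of this scheme is that a new VM only needs the right host name; no per-role template is required.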


 - Should I turn on automatic updates in Ubuntu and how often should I
 create a new, up to date template?

 - is puppet/chef really worth having to change the recipes on every minor
 new version and coming up with a recipe every time I want to install
 something new? Is there a way of installing security patches etc.
 automatically but handle new versions manually via chef or puppet?

It depends on what you want to achieve. From your questions above, I
have the impression that your goal is strict control of package
versions. With Puppet, you can be strict for certain packages and
lenient for others, because you can also specify that a package should
simply be present, without giving a specific version number:

package { 'tomcat7':
  ensure => 'present',
}

or tell Puppet to always upgrade to the latest version with 'ensure =>
latest'.

cf. 
http://docs.puppetlabs.com/references/latest/type.html#package-attribute-ensure
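As a rough mental model of how the three styles of 'ensure' differ, here is a simplified Python sketch. It is not Puppet's implementation: the function name is hypothetical, versions are only compared for equality, and the newer tomcat7 version in the usage lines is made up for illustration.

```python
def plan_package(ensure, installed, candidate):
    """Roughly what a package resource with this 'ensure' value would do.

    ensure    -- 'present', 'latest', 'absent' or an exact version string
    installed -- currently installed version, or None if not installed
    candidate -- newest version offered by the repositories
    Returns (action, version); versions are only compared for equality.
    """
    if ensure == "absent":
        return ("remove", None) if installed else ("none", None)
    if ensure in ("present", "installed"):
        # Any installed version satisfies 'present'; nothing is upgraded.
        return ("none", installed) if installed else ("install", candidate)
    if ensure == "latest":
        # Install or upgrade whenever the repository offers another version.
        if installed == candidate:
            return ("none", installed)
        return ("install", candidate)
    # Anything else is treated as a pinned version.
    return ("none", installed) if installed == ensure else ("install", ensure)


current, newer = "7.0.26-1ubuntu1.2", "7.0.26-1ubuntu1.3"  # 'newer' is made up
for ensure in ("present", "latest", current):
    print(ensure, plan_package(ensure, current, newer))
```

With a pinned version, a newer package in the repository is ignored until the manifest itself is changed.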

 - I guess the best way for updates would be to start a new VM with the new
 software and one by one move the workload to the updated VMs. On the other
 hand this seems not very feasible for the daily updates on the OS
 level!?

The way we do it is to create a template from a running VM, start a VM
from that template, change the versions of the relevant packages in the
Puppet configuration to 'latest', and test the functionality.

If everything is OK, the versions that have been tested are written
into the Puppet configuration and 'frozen' from that moment on until the
next round of updates.
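The 'freeze' step can be partly automated. A minimal sketch, assuming the tested versions are collected as "package version" lines (for example from `dpkg-query -W -f='${Package} ${Version}\n'` on the test VM) and turned into pinned Puppet resources; the function names are hypothetical.

```python
def parse_installed(text):
    """Parse 'package version' lines into a {package: version} dict."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            name, version = fields
            versions[name] = version
    return versions


def freeze_manifest(versions):
    """Render one pinned Puppet package resource per tested package."""
    resources = []
    for name in sorted(versions):
        resources.append(
            "package { '%s':\n  ensure => '%s',\n}" % (name, versions[name])
        )
    return "\n\n".join(resources) + "\n"


tested = parse_installed("tomcat7 7.0.26-1ubuntu1.2\n")
print(freeze_manifest(tested))
```

Generating the pins from what was actually installed on the test VM avoids typos when copying version strings by hand.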

 I have been researching this for a few weeks. Maybe you can share a thing
 or two before my head explodes...

 Thank you!
 Lisa

HTH

Kambiz


Re: Virtual Router doesn't start

2014-03-24 Thread Kambiz Darabi
Hi,

I hope I have understood what you wrote and created the following query
correctly:

select uip.vm_id, uip.network_id, uip.public_ip_address,
   n.state as nic_state, n.removed as nic_removed,
   vm.state as vm_state, vm.removed as vm_removed
from user_ip_address uip
 join nics n on uip.vm_id = n.instance_id
 join vm_instance vm on uip.vm_id = vm.id
where uip.id in (Select ip_address_id from firewall_rules fr where 
fr.network_id=205);


+-------+------------+-------------------+--------------+---------------------+-----------+------------+
| vm_id | network_id | public_ip_address | nic_state    | nic_removed         | vm_state  | vm_removed |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+
|     6 |        205 | 10.193.17.169     | Allocated    | NULL                | Stopped   | NULL       |
|    10 |        205 | 10.193.17.136     | Allocated    | NULL                | Stopped   | NULL       |
|    12 |        205 | 10.193.17.140     | Allocated    | NULL                | Stopped   | NULL       |
|    13 |        205 | 10.193.17.141     | Allocated    | NULL                | Stopped   | NULL       |
|    14 |        205 | 10.193.17.142     | Allocated    | NULL                | Stopped   | NULL       |
|    15 |        205 | 10.193.17.174     | Deallocating | 2014-03-18 23:00:53 | Expunging | NULL       |
|    16 |        205 | 10.193.17.103     | Allocated    | NULL                | Stopped   | NULL       |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+

Is VM id 15 what you are looking for?

Thank you


Kambiz

Alena Prokharchyk alena.prokharc...@citrix.com wrote:
 
 Kambiz, can you please try one more thing.

 1) Locate all the firewall rules for your guest network (205, right?)

 Select id, ip_address_id from firewall_rules where network_id=205;

 2) Now get all static NAT-enabled IP addresses for those rules:

 Select vm_id, network_id from user_ip_address where id in (Select
 ip_address_id from firewall_rules where network_id=205);

 For each vmId/networkId combination, check whether there is a non-removed
 NIC and a non-expunged VM. There might be an incorrect static NAT IP/VM
 reference pointing to a VM that has already been removed. If you find
 any, let me know and I will tell you how to clean it up.

 -Alena.
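Alena's check can be expressed as a single query. The sketch below runs it against SQLite with a tiny, hypothetical subset of the cloud schema (only the columns used here, invented row ids) and two VMs modelled on Kambiz's result table. Treating the 'Expunging' state as stale is an assumption, and the query would need to be verified against the real MySQL database before relying on it.

```python
import sqlite3

# Hypothetical, heavily reduced subset of the CloudStack "cloud" schema:
# only the columns used below; row ids in the sample data are invented.
SCHEMA = """
CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, state TEXT, removed TEXT);
CREATE TABLE nics (id INTEGER PRIMARY KEY, instance_id INTEGER,
                   network_id INTEGER, removed TEXT);
CREATE TABLE user_ip_address (id INTEGER PRIMARY KEY, vm_id INTEGER,
                              network_id INTEGER, public_ip_address TEXT);
CREATE TABLE firewall_rules (id INTEGER PRIMARY KEY, ip_address_id INTEGER,
                             network_id INTEGER);
"""

# Static NAT IPs (referenced by firewall rules in the network) whose VM is
# removed/expunging or which have no live NIC in that network.
STALE_STATIC_NAT = """
SELECT uip.vm_id, uip.network_id, uip.public_ip_address
FROM user_ip_address uip
JOIN vm_instance vm ON vm.id = uip.vm_id
WHERE uip.id IN (SELECT ip_address_id FROM firewall_rules WHERE network_id = ?)
  AND (vm.removed IS NOT NULL
       OR vm.state = 'Expunging'
       OR NOT EXISTS (SELECT 1 FROM nics n
                      WHERE n.instance_id = uip.vm_id
                        AND n.network_id = uip.network_id
                        AND n.removed IS NULL))
ORDER BY uip.vm_id
"""


def stale_static_nat(conn, network_id):
    """Return (vm_id, network_id, public_ip) rows that look stale."""
    return conn.execute(STALE_STATIC_NAT, (network_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# VM 6 is healthy; VM 15 mirrors the Expunging/Deallocating row above.
conn.executemany("INSERT INTO vm_instance VALUES (?, ?, ?)",
                 [(6, "Stopped", None), (15, "Expunging", None)])
conn.executemany("INSERT INTO nics VALUES (?, ?, ?, ?)",
                 [(1, 6, 205, None), (2, 15, 205, "2014-03-18 23:00:53")])
conn.executemany("INSERT INTO user_ip_address VALUES (?, ?, ?, ?)",
                 [(100, 6, 205, "10.193.17.169"),
                  (101, 15, 205, "10.193.17.174")])
conn.executemany("INSERT INTO firewall_rules VALUES (?, ?, ?)",
                 [(1, 100, 205), (2, 101, 205)])
print(stale_static_nat(conn, 205))
```

On this sample only VM 15 is reported, which is the same candidate Kambiz singled out by hand.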


Re: Virtual Router doesn't start

2014-03-22 Thread Kambiz Darabi
Hi Alena,

thank you for your help.

The query returns no rows, i.e. nics.removed was not null. I removed
the row anyway to see what would happen: a new virtual router was
created, which also couldn't be started due to the same NPE. I reverted
the change by restoring from the dump.

I should mention that before the restart, r-7-VM was the router used by
my instances. I deleted that router via the UI after the first
occurrence of the NPE, because a post about a similar problem suggested
that the deleted router would be recreated (and that this procedure
solved the problem).

Below I have attached the state of the two tables.

Anything else I can try?

Thank you


Kambiz

mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type,
n.reserver_name, n.network_id, i.id as instance_id, i.name, i.state, i.type
from vm_instance i join nics n on n.instance_id = i.id
where i.type = 'DomainRouter';
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
| id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name    | state     | type         |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
|  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM  | Expunging | DomainRouter |
| 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM  | Expunging | DomainRouter |
| 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM  | Expunging | DomainRouter |
| 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM  | Expunging | DomainRouter |
| 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM  | Expunging | DomainRouter |
| 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM  | Expunging | DomainRouter |
| 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          18 | r-18-VM | Expunging | DomainRouter |
| 27 | 2014-03-18 08:11:16 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          18 | r-18-VM | Expunging | DomainRouter |
| 28 | 2014-03-18 08:11:16 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          18 | r-18-VM | Expunging | DomainRouter |
| 29 | NULL                | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          19 | r-19-VM | Stopped   | DomainRouter |
| 30 | NULL                | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          19 | r-19-VM | Stopped   | DomainRouter |
| 31 | NULL                | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          19 | r-19-VM | Stopped   | DomainRouter |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+

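mysql> select * from router_network_ref;
+----+-----------+------------+------------+
| id | router_id | network_id | guest_type |
+----+-----------+------------+------------+
|  1 |         4 |        204 | Isolated   |
|  2 |         7 |        205 | Isolated   |
|  3 |        18 |        205 | Isolated   |
|  4 |        19 |        205 | Isolated   |
+----+-----------+------------+------------+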



Alena Prokharchyk alena.prokharc...@citrix.com wrote:
 
 The error happens not because the IP is null, but because the NIC in a
 certain network can't be found. It looks like there is a bug in the VPC
 NIC plug/unplug process for guest networks.

 Kambiz, please do the following to fix it:

 1) Stop the MS
 2) Take a dump of the cloud DB in case you have to revert.
 3) Run the query:

 select * from router_network_ref where router_id=<id of your VR> and
 network_id not in (select network_id from nics where instance_id=<id of
 your VR> and removed is null);

 It will give you the list of network refs that somehow weren't cleaned
 up during the NIC detach. Remove the returned entries from the
 router_network_ref table.

 Let me know how it works.

 -Alena.
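Alena's step 3 query can be tried out offline first. This sketch loads the nics and router_network_ref rows posted in this thread into an in-memory SQLite database (only the columns the query needs) and runs the same query with placeholders in place of the router id; the helper name is hypothetical.

```python
import sqlite3

# Only the columns the query needs; the rows are copied from the nics and
# router_network_ref output posted in this thread.
SCHEMA = """
CREATE TABLE nics (id INTEGER PRIMARY KEY, instance_id INTEGER,
                   network_id INTEGER, removed TEXT);
CREATE TABLE router_network_ref (id INTEGER PRIMARY KEY, router_id INTEGER,
                                 network_id INTEGER, guest_type TEXT);
"""

# Alena's query with placeholders instead of "<id of your VR>".
# Note: NOT IN yields no rows if the subquery returns a NULL network_id.
STALE_REFS = """
SELECT * FROM router_network_ref
WHERE router_id = ?
  AND network_id NOT IN (SELECT network_id FROM nics
                         WHERE instance_id = ? AND removed IS NULL)
"""


def stale_refs(conn, router_id):
    """Return router_network_ref rows without a live NIC for that router."""
    return conn.execute(STALE_REFS, (router_id, router_id)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO nics VALUES (?, ?, ?, ?)", [
    (14, 7, 205, "2014-03-17 11:27:52"),
    (15, 7, 202, "2014-03-17 11:27:52"),
    (16, 7, 200, "2014-03-17 11:27:52"),
    (29, 19, 205, None),
    (30, 19, 202, None),
    (31, 19, 200, None),
])
conn.executemany("INSERT INTO router_network_ref VALUES (?, ?, ?, ?)", [
    (1, 4, 204, "Isolated"),
    (2, 7, 205, "Isolated"),
    (3, 18, 205, "Isolated"),
    (4, 19, 205, "Isolated"),
])
print(stale_refs(conn, 19))
print(stale_refs(conn, 7))
```

For router 19 the query returns nothing, consistent with Kambiz's report, because NIC 29 is still live in network 205; the already-removed router 7 would show up with its network 205 reference. On the real database, any deletion should only follow Alena's first two steps (stop the MS, take a dump).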


 On 3/21/14, 3:36 PM, Kambiz Darabi dar...@m-creations.com

Virtual Router doesn't start

2014-03-21 Thread Kambiz Darabi
Hello,

As this is my first post to the list, I would like to thank all
contributors to CloudStack, which I have been using since last fall
without any problems. I run 4.1.1 with KVM and advanced networking.

After a restart of the management server (stopping and starting the java
process), the virtual domain router doesn't start and
management-server.log shows a NullPointerException in
NetworkModelImpl.getIpInNetwork (cf. stack trace below).

By starting the server in debug mode and attaching a remote debugger, I
found that the cause is a row in the nics table which has NULL as its
IP address (cf. the row with id 30 in the result of the select
statement below).

What can I do to quickly solve this problem? Any pointers or suggestions
are appreciated as the system is currently unusable.

Thank you for your help


Kambiz


management-server.log:

2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VirtualRouter to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking Ovs to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 WARN  [network.element.VpcVirtualRouterElement] 
(Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with any 
VPC
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking NiciraNvp to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement] 
(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service 
Connectivity on network net1
2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl] 
(Job-Executor-1:job-176) Service SecurityGroup is not supported in the network 
id=205
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of 
network implement
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Network id=202 is already implemented
2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Lock is released for network id 202 as a part of 
network implement
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VirtualRouter to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking Ovs to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 WARN  [network.element.VpcVirtualRouterElement] 
(Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated with any 
VPC
2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking NiciraNvp to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement] 
(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service 
Connectivity on network null
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] 
(Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for 
VM[DomainRouter|r-19-VM]
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] 
(Job-Executor-1:job-176) No need to recreate the volume: Vol[24|vm=19|ROOT], 
since it already has a pool assigned: 200, adding disk to VM
2014-03-18 10:03:27,224 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) 
Boot Args for VM[DomainRouter|r-19-VM]:  template=domP name=r-19-VM 
eth2ip=10.193.17.190 eth2mask=255.255.255.0 gateway=10.193.17.1 
eth0ip=10.124.99.1 eth0mask=255.255.255.0 domain=cs6cloud.internal 
dhcprange=10.124.99.1 eth0ip=169.254.3.99 eth0mask=255.255.0.0 type=router 
disable_rp_filter=true dns1=10.193.17.1
2014-03-18 10:03:27,343 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) 
Found 8 ip(s) to apply as a part of domR VM[DomainRouter|r-19-VM] start.
2014-03-18 10:03:27,415 DEBUG