Re: Virtual Router doesn't start
I executed

> update nics set device_id = 1 where id = 29;

After restarting the router, the interfaces file now looks like this:

root@r-19-VM:~# cat /etc/network/interfaces
auto lo eth0 eth1 eth2
iface lo inet loopback

iface eth0 inet static
  address 169.254.3.155
  netmask 255.255.0.0
iface eth1 inet static
  address 10.124.99.1
  netmask 255.255.255.0
iface eth2 inet static
  address 10.193.17.190
  netmask 255.255.255.0

ifconfig shows this:

root@r-19-VM:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 0e:00:a9:fe:03:9b
          inet addr:169.254.3.155  Bcast:169.254.255.255  Mask:255.255.0.0
          inet6 addr: fe80::c00:a9ff:fefe:39b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:61 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3930 (3.8 KiB)  TX bytes:730 (730.0 B)

eth1      Link encap:Ethernet  HWaddr 02:00:2a:43:00:0d
          inet addr:10.124.99.1  Bcast:10.124.99.255  Mask:255.255.255.0
          inet6 addr: fe80::2aff:fe43:d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:936 (936.0 B)  TX bytes:318 (318.0 B)

eth2      Link encap:Ethernet  HWaddr 06:7e:fe:00:00:bf
          inet addr:10.193.17.190  Bcast:10.193.17.255  Mask:255.255.255.0
          inet6 addr: fe80::47e:feff:fe00:bf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:846 (846.0 B)  TX bytes:696 (696.0 B)

From inside the VM, I can ping the gateways of the public, guest and control network:

root@r-19-VM:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.124.99.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
10.193.17.0     0.0.0.0         255.255.255.0   U     0      0        0 eth2
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         10.193.17.1     0.0.0.0         UG    0      0        0 eth2

root@r-19-VM:~# ping 10.193.17.1
PING 10.193.17.1 (10.193.17.1): 56 data bytes
64 bytes from 10.193.17.1: icmp_seq=0 ttl=64 time=1.194 ms
64 bytes from 10.193.17.1: icmp_seq=1 ttl=64 time=0.329 ms

root@r-19-VM:~# ping 10.124.99.1
PING 10.124.99.1 (10.124.99.1): 56 data bytes
64 bytes from 10.124.99.1: icmp_seq=0 ttl=64 time=0.128 ms

root@r-19-VM:/etc/init.d# ping 169.254.0.1
PING 169.254.0.1 (169.254.0.1): 56 data bytes
64 bytes from 169.254.0.1: icmp_seq=0 ttl=64 time=0.292 ms

And from outside, I can ping the different IPs of the router.

But what is strange, is that in agent.log, I still find

Ping command port, 169.254.3.155:3922
Trying to connect to 169.254.3.155
Could not connect to 169.254.3.155

And when I check on the router, the ssh daemon only listens on the guest network interface:

# netstat -na | grep 3922
tcp        0      0 10.124.99.1:3922        0.0.0.0:*               LISTEN

So, the connection attempt to 169.254.3.155:3922 fails:

telnet 169.254.3.155 3922
Trying 169.254.3.155...

Is that the normal situation?

> Also after which point you started experiencing all this problems? Did
> you upgrade to new CS version? Or does it fail for any specific network?

No, I didn't upgrade to a new CS, I just stopped and started the management-server.

Thanks

Kambiz

Alena Prokharchyk wrote:
>
> Kambiz, did you check the device id in nics table for the vm? If it has 2
> 0s, change one of them to the correct value and restart. If the testing
> completes fine, we have a proof that its related to the device id mix up.
> If not, there is gotta be something else, most likely misconfigured on KVM
> stuff.
>
> Also after which point you started experiencing all this problems? Did
> you upgrade to new CS version? Or does it fail for any specific network?
>
> -Alena.
>
> On 3/25/14, 2:11 PM, "Kambiz Darabi" wrote:
>
>> I updated nics.gateway for that network, but the VM still shows the same
>> behaviour.
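The failing step in that message can be reproduced outside the agent. Below is a minimal sketch, assuming only that the "Ping command port" check in agent.log amounts to a plain TCP connect to port 3922 (the log lines suggest this, but the agent's code is not shown in this thread). `can_connect` is a hypothetical helper name, not part of CloudStack.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Run on the KVM host, `can_connect("169.254.3.155", 3922)` would be expected to return False as long as sshd is bound only to 10.124.99.1, which matches the netstat output quoted in the message.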
Re: Virtual Router doesn't start
I updated nics.gateway for that network, but the VM still shows the same behaviour.

If one compares interfaces:

root@r-19-VM:~# cat /etc/network/interfaces
auto lo eth0 eth1 eth2
iface lo inet loopback

iface eth0 inet static
  address 169.254.1.242
  netmask 255.255.0.0
iface eth1 inet static
  address 10.193.17.1
  netmask
iface eth2 inet static
  address 10.193.17.190
  netmask 255.255.255.0

and the nics entry in the management-server.log (cf. below), one can see that eth0 is the second nic with deviceId 0 of type 'Control', and eth2 is correctly set up with IP 10.193.17.190.

The remaining nic is eth1, which corresponds to the first nic with deviceId 0 and according to the nics entry should have IP 10.124.99.1. There is no iface entry with that IP; instead, eth1 has the gateway address of the public nic as its IP address.

Could the problem have something to do with the duplicate deviceId 0?

Thanks

Kambiz

{ "nics": [
{"deviceId":2,
 "networkRateMbps":200,
 "defaultNic":true,
 "uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
 "ip":"10.193.17.190",
 "netmask":"255.255.255.0",
 "gateway":"10.193.17.1",
 "mac":"06:7e:fe:00:00:bf",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Public",
 "broadcastUri":"vlan://untagged",
 "isolationUri":"vlan://untagged",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr0"},
{"deviceId":0,
 "networkRateMbps":200,
 "defaultNic":false,
 "uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
 "ip":"10.124.99.1",
 "netmask":"255.255.255.0",
 "gateway":"10.124.99.1",
 "mac":"02:00:2a:43:00:0d",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Guest",
 "broadcastUri":"vlan://3925",
 "isolationUri":"vlan://3925",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr1"},
{"deviceId":0,
 "networkRateMbps":-1,
 "defaultNic":false,
 "uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
 "ip":"169.254.1.242",
 "netmask":"255.255.0.0",
 "gateway":"169.254.0.1",
 "mac":"0e:00:a9:fe:01:f2",
 "broadcastType":"LinkLocal",
 "type":"Control",
 "isSecurityGroupEnabled":false}
] }

Alena Prokharchyk wrote:
>
> So the gateway wasn’t set for the nic only.
>
> Kambiz, just to quickly test it, can you set missing gateway for the nic,
> and stop/start the VR? And see if the start is completed normally, just to
> find out if the missing gateway was the reason of the communication failure
>
> On 3/25/14, 1:31 PM, "Kambiz Darabi" wrote:
>
>> Hi,
>>
>> select id, name, traffic_type, broadcast_domain_type, cidr, gateway,
>> mode, state, removed from networks where id = 205;
>> +-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
>> | id  | name    | traffic_type | broadcast_domain_type | cidr           | gateway     | mode | state     | removed |
>> +-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
>> | 205 | default | Guest        | Vlan                  | 10.124.99.0/24 | 10.124.99.1 | Dhcp | Allocated | NULL    |
>> +-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
>>
>> Cheers
>>
>> Kambiz
>>
>> Alena Prokharchyk wrote:
>>>
>>> No, it doesn’t seem right to me having 2 nics with device id 0. But looks
>>> like they’ve got programmed to correct devices on the backend per your
>>> prev email?
>>>
>>> iface eth0 inet static
>>> address 169.254.1.59
>>> netmask 255.255.0.0
>>>
>>> iface eth1 inet static
>>> address 10.193.17.1
>>> Netmask
>>>
>>> iface eth2 inet static
>>> address 10.193.17.190
>>> netmask 255.255.255.0
>>>
>>> I can see that only one parameter is missing from the start command, the
>>> second nic (network id=205) doesn’t have the gateway.
>>> From the command/DB, I see that the gateway is missing in the nics table
>>> for the network 205?
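The duplicate deviceId question raised in that message can be checked mechanically. A minimal sketch follows, assuming the nics element has already been parsed into a list of dicts; `duplicate_device_ids` is a hypothetical helper, and the sample data is a trimmed copy of the JSON from the message, not the full StartCommand.

```python
from collections import defaultdict


def duplicate_device_ids(nics):
    """Map each deviceId used by more than one NIC to the types of those NICs."""
    by_id = defaultdict(list)
    for nic in nics:
        by_id[nic["deviceId"]].append(nic["type"])
    return {dev: types for dev, types in by_id.items() if len(types) > 1}


# Trimmed copy of the nics element quoted in the message above.
nics = [
    {"deviceId": 2, "type": "Public", "ip": "10.193.17.190"},
    {"deviceId": 0, "type": "Guest", "ip": "10.124.99.1"},
    {"deviceId": 0, "type": "Control", "ip": "169.254.1.242"},
]

print(duplicate_device_ids(nics))  # {0: ['Guest', 'Control']}
```

An empty result would mean every NIC has its own device id, which is what the thread treats as the expected state.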
Re: Virtual Router doesn't start
Hi,

select id, name, traffic_type, broadcast_domain_type, cidr, gateway, mode, state, removed from networks where id = 205;
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
| id  | name    | traffic_type | broadcast_domain_type | cidr           | gateway     | mode | state     | removed |
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
| 205 | default | Guest        | Vlan                  | 10.124.99.0/24 | 10.124.99.1 | Dhcp | Allocated | NULL    |
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+

Cheers

Kambiz

Alena Prokharchyk wrote:
>
> No, it doesn’t seem right to me having 2 nics with device id 0. But looks
> like they’ve got programmed to correct devices on the backend per your
> prev email?
>
> iface eth0 inet static
> address 169.254.1.59
> netmask 255.255.0.0
>
> iface eth1 inet static
> address 10.193.17.1
> Netmask
>
> iface eth2 inet static
> address 10.193.17.190
> netmask 255.255.255.0
>
> I can see that only one parameter is missing from the start command, the
> second nic (network id=205) doesn’t have the gateway.
> From the command/DB, I see that the gateway is missing in the nics table
> for the network 205? Can you check gateway information in the networks
> table for the id=205
>
> On 3/25/14, 1:01 PM, "Kambiz Darabi" wrote:
>
>> Hi,
>>
>> select id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name
>> from nics where instance_id=19;
>>
>> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>> | id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
>> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>> | 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
>> | 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
>> | 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
>> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>>
>> and this is the nics element from the StartCmd. Is it normal to have
>> two nics with deviceId 0?
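The two lookups in this exchange (the gateway in `nics` and the gateway in `networks`) can be combined into one query. The sketch below uses Python's built-in sqlite3 as a stand-in for the CloudStack MySQL database. The tables contain only the columns and rows quoted in this thread; the real schema has many more columns, and the rows for networks 200 and 202 are left out because the thread does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE networks (id INTEGER PRIMARY KEY, traffic_type TEXT,
                       cidr TEXT, gateway TEXT);
CREATE TABLE nics (id INTEGER PRIMARY KEY, instance_id INTEGER,
                   network_id INTEGER, ip4_address TEXT,
                   netmask TEXT, gateway TEXT);
-- Values copied from the query results quoted in the thread.
INSERT INTO networks VALUES (205, 'Guest', '10.124.99.0/24', '10.124.99.1');
INSERT INTO nics VALUES (29, 19, 205, '10.124.99.1', '255.255.255.0', NULL);
INSERT INTO nics VALUES (30, 19, 202, NULL, NULL, NULL);
INSERT INTO nics VALUES (31, 19, 200, '10.193.17.190', '255.255.255.0', '10.193.17.1');
""")

# NICs of one instance that have no gateway although their network defines one.
missing = conn.execute(
    """
    SELECT n.id, n.ip4_address, nw.gateway
    FROM nics n
    JOIN networks nw ON nw.id = n.network_id
    WHERE n.instance_id = ?
      AND n.gateway IS NULL
      AND nw.gateway IS NOT NULL
    """,
    (19,),
).fetchall()

print(missing)  # [(29, '10.124.99.1', '10.124.99.1')]
```

The same SELECT should run unchanged against MySQL, with `%s` in place of `?` if a MySQL driver is used, but that has not been tried against a real CloudStack database here.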
Re: Virtual Router doesn't start
No, it doesn’t seem right to me having 2 nics with device id 0. But looks like they’ve got programmed to correct devices on the backend per your prev email?

iface eth0 inet static
address 169.254.1.59
netmask 255.255.0.0

iface eth1 inet static
address 10.193.17.1
Netmask

iface eth2 inet static
address 10.193.17.190
netmask 255.255.255.0

I can see that only one parameter is missing from the start command, the second nic (network id=205) doesn’t have the gateway.
From the command/DB, I see that the gateway is missing in the nics table for the network 205? Can you check gateway information in the networks table for the id=205

On 3/25/14, 1:01 PM, "Kambiz Darabi" wrote:

> Hi,
>
> select id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name
> from nics where instance_id=19;
>
> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
> | id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
> | 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
> | 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
> | 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
> +----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>
> and this is the nics element from the StartCmd. Is it normal to have
> two nics with deviceId 0?
Re: Virtual Router doesn't start
Hi,

select id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name from nics where instance_id=19;

+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
| id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
| 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
| 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
| 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+

and this is the nics element from the StartCmd. Is it normal to have two nics with deviceId 0?

"nics":[
{"deviceId":2,
 "networkRateMbps":200,
 "defaultNic":true,
 "uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
 "ip":"10.193.17.190",
 "netmask":"255.255.255.0",
 "gateway":"10.193.17.1",
 "mac":"06:7e:fe:00:00:bf",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Public",
 "broadcastUri":"vlan://untagged",
 "isolationUri":"vlan://untagged",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr0"},
{"deviceId":0,
 "networkRateMbps":200,
 "defaultNic":false,
 "uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
 "ip":"10.124.99.1",
 "netmask":"255.255.255.0",
 "mac":"02:00:2a:43:00:0d",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Guest",
 "broadcastUri":"vlan://3949",
 "isolationUri":"vlan://3949",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr1"},
{"deviceId":0,
 "networkRateMbps":-1,
 "defaultNic":false,
 "uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
 "ip":"169.254.1.59",
 "netmask":"255.255.0.0",
 "gateway":"169.254.0.1",
 "mac":"0e:00:a9:fe:01:3b",
 "broadcastType":"LinkLocal",
 "type":"Control",
 "isSecurityGroupEnabled":false}
]

Thanks

Kambiz

Alena Prokharchyk wrote:
>
> Kambiz, the debug statements below are for the case when eth1 is a control
> interface as it was in your old command.
Re: Virtual Router doesn't start
Kambiz, the debug statements below are for the case when eth1 is a control interface as it was in your old command. I’ve looked at the new command, eth1 is not control, it's either public or guest

eth0: - control

iface eth0 inet static
address 169.254.1.59
netmask 255.255.0.0

eth1:

iface eth1 inet static
address 10.193.17.1
Netmask

So you need to execute the mysql statements for the traffic type of VR nic eth1

-Alena.

On 3/25/14, 9:57 AM, "Alena Prokharchyk" wrote:

> Kambiz, can you please check the following:
>
> 1) Check if the gateway is set on control network:
>
> mysql> select gateway, cidr from networks where traffic_type='Control';
>
> 2) For router control nic, check if network/gateway are set.
>
> Select gateway,netmask from nics where instance_id= and
> network_id=
>
> -Alena.
>
> On 3/25/14, 5:47 AM, "Kambiz Darabi" wrote:
>
>> Hi,
>>
>> I looked up the startup command of the old router instance which worked
>> correctly:
>>
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>> r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p
>> %template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>>
>> The new command (cf. below) doesn't have the parameters eth1ip and
>> eth1mask.
>>
>> Thanks
>>
>> Kambiz
>>
>> Alena Prokharchyk wrote:
>>>
>>> I don’t think its relevant as the piece we’ve fixed, just eliminated
>>> static nat rule programming for non-existing vm. Missing netmask on eth1
>>> doesn’t seem related to the problem (although we have to figure out why
>>> its missing), as the connection that fails, happening to link local 169.x
>>> eth0 interface.
>>>
>>> Edison, can you please tell us how to debug link local connection failure,
>>> on KVM agent?
>>>
>>> Thank you,
>>> Alena.
Re: Virtual Router doesn't start
Kambiz, can you please check the following:

1) Check if the gateway is set on control network:

mysql> select gateway, cidr from networks where traffic_type='Control';

2) For router control nic, check if network/gateway are set.

Select gateway,netmask from nics where instance_id= and network_id=

-Alena.

On 3/25/14, 5:47 AM, "Kambiz Darabi" wrote:

> Hi,
>
> I looked up the startup command of the old router instance which worked
> correctly:
>
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
> r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p
> %template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>
> The new command (cf. below) doesn't have the parameters eth1ip and
> eth1mask.
>
> Thanks
>
> Kambiz
>
> Alena Prokharchyk wrote:
>>
>> I don’t think its relevant as the piece we’ve fixed, just eliminated
>> static nat rule programming for non-existing vm. Missing netmask on eth1
>> doesn’t seem related to the problem (although we have to figure out why
>> its missing), as the connection that fails, happening to link local 169.x
>> eth0 interface.
>>
>> Edison, can you please tell us how to debug link local connection failure,
>> on KVM agent?
>>
>> Thank you,
>> Alena.
>>
>> On 3/24/14, 1:47 PM, "Kambiz Darabi" wrote:
>>
>>> Hi,
>>>
>>> thank you, the NullPointerException doesn't occur any more, but there
>>> still seems to be a problem during startup of the router.
>>>
>>> When I start the virtual router, it comes up, but in agent.log, there
>>> are lots of 'Could not connect to 169.254.1.x' messages.
Re: Virtual Router doesn't start
Hi,

I looked up the startup command of the old router instance, which worked correctly:

/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p %template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1

The new command (cf. below) doesn't have the parameters eth1ip and eth1mask.

Thanks

Kambiz

Alena Prokharchyk wrote:
>
> I don’t think it’s relevant, as the piece we’ve fixed just eliminated
> static NAT rule programming for a non-existing VM. The missing netmask on
> eth1 doesn’t seem related to the problem (although we have to figure out
> why it’s missing), as the connection that fails is the one to the
> link-local 169.x eth0 interface.
>
> Edison, can you please tell us how to debug a link-local connection
> failure on the KVM agent?
>
> Thank you,
> Alena.
>
> On 3/24/14, 1:47 PM, "Kambiz Darabi" wrote:
>
>>Hi,
>>
>>thank you, the NullPointerException doesn't occur any more, but there
>>still seems to be a problem during startup of the router.
>>
>>When I start the virtual router, it comes up, but in agent.log, there
>>are lots of 'Could not connect to 169.254.1.x' messages.
>>
>>Then I logged into the virtual router to find out that the netmask of
>>eth1 is missing in the interfaces file:
>>
>>root@host:~# virsh console r-19-VM
>>Connected to domain r-19-VM
>>Escape character is ^]
>>
>>Debian GNU/Linux 6.0 r-19-VM ttyS0
>>
>>r-19-VM login: root
>>...
>>root@r-19-VM:~# cat /etc/network/interfaces
>>auto lo eth0 eth1 eth2
>>iface lo inet loopback
>>
>>iface eth0 inet static
>>    address 169.254.1.59
>>    netmask 255.255.0.0
>>iface eth1 inet static
>>    address 10.193.17.1
>>    netmask
>>iface eth2 inet static
>>    address 10.193.17.190
>>    netmask 255.255.255.0
>>
>>I don't know if it is relevant, but this is the line from agent.log
>>where the parameters are visible:
>>
>>2014-03-24 21:36:17,681 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l r-19-VM -t all -d /var/lib/libvirt/images/r-19-VM-patchdisk -p %template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>>
>>Any hint is appreciated.
>>
>>Thanks
>>
>>Kambiz
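The two -p strings in this message are hard to compare by eye. The following is only a sketch of how one might diff them; the function names are made up for illustration, and the only assumption about the format is what is visible in the log lines, namely '%'-separated key=value pairs.

```python
from collections import Counter

def parse_boot_args(arg_string):
    """Split a '%key=value%key=value' string into an ordered list of (key, value) pairs."""
    pairs = []
    for token in arg_string.split("%"):
        if not token:
            continue  # skip the empty piece in front of the leading '%'
        key, _, value = token.partition("=")
        pairs.append((key, value))
    return pairs

def duplicate_keys(pairs):
    """Keys that occur more than once in one parameter string."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)

def missing_keys(old_pairs, new_pairs):
    """Keys present in the old string but absent from the new one."""
    return sorted({k for k, _ in old_pairs} - {k for k, _ in new_pairs})

# Parameter strings copied from the two rundomrpre.sh invocations in this thread.
OLD_ARGS = ("%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0"
            "%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0"
            "%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46"
            "%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1")
NEW_ARGS = ("%template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0"
            "%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0"
            "%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60"
            "%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1")

old_pairs = parse_boot_args(OLD_ARGS)
new_pairs = parse_boot_args(NEW_ARGS)
print("missing in new command:   ", missing_keys(old_pairs, new_pairs))
print("duplicated in new command:", duplicate_keys(new_pairs))
```

On these two strings the script reports eth1ip and eth1mask as missing and eth0ip and eth0mask as duplicated, i.e. the new command carries the eth0 keys twice, once with the guest address and once with the link-local one.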
Re: Virtual Router doesn't start
I don’t think it’s relevant, as the piece we’ve fixed just eliminated static NAT rule programming for a non-existing VM. The missing netmask on eth1 doesn’t seem related to the problem (although we have to figure out why it’s missing), as the connection that fails is the one to the link-local 169.x eth0 interface.

Edison, can you please tell us how to debug a link-local connection failure on the KVM agent?

Thank you,
Alena.
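While waiting for a proper debugging procedure, a plain TCP probe run on the KVM host can at least show whether the router's link-local address answers at all. This is a generic sketch, not a CloudStack tool: the function name is made up, and the port to probe is an assumption (CloudStack system VMs commonly accept SSH on 3922, but please verify that for your setup).

```python
import socket
import sys

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, unreachable networks and timeouts
        return False

if __name__ == "__main__":
    # usage (hypothetical file name): python probe.py 169.254.1.59 3922
    if len(sys.argv) == 3:
        host, port = sys.argv[1], int(sys.argv[2])
        state = "reachable" if can_connect(host, port) else "unreachable"
        print(host, port, state)
```

A failing probe only confirms the symptom seen in agent.log; it does not tell whether the cause is the bridge on the host or the interface configuration inside the router.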
Re: Virtual Router doesn't start
Hi,

thank you, the NullPointerException doesn't occur any more, but there still seems to be a problem during startup of the router.

When I start the virtual router, it comes up, but in agent.log, there are lots of 'Could not connect to 169.254.1.x' messages.

Then I logged into the virtual router to find out that the netmask of eth1 is missing in the interfaces file:

root@host:~# virsh console r-19-VM
Connected to domain r-19-VM
Escape character is ^]

Debian GNU/Linux 6.0 r-19-VM ttyS0

r-19-VM login: root
...
root@r-19-VM:~# cat /etc/network/interfaces
auto lo eth0 eth1 eth2
iface lo inet loopback

iface eth0 inet static
    address 169.254.1.59
    netmask 255.255.0.0
iface eth1 inet static
    address 10.193.17.1
    netmask
iface eth2 inet static
    address 10.193.17.190
    netmask 255.255.255.0

I don't know if it is relevant, but this is the line from agent.log where the parameters are visible:

2014-03-24 21:36:17,681 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l r-19-VM -t all -d /var/lib/libvirt/images/r-19-VM-patchdisk -p %template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1

Any hint is appreciated.

Thanks

Kambiz
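A check for empty netmask lines like the one on eth1 can be scripted. The sketch below is a deliberately simplified reader for the interfaces file, not a full parser of the Debian format; the function name is made up, and it only looks at 'iface ... inet static' stanzas and their netmask option.

```python
def stanzas_missing_netmask(text):
    """Return the names of static stanzas whose netmask option is absent or empty."""
    missing = []
    current = None    # name of the static stanza being read, if any
    has_mask = False
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "iface":
            # a new stanza starts: close the previous one first
            if current is not None and not has_mask:
                missing.append(current)
            is_static = len(words) >= 4 and words[3] == "static"
            current = words[1] if is_static else None
            has_mask = False
        elif words[0] == "netmask" and current is not None:
            has_mask = len(words) > 1   # 'netmask' alone means no value
    if current is not None and not has_mask:
        missing.append(current)
    return missing

# The file content as shown in the console session of r-19-VM.
SAMPLE = """\
auto lo eth0 eth1 eth2
iface lo inet loopback

iface eth0 inet static
    address 169.254.1.59
    netmask 255.255.0.0
iface eth1 inet static
    address 10.193.17.1
    netmask
iface eth2 inet static
    address 10.193.17.190
    netmask 255.255.255.0
"""

print(stanzas_missing_netmask(SAMPLE))
```

For the file shown in this message the function returns only eth1.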
Re: Virtual Router doesn't start
Yes, Kambiz, you followed up right, and vm id=15 is the culprit. If vm id=15 is expunged, we have to clear out the reference to it from the user_ip_address table. Here is the flow:

1) Save the db dump.
2) Run the query to clean up the reference:

Update user_ip_address set one_to_one_nat=0, instance_id=null where id=

Let me know how it works.

-Alena.
Re: Virtual Router doesn't start
Hi,

I hope I have understood what you wrote and created the following query correctly:

select uip.vm_id, uip.network_id, uip.public_ip_address,
       n.state as nic_state, n.removed as nic_removed,
       vm.state as vm_state, vm.removed as vm_removed
from user_ip_address uip
     join nics n on uip.vm_id = n.instance_id
     join vm_instance vm on uip.vm_id = vm.id
where uip.id in (Select ip_address_id from firewall_rules fr where fr.network_id=205);

+-------+------------+-------------------+--------------+---------------------+-----------+------------+
| vm_id | network_id | public_ip_address | nic_state    | nic_removed         | vm_state  | vm_removed |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+
|     6 |        205 | 10.193.17.169     | Allocated    | NULL                | Stopped   | NULL       |
|    10 |        205 | 10.193.17.136     | Allocated    | NULL                | Stopped   | NULL       |
|    12 |        205 | 10.193.17.140     | Allocated    | NULL                | Stopped   | NULL       |
|    13 |        205 | 10.193.17.141     | Allocated    | NULL                | Stopped   | NULL       |
|    14 |        205 | 10.193.17.142     | Allocated    | NULL                | Stopped   | NULL       |
|    15 |        205 | 10.193.17.174     | Deallocating | 2014-03-18 23:00:53 | Expunging | NULL       |
|    16 |        205 | 10.193.17.103     | Allocated    | NULL                | Stopped   | NULL       |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+

Is VM id 15 what you are looking for?

Thank you

Kambiz
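The manual inspection of that result set can also be expressed as a filter. The sketch below rebuilds the seven result rows in an in-memory SQLite database and selects the IP rows whose VM is being expunged or whose nic is marked removed. It is a toy: the three tables are cut down to the columns used here, the user_ip_address ids are invented, and the firewall_rules subquery is replaced by a plain network_id filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_ip_address (id INTEGER PRIMARY KEY, vm_id INTEGER,
                              network_id INTEGER, public_ip_address TEXT);
CREATE TABLE nics (id INTEGER PRIMARY KEY, instance_id INTEGER, state TEXT, removed TEXT);
CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, state TEXT, removed TEXT);
""")

# (vm_id, public_ip_address, nic_state, nic_removed, vm_state) from the result above
rows = [
    (6,  "10.193.17.169", "Allocated",    None,                  "Stopped"),
    (10, "10.193.17.136", "Allocated",    None,                  "Stopped"),
    (12, "10.193.17.140", "Allocated",    None,                  "Stopped"),
    (13, "10.193.17.141", "Allocated",    None,                  "Stopped"),
    (14, "10.193.17.142", "Allocated",    None,                  "Stopped"),
    (15, "10.193.17.174", "Deallocating", "2014-03-18 23:00:53", "Expunging"),
    (16, "10.193.17.103", "Allocated",    None,                  "Stopped"),
]
for ip_id, (vm_id, ip, nic_state, nic_removed, vm_state) in enumerate(rows, start=1):
    # ip_id is a made-up primary key; the real ids are not shown in the thread
    conn.execute("INSERT INTO user_ip_address VALUES (?, ?, 205, ?)", (ip_id, vm_id, ip))
    conn.execute("INSERT INTO nics (instance_id, state, removed) VALUES (?, ?, ?)",
                 (vm_id, nic_state, nic_removed))
    conn.execute("INSERT INTO vm_instance VALUES (?, ?, NULL)", (vm_id, vm_state))

STALE_SQL = """
SELECT uip.id, uip.vm_id, uip.public_ip_address
FROM user_ip_address uip
JOIN nics n ON n.instance_id = uip.vm_id
JOIN vm_instance vm ON vm.id = uip.vm_id
WHERE uip.network_id = ?
  AND (n.removed IS NOT NULL OR vm.removed IS NOT NULL OR vm.state = 'Expunging')
ORDER BY uip.id
"""
stale = conn.execute(STALE_SQL, (205,)).fetchall()
print(stale)
```

With the rows above, only the entry for vm_id 15 (10.193.17.174) is selected.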
Re: Virtual Router doesn't start
Kambiz, can you please try one more thing.

1) Locate all the firewall rules for your guest network (205, right?)

Select id, ip_address_id from firewall_rules where network_id=205;

2) Now get all static NAT enabled IP addresses for those rules:

Select vm_id, network_id from user_ip_address where id in (Select ip_address_id from firewall_rules where network_id=205);

For each vmId/networkId combo, check if there is a non-removed nic and a non-expunged vm. There might be some incorrect static NAT ip/vm reference referring to a vm that is removed already. If you find any, let me know and I will tell you how to clean it up.

-Alena.
Re: Virtual Router doesn't start
Hi Alena,

thank you for your help.

The query returns no rows, i.e. nics.removed was not null, but I removed the row anyway to see what happens: a new virtual router was created which also couldn't be started due to the same NPE. I reverted the change by restoring from the dump.

I have to mention that prior to the restart, r-7-VM was the router which was used by my instances. I deleted the router using the UI after the first occurrence of the NPE, because a post with a similar problem suggested that the deleted router would be recreated again (and this procedure solved the problem).

Below I have attached the state of the two tables.

Anything else I can try?

Thank you

Kambiz

mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type, n.reserver_name, n.network_id, i.id as instance_id, i.name, i.state, i.type from vm_instance i join nics n on n.instance_id = i.id where i.type = 'DomainRouter';
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
| id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name    | state     | type         |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
|  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM  | Expunging | DomainRouter |
| 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM  | Expunging | DomainRouter |
| 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM  | Expunging | DomainRouter |
| 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM  | Expunging | DomainRouter |
| 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM  | Expunging | DomainRouter |
| 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM  | Expunging | DomainRouter |
| 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          18 | r-18-VM | Expunging | DomainRouter |
| 27 | 2014-03-18 08:11:16 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          18 | r-18-VM | Expunging | DomainRouter |
| 28 | 2014-03-18 08:11:16 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          18 | r-18-VM | Expunging | DomainRouter |
| 29 | NULL                | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          19 | r-19-VM | Stopped   | DomainRouter |
| 30 | NULL                | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          19 | r-19-VM | Stopped   | DomainRouter |
| 31 | NULL                | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          19 | r-19-VM | Stopped   | DomainRouter |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+

mysql> select * from router_network_ref;
+----+-----------+------------+------------+
| id | router_id | network_id | guest_type |
+----+-----------+------------+------------+
|  1 |         4 |        204 | Isolated   |
|  2 |         7 |        205 | Isolated   |
|  3 |        18 |        205 | Isolated   |
|  4 |        19 |        205 | Isolated   |
+----+-----------+------------+------------+

Alena Prokharchyk wrote:
>
> The error happens not because the IP is null, but because the nic in a
> certain network can't be found. Looks like there is some bug in the VPC
> nic plug/unplug process for Guest networks.
>
> Kambiz, please do the following to fix it:
>
> 1) Stop the MS
> 2) Take the DB dump of cloud db in case you have to revert back.
> 3) Run the query:
>
> select * from router_network_ref where router_id= network_id not in (select network_id from nics where instance_id= your VR> and removed is null);
>
> It will give you the list of network refs that somehow weren't cleaned
> during the nic detach. Remove the entry returned from the
> router_network_ref table.
>
> Let me know how it works.
>
> -Alena.
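The check against router_network_ref can be replayed on the two table dumps in this message. The sketch below loads them into an in-memory SQLite database, reduced to the columns involved. Note the assumption: the quoted query lost its placeholders in transit, so the SQL here is only one reading of it (router id on both sides, joined by "and"), and orphaned_refs is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nics (id INTEGER PRIMARY KEY, removed TEXT,
                   network_id INTEGER, instance_id INTEGER);
CREATE TABLE router_network_ref (id INTEGER PRIMARY KEY, router_id INTEGER,
                                 network_id INTEGER, guest_type TEXT);
""")
# (id, removed, network_id, instance_id) taken from the nics listing
conn.executemany("INSERT INTO nics VALUES (?, ?, ?, ?)", [
    (9,  "2014-03-17 11:27:58", 204, 4),
    (10, "2014-03-17 11:27:58", 202, 4),
    (11, "2014-03-17 11:27:58", 200, 4),
    (14, "2014-03-17 11:27:52", 205, 7),
    (15, "2014-03-17 11:27:52", 202, 7),
    (16, "2014-03-17 11:27:52", 200, 7),
    (26, "2014-03-18 08:11:16", 205, 18),
    (27, "2014-03-18 08:11:16", 202, 18),
    (28, "2014-03-18 08:11:16", 200, 18),
    (29, None, 205, 19),
    (30, None, 202, 19),
    (31, None, 200, 19),
])
conn.executemany("INSERT INTO router_network_ref VALUES (?, ?, ?, ?)", [
    (1, 4, 204, "Isolated"),
    (2, 7, 205, "Isolated"),
    (3, 18, 205, "Isolated"),
    (4, 19, 205, "Isolated"),
])

def orphaned_refs(conn, router_id):
    """router_network_ref rows of a router that have no non-removed nic in that network."""
    sql = """
        SELECT id, router_id, network_id, guest_type
        FROM router_network_ref
        WHERE router_id = ?
          AND network_id NOT IN (SELECT network_id FROM nics
                                 WHERE instance_id = ? AND removed IS NULL)
        ORDER BY id
    """
    return conn.execute(sql, (router_id, router_id)).fetchall()

for rid in (4, 7, 18, 19):
    print(rid, orphaned_refs(conn, rid))
```

Under this reading, router 19 yields no rows because its nic in network 205 is not removed, while the already expunging routers 4, 7 and 18 each still have one reference left.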
Re: Virtual Router doesn't start
The error happens not because the IP is null, but because the nic in a certain network can't be found. Looks like there is some bug in the VPC nic plug/unplug process for Guest networks.

Kambiz, please do the following to fix it:

1) Stop the MS
2) Take the DB dump of cloud db in case you have to revert back.
3) Run the query:

select * from router_network_ref where router_id= network_id not in (select network_id from nics where instance_id= your VR> and removed is null);

It will give you the list of network refs that somehow weren't cleaned during the nic detach. Remove the entry returned from the router_network_ref table.

Let me know how it works.

-Alena.

On 3/21/14, 3:36 PM, "Kambiz Darabi" wrote:

>Hello,
>
>as this is my first post to the list, I would like to thank all
>contributors for CloudStack, which I have been using since last fall
>without any problems. I run 4.1.1 with KVM and advanced networking.
>
>After a restart of the management server (stopping and starting the java
>process), the virtual domain router doesn't start and
>management-server.log shows a NullPointerException in
>NetworkModelImpl.getIpInNetwork (cf. stack trace below).
>
>By putting the server in debug mode and remote debugging, I found out
>that the reason is a row in the table nics which has NULL in ip (cf. row
>with id 30 in the result of the select statement below).
>
>What can I do to quickly solve this problem? Any pointers or suggestions
>are appreciated as the system is currently unusable.
>
>Thank you for your help
>
>Kambiz
>
>management-server.log:
>
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with any VPC
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network net1
>2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-1:job-176) Service SecurityGroup is not supported in the network id=205
>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of network implement
>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Network id=202 is already implemented
>2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is released for network id 202 as a part of network implement
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated with any VPC
>2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network null
>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for VM[DomainRouter|r-19-VM]
>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176) No need to recreate the volume: Vol[24|vm=19|ROOT], since it already has a pool assigned: 200, adding disk to VM
Virtual Router doesn't start
Hello,

as this is my first post to the list, I would like to thank all contributors for CloudStack, which I have been using since last fall without any problems. I run 4.1.1 with KVM and advanced networking.

After a restart of the management server (stopping and starting the java process), the virtual domain router doesn't start and management-server.log shows a NullPointerException in NetworkModelImpl.getIpInNetwork (cf. stack trace below).

By putting the server in debug mode and remote debugging, I found out that the reason is a row in the table nics which has NULL in ip (cf. row with id 30 in the result of the select statement below).

What can I do to quickly solve this problem? Any pointers or suggestions are appreciated as the system is currently unusable.

Thank you for your help

Kambiz

management-server.log:

2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with any VPC
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network net1
2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-1:job-176) Service SecurityGroup is not supported in the network id=205
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of network implement
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Network id=202 is already implemented
2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Lock is released for network id 202 as a part of network implement
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking Ovs to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 WARN [network.element.VpcVirtualRouterElement] (Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated with any VPC
2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-1:job-176) Asking NiciraNvp to prepare for Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement] (Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service Connectivity on network null
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176)
Checking if we need to prepare 1 volumes for VM[DomainRouter|r-19-VM] 2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-1:job-176) No need to recreate the volume: Vol[24|vm=19|ROOT], since it already has a pool assigned: 200, adding disk to VM 2014-03-18 10:03:27,224 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) Boot Args for VM[DomainRouter|r-19-VM]: template=domP name=r-19-VM eth2ip=10.193.17.190 eth2mask=255.255.255.0 gateway=10.193.17.1 eth0ip=10.124.99.1 eth0mask=255.255.255.0 domain=cs6cloud.internal dhcprange=10.124.99.1 eth0ip=169.254.3.99 eth0mask=255.255.0.0 type=router disable_rp_filter=true dns1=10.193.17.1 2014-03-18 10:03:27,343 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) Found 8 ip(s) to apply as a part of domR VM[DomainRouter|r-19-VM] start. 2014-03-18 10:03:27,415 DEBUG [network.router.VirtualNetworkApplianceM