Re: Virtual Router doesn't start

2014-03-25 Thread Kambiz Darabi
I executed

> update nics set device_id = 1 where id = 29;

After restarting the router, the interfaces file now looks like this:

root@r-19-VM:~# cat /etc/network/interfaces 
auto lo eth0 eth1 eth2
iface lo inet loopback

iface  eth0 inet static
  address 169.254.3.155 
  netmask 255.255.0.0
iface  eth1 inet static
  address 10.124.99.1 
  netmask 255.255.255.0
iface  eth2 inet static
  address 10.193.17.190 
  netmask 255.255.255.0

ifconfig shows this:

root@r-19-VM:~# ifconfig
eth0  Link encap:Ethernet  HWaddr 0e:00:a9:fe:03:9b  
  inet addr:169.254.3.155  Bcast:169.254.255.255  Mask:255.255.0.0
  inet6 addr: fe80::c00:a9ff:fefe:39b/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:61 errors:0 dropped:0 overruns:0 frame:0
  TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3930 (3.8 KiB)  TX bytes:730 (730.0 B)

eth1  Link encap:Ethernet  HWaddr 02:00:2a:43:00:0d  
  inet addr:10.124.99.1  Bcast:10.124.99.255  Mask:255.255.255.0
  inet6 addr: fe80::2aff:fe43:d/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:12 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:936 (936.0 B)  TX bytes:318 (318.0 B)

eth2  Link encap:Ethernet  HWaddr 06:7e:fe:00:00:bf  
  inet addr:10.193.17.190  Bcast:10.193.17.255  Mask:255.255.255.0
  inet6 addr: fe80::47e:feff:fe00:bf/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:11 errors:0 dropped:0 overruns:0 frame:0
  TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:846 (846.0 B)  TX bytes:696 (696.0 B)

From inside the VM, I can ping the gateways of the public, guest and
control network:

root@r-19-VM:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.124.99.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
10.193.17.0     0.0.0.0         255.255.255.0   U     0      0        0 eth2
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         10.193.17.1     0.0.0.0         UG    0      0        0 eth2

root@r-19-VM:~# ping 10.193.17.1
PING 10.193.17.1 (10.193.17.1): 56 data bytes
64 bytes from 10.193.17.1: icmp_seq=0 ttl=64 time=1.194 ms
64 bytes from 10.193.17.1: icmp_seq=1 ttl=64 time=0.329 ms

root@r-19-VM:~# ping 10.124.99.1
PING 10.124.99.1 (10.124.99.1): 56 data bytes
64 bytes from 10.124.99.1: icmp_seq=0 ttl=64 time=0.128 ms

root@r-19-VM:/etc/init.d# ping 169.254.0.1
PING 169.254.0.1 (169.254.0.1): 56 data bytes
64 bytes from 169.254.0.1: icmp_seq=0 ttl=64 time=0.292 ms

And from outside, I can ping the different IPs of the router.

Strangely, though, agent.log still shows:

Ping command port, 169.254.3.155:3922
Trying to connect to 169.254.3.155
Could not connect to 169.254.3.155

And when I check on the router, the ssh daemon only listens on the guest
network interface:

# netstat -na | grep 3922
tcp        0      0 10.124.99.1:3922        0.0.0.0:*               LISTEN

So, the connection attempt to 169.254.3.155:3922 fails:

telnet 169.254.3.155 3922
Trying 169.254.3.155...

Is that the normal situation?
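
One way to reproduce the agent's check by hand is a plain TCP connect with a
timeout. The sketch below is only an illustration, not the agent's actual
code; the function name can_connect and the 3-second default timeout are
assumptions, while the address and port in the comment come from the
agent.log lines above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the values from agent.log:
#   can_connect("169.254.3.155", 3922)
```

Run on the KVM host, a False result for ("169.254.3.155", 3922) would be
consistent with the netstat output above, where the daemon is bound only to
10.124.99.1.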

> Also, at which point did you start experiencing all these problems? Did
> you upgrade to a new CS version? Or does it fail for a specific network?

No, I didn't upgrade to a new CS, I just stopped and started the
management-server.

Thanks


Kambiz

Alena Prokharchyk  wrote:
> 
> Kambiz, did you check the device id in the nics table for the VM? If it has
> two 0s, change one of them to the correct value and restart. If the test
> completes fine, we have proof that it's related to the device id mix-up.
> If not, there has got to be something else, most likely something
> misconfigured on the KVM side.
>
> Also, at which point did you start experiencing all these problems? Did
> you upgrade to a new CS version? Or does it fail for a specific network?
>
> -Alena.
>
> On 3/25/14, 2:11 PM, "Kambiz Darabi"  wrote:
>
>>I updated nics.gateway for that network, but the VM still shows the same
>>behaviour.
>>
>>If one compares interfaces:
>>
>>root@r-19-VM:~# cat /etc/network/interfaces
>>auto lo eth0 eth1 eth2
>>iface lo inet loopback
>>
>>iface  eth0 inet static
>>  address 169.254.1.242
>>  netmask 255.255.0.0
>>iface  eth1 inet static
>>  address 10.193.17.1
>>  netmask 
>>iface  eth2 inet static
>>  address 10.193.17.190
>>  netmask 255.255.255.0
>>
>>and the nics entry in the management-server.log (cf. below), one can see
>>that eth0 is the second nic with deviceId 0 of type 'Control', eth2 is
>>correctly set up with IP 10.193.17.190.
>>
>>The remaining nic is eth1 which corresponds to the first nic with
>>deviceId 0 and ac

Re: Virtual Router doesn't start

2014-03-25 Thread Kambiz Darabi
I updated nics.gateway for that network, but the VM still shows the same
behaviour.

If one compares interfaces:

root@r-19-VM:~# cat /etc/network/interfaces 
auto lo eth0 eth1 eth2
iface lo inet loopback

iface  eth0 inet static
  address 169.254.1.242 
  netmask 255.255.0.0
iface  eth1 inet static
  address 10.193.17.1 
  netmask 
iface  eth2 inet static
  address 10.193.17.190 
  netmask 255.255.255.0

and the nics entry in the management-server.log (cf. below), one can see
that eth0 is the second nic with deviceId 0 of type 'Control', eth2 is
correctly set up with IP 10.193.17.190.

The remaining nic is eth1, which corresponds to the first nic with
deviceId 0. According to the nics entry it should have IP 10.124.99.1,
but there is no iface entry with that IP; instead, eth1 has the gateway
address of the public nic as its IP address.

Could the problem have something to do with the duplicate deviceId 0?
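
To make the question concrete, here is a small sketch that scans a nics list
for repeated deviceId values. It is a hypothetical helper
(duplicate_device_ids is not part of CloudStack); the three entries are
abbreviated from the nics element in this message.

```python
from collections import defaultdict


def duplicate_device_ids(nics):
    """Map each deviceId used by more than one nic to the nic types sharing it."""
    by_id = defaultdict(list)
    for nic in nics:
        by_id[nic["deviceId"]].append(nic["type"])
    return {dev: types for dev, types in by_id.items() if len(types) > 1}


# Abbreviated from the nics entry in management-server.log.
nics = [
    {"deviceId": 2, "type": "Public", "ip": "10.193.17.190"},
    {"deviceId": 0, "type": "Guest", "ip": "10.124.99.1"},
    {"deviceId": 0, "type": "Control", "ip": "169.254.1.242"},
]

print(duplicate_device_ids(nics))  # {0: ['Guest', 'Control']}
```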

Thanks


Kambiz

{ "nics":
  [{"deviceId":2,
"networkRateMbps":200,
"defaultNic":true,
"uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
"ip":"10.193.17.190",
"netmask":"255.255.255.0",
"gateway":"10.193.17.1",
"mac":"06:7e:fe:00:00:bf",
"dns1":"10.193.17.1",
"broadcastType":"Vlan",
"type":"Public",
"broadcastUri":"vlan://untagged",
"isolationUri":"vlan://untagged",
"isSecurityGroupEnabled":false,
"name":"cloudbr0"},
   {"deviceId":0,
"networkRateMbps":200,
"defaultNic":false,
"uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
"ip":"10.124.99.1",
"netmask":"255.255.255.0",
"gateway":"10.124.99.1",
"mac":"02:00:2a:43:00:0d",
"dns1":"10.193.17.1",
"broadcastType":"Vlan",
"type":"Guest",
"broadcastUri":"vlan://3925",
"isolationUri":"vlan://3925",
"isSecurityGroupEnabled":false,
"name":"cloudbr1"},
   {"deviceId":0,
"networkRateMbps":-1,
"defaultNic":false,
"uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
"ip":"169.254.1.242",
"netmask":"255.255.0.0",
"gateway":"169.254.0.1",
"mac":"0e:00:a9:fe:01:f2",
"broadcastType":"LinkLocal",
"type":"Control",
"isSecurityGroupEnabled":false}
  ]
}

Alena Prokharchyk  wrote:
> 
> So the gateway was missing only on the nic.
>
> Kambiz, just to quickly test it, can you set the missing gateway for the nic
> and stop/start the VR? Then see whether the start completes normally, just to
> find out if the missing gateway was the reason for the communication failure.
>
> On 3/25/14, 1:31 PM, "Kambiz Darabi"  wrote:
>
>>Hi,
>>
>>select id, name, traffic_type, broadcast_domain_type, cidr, gateway,
>>mode, state, removed from networks where id = 205;
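>>+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
>>| id  | name    | traffic_type | broadcast_domain_type | cidr           | gateway     | mode | state     | removed |
>>+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
>>| 205 | default | Guest        | Vlan                  | 10.124.99.0/24 | 10.124.99.1 | Dhcp | Allocated | NULL    |
>>+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+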
>>
>>Cheers
>>
>>
>>Kambiz
>>
>>Alena Prokharchyk  wrote:
>>> 
>>> No, it doesn’t seem right to me having 2 nics with device id 0. But
>>>looks
>>> like they’ve got programmed to correct devices on the backend per your
>>> prev email? 
>>>
>>> iface  eth0 inet static
>>>   address 169.254.1.59
>>>   netmask 255.255.0.0
>>>
>>> iface  eth1 inet static
>>>   address 10.193.17.1
>>>   Netmask
>>>
>>> iface  eth2 inet static
>>>   address 10.193.17.190
>>>   netmask 255.255.255.0
>>>
>>>
>>>
>>> I can see that only one parameter is missing from the start command, the
>>> second nic (network id=205) doesn’t have the gateway.
>>> From the command/DB, I see that the gateway is missing in the nics table
>>> for the network 205? Can you check gateway information in the networks
>>> table for the id=205
>>>
>>>
>>>
>>>

Re: Virtual Router doesn't start

2014-03-25 Thread Kambiz Darabi
Hi,

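select id, name, traffic_type, broadcast_domain_type, cidr, gateway, mode,
state, removed from networks where id = 205;
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
| id  | name    | traffic_type | broadcast_domain_type | cidr           | gateway     | mode | state     | removed |
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+
| 205 | default | Guest        | Vlan                  | 10.124.99.0/24 | 10.124.99.1 | Dhcp | Allocated | NULL    |
+-----+---------+--------------+-----------------------+----------------+-------------+------+-----------+---------+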

Cheers


Kambiz

Alena Prokharchyk  wrote:
> 
> No, it doesn’t seem right to me having 2 nics with device id 0. But looks
> like they’ve got programmed to correct devices on the backend per your
> prev email? 
>
> iface  eth0 inet static
>   address 169.254.1.59
>   netmask 255.255.0.0
>
> iface  eth1 inet static
>   address 10.193.17.1
>   Netmask
>
> iface  eth2 inet static
>   address 10.193.17.190
>   netmask 255.255.255.0
>
>
>
> I can see that only one parameter is missing from the start command, the
> second nic (network id=205) doesn’t have the gateway.
> From the command/DB, I see that the gateway is missing in the nics table
> for the network 205? Can you check gateway information in the networks
> table for the id=205
>
>
>
>
> On 3/25/14, 1:01 PM, "Kambiz Darabi"  wrote:
>
>>Hi,
>>
>>select 
>>id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name
>>from nics where instance_id=19;
>>
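>>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>>| id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
>>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>>| 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
>>| 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
>>| 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
>>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+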
>>
>>and this is the nics element from the StartCmd. Is it normal to have
>>two nics with deviceId 0?
>>
>>"nics":[
>>{"deviceId":2,
>> "networkRateMbps":200,
>> "defaultNic":true,
>> "uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
>> "ip":"10.193.17.190",
>> "netmask":"255.255.255.0",
>> "gateway":"10.193.17.1",
>> "mac":"06:7e:fe:00:00:bf",
>> "dns1":"10.193.17.1",
>> "broadcastType":"Vlan",
>> "type":"Public",
>> "broadcastUri":"vlan://untagged",
>> "isolationUri":"vlan://untagged",
>> "isSecurityGroupEnabled":false,
>> "name":"cloudbr0"},
>>{"deviceId":0,
>> "networkRateMbps":200,
>> "defaultNic":false,
>> "uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
>> "ip":"10.124.99.1",
>> "netmask":"255.255.255.0",
>> "mac":"02:00:2a:43:00:0d",
>> "dns1":"10.193.17.1",
>> "broadcastType":"Vlan",
>> "type":"Guest",
>> "broadcastUri":"vlan://3949",
>> "isolationUri":"vlan://3949",
>> "isSecurityGroupEnabled":false,
>> "name":"cloudbr1"},
>>{"deviceId":0,
>> "networkRateMbps":-1,
>> "defaultNic":false,
>> "uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
>> "ip":"169.254.1.59",
>> "netmask":"255.255.0.0",
>> "gateway":"169.254.0.1",
>> "mac":"0e:00:a9:fe:01:3b",
>> "broadcastType":"LinkLocal",
>> "type":"Control",
>> "isSecurityGroupEnabled":false}
>>]
>>
>>Thanks
>>
>>
>>Kambiz
>>
>>Alena Prokharchyk  wrote:
>>> 
>>> Kambiz, the debug statements below are for the case when eth1 is a
>>>control
>>> interface as it was in your old command. I’ve looked at the new command,
>>> eth1 is not control, its either public or guest
>>>
>>> eth0: - control 
>>>
>>> iface  eth0 inet static
>>>   address 169.254.1.59
>>>   netmask 255.255.0.0
>>>
>>> eth1: 
>>>
>>> iface  eth1 inet static
>>>   address 10.193.17.1
>>>   Netmask
>>>
>>> So you need to execute the mysql statements for the traffic type of VR
>>>nic
>>> eth1
>>>
>>> -Alena.
>>>
>>>
>>>
>>>

Re: Virtual Router doesn't start

2014-03-25 Thread Alena Prokharchyk
No, having 2 nics with device id 0 doesn't seem right to me. But it looks
like they got programmed to the correct devices on the backend, per your
previous email?

iface  eth0 inet static
  address 169.254.1.59
  netmask 255.255.0.0

iface  eth1 inet static
  address 10.193.17.1
  Netmask

iface  eth2 inet static
  address 10.193.17.190
  netmask 255.255.255.0



I can see that only one parameter is missing from the start command: the
second nic (network id=205) doesn't have a gateway.
From the command/DB, it looks like the gateway is missing in the nics table
for network 205. Can you check the gateway information in the networks
table for id=205?
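
The comparison being asked for can be sketched against a throwaway SQLite
database. This is an illustration only: both tables are cut down to a few
columns, the helper name nics_missing_gateway is made up, and the gateway for
network 200 is an assumption (copied from the nic row). The values for nics 29
and 31 come from the query output quoted below, and the gateway for network
205 is the one reported in the reply to this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE networks (id INTEGER PRIMARY KEY, gateway TEXT);
CREATE TABLE nics (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER,
    network_id INTEGER,
    ip4_address TEXT,
    gateway TEXT
);
INSERT INTO networks VALUES (205, '10.124.99.1');
INSERT INTO networks VALUES (200, '10.193.17.1');  -- assumed value
INSERT INTO nics VALUES (29, 19, 205, '10.124.99.1', NULL);
INSERT INTO nics VALUES (31, 19, 200, '10.193.17.190', '10.193.17.1');
""")


def nics_missing_gateway(conn, instance_id):
    """List nics of an instance whose gateway is NULL although the network has one."""
    return conn.execute(
        """
        SELECT n.id, n.network_id, nw.gateway
        FROM nics n
        JOIN networks nw ON nw.id = n.network_id
        WHERE n.instance_id = ?
          AND n.gateway IS NULL
          AND nw.gateway IS NOT NULL
        ORDER BY n.id
        """,
        (instance_id,),
    ).fetchall()


print(nics_missing_gateway(conn, 19))  # [(29, 205, '10.124.99.1')]
```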




On 3/25/14, 1:01 PM, "Kambiz Darabi"  wrote:

>Hi,
>
>select 
>id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name
>from nics where instance_id=19;
>
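>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>| id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
>| 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
>| 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
>| 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
>+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+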
>
>and this is the nics element from the StartCmd. Is it normal to have
>two nics with deviceId 0?
>
>"nics":[
>{"deviceId":2,
> "networkRateMbps":200,
> "defaultNic":true,
> "uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
> "ip":"10.193.17.190",
> "netmask":"255.255.255.0",
> "gateway":"10.193.17.1",
> "mac":"06:7e:fe:00:00:bf",
> "dns1":"10.193.17.1",
> "broadcastType":"Vlan",
> "type":"Public",
> "broadcastUri":"vlan://untagged",
> "isolationUri":"vlan://untagged",
> "isSecurityGroupEnabled":false,
> "name":"cloudbr0"},
>{"deviceId":0,
> "networkRateMbps":200,
> "defaultNic":false,
> "uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
> "ip":"10.124.99.1",
> "netmask":"255.255.255.0",
> "mac":"02:00:2a:43:00:0d",
> "dns1":"10.193.17.1",
> "broadcastType":"Vlan",
> "type":"Guest",
> "broadcastUri":"vlan://3949",
> "isolationUri":"vlan://3949",
> "isSecurityGroupEnabled":false,
> "name":"cloudbr1"},
>{"deviceId":0,
> "networkRateMbps":-1,
> "defaultNic":false,
> "uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
> "ip":"169.254.1.59",
> "netmask":"255.255.0.0",
> "gateway":"169.254.0.1",
> "mac":"0e:00:a9:fe:01:3b",
> "broadcastType":"LinkLocal",
> "type":"Control",
> "isSecurityGroupEnabled":false}
>]
>
>Thanks
>
>
>Kambiz
>
>Alena Prokharchyk  wrote:
>> 
>> Kambiz, the debug statements below are for the case when eth1 is a
>>control
>> interface as it was in your old command. I’ve looked at the new command,
>> eth1 is not control, its either public or guest
>>
>> eth0: - control 
>>
>> iface  eth0 inet static
>>   address 169.254.1.59
>>   netmask 255.255.0.0
>>
>> eth1: 
>>
>> iface  eth1 inet static
>>   address 10.193.17.1
>>   Netmask
>>
>> So you need to execute the mysql statements for the traffic type of VR
>>nic
>> eth1
>>
>> -Alena.
>>
>>
>>
>>
>> On 3/25/14, 9:57 AM, "Alena Prokharchyk" 
>> wrote:
>>
>>>Kambiz, can you please check the following:
>>>
>>>
>>>1) Check if the gateway is set on control network:
>>>
>>>mysql> select gateway, cidr from networks where traffic_type=‘Control’;
>>>
>>>2) For router control nic, check if network/gateway are set.
>>>
>>>Select gateway,netmask from nics where instance_id= and
>>>network_id=
>>>
>>>-Alena.
>>>

Re: Virtual Router doesn't start

2014-03-25 Thread Kambiz Darabi
Hi,

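select id,ip4_address,netmask,gateway,state,removed,network_id,reserver_name
from nics where instance_id=19;

+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
| id | ip4_address   | netmask       | gateway     | state     | removed | network_id | reserver_name            |
+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+
| 29 | 10.124.99.1   | 255.255.255.0 | NULL        | Allocated | NULL    |        205 | ExternalGuestNetworkGuru |
| 30 | NULL          | NULL          | NULL        | Allocated | NULL    |        202 | ControlNetworkGuru       |
| 31 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | Allocated | NULL    |        200 | PublicNetworkGuru        |
+----+---------------+---------------+-------------+-----------+---------+------------+--------------------------+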

and this is the nics element from the StartCmd. Is it normal to have
two nics with deviceId 0?

"nics":[
{"deviceId":2,
 "networkRateMbps":200,
 "defaultNic":true,
 "uuid":"22c19454-fd05-45c8-af6b-5f0ef073f86c",
 "ip":"10.193.17.190",
 "netmask":"255.255.255.0",
 "gateway":"10.193.17.1",
 "mac":"06:7e:fe:00:00:bf",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Public",
 "broadcastUri":"vlan://untagged",
 "isolationUri":"vlan://untagged",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr0"},
{"deviceId":0,
 "networkRateMbps":200,
 "defaultNic":false,
 "uuid":"6c5a8337-620e-49eb-9309-cdfc7039d4a8",
 "ip":"10.124.99.1",
 "netmask":"255.255.255.0",
 "mac":"02:00:2a:43:00:0d",
 "dns1":"10.193.17.1",
 "broadcastType":"Vlan",
 "type":"Guest",
 "broadcastUri":"vlan://3949",
 "isolationUri":"vlan://3949",
 "isSecurityGroupEnabled":false,
 "name":"cloudbr1"},
{"deviceId":0,
 "networkRateMbps":-1,
 "defaultNic":false,
 "uuid":"cabd4cd9-c39f-423f-ad6a-ee3affe0bd9d",
 "ip":"169.254.1.59",
 "netmask":"255.255.0.0",
 "gateway":"169.254.0.1",
 "mac":"0e:00:a9:fe:01:3b",
 "broadcastType":"LinkLocal",
 "type":"Control",
 "isSecurityGroupEnabled":false}
]

Thanks


Kambiz

Alena Prokharchyk  wrote:
> 
> Kambiz, the debug statements below are for the case when eth1 is a control
> interface as it was in your old command. I’ve looked at the new command,
> eth1 is not control, its either public or guest
>
> eth0: - control 
>
> iface  eth0 inet static
>   address 169.254.1.59
>   netmask 255.255.0.0
>
> eth1: 
>
> iface  eth1 inet static
>   address 10.193.17.1
>   Netmask
>
> So you need to execute the mysql statements for the traffic type of VR nic
> eth1
>
> -Alena.
>
>
>
>
> On 3/25/14, 9:57 AM, "Alena Prokharchyk" 
> wrote:
>
>>Kambiz, can you please check the following:
>>
>>
>>1) Check if the gateway is set on control network:
>>
>>mysql> select gateway, cidr from networks where traffic_type=‘Control’;
>>
>>2) For router control nic, check if network/gateway are set.
>>
>>Select gateway,netmask from nics where instance_id= and
>>network_id=
>>
>>-Alena.
>>
>>On 3/25/14, 5:47 AM, "Kambiz Darabi"  wrote:
>>
>>>Hi,
>>>
>>>I looked up the startup command of the old router instance which worked
>>>correctly:
>>>
>>>/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>>>r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p
>>>%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>>>
>>>The new command (cf. below) doesn't have the parameters eth1ip and
>>>eth1mask.
>>>
>>>Thanks
>>>
>>>
>>>Kambiz
>>>

Re: Virtual Router doesn't start

2014-03-25 Thread Alena Prokharchyk
Kambiz, the debug statements below are for the case where eth1 is the control
interface, as it was in your old command. I've looked at the new command:
eth1 is not control, it's either public or guest.

eth0: - control 

iface  eth0 inet static
  address 169.254.1.59
  netmask 255.255.0.0

eth1: 

iface  eth1 inet static
  address 10.193.17.1
  Netmask

So you need to run the MySQL statements for the traffic type of the VR's
eth1 nic.

-Alena.




On 3/25/14, 9:57 AM, "Alena Prokharchyk" 
wrote:

>Kambiz, can you please check the following:
>
>
>1) Check if the gateway is set on control network:
>
>mysql> select gateway, cidr from networks where traffic_type=‘Control’;
>
>2) For router control nic, check if network/gateway are set.
>
>Select gateway,netmask from nics where instance_id= and
>network_id=
>
>-Alena.
>
>On 3/25/14, 5:47 AM, "Kambiz Darabi"  wrote:
>
>>Hi,
>>
>>I looked up the startup command of the old router instance which worked
>>correctly:
>>
>>/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>>r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p
>>%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>>
>>The new command (cf. below) doesn't have the parameters eth1ip and
>>eth1mask.
>>
>>Thanks
>>
>>
>>Kambiz
>>
>>Alena Prokharchyk  wrote:
>>> 
>>> I don’t think its relevant as the piece we’ve fixed, just eliminated
>>> static nat rule programming for non-existing vm. Missing netmask on
>>>eth1
>>> doesn’t seem related to the problem (although we have to figure out why
>>> its missing), as the connection that fails, happening to link local
>>>169.x
>>> eth0 interface.
>>>
>>> Edison, can you please tell us how to debug link local connection
>>>failure,
>>> on KVM agent?
>>>
>>> Thank you,
>>> Alena.
>>>

Re: Virtual Router doesn't start

2014-03-25 Thread Alena Prokharchyk
Kambiz, can you please check the following:


1) Check if the gateway is set on control network:

mysql> select gateway, cidr from networks where traffic_type='Control';

2) For the router control nic, check if netmask/gateway are set.

Select gateway,netmask from nics where instance_id= and
network_id=

-Alena.

On 3/25/14, 5:47 AM, "Kambiz Darabi"  wrote:

>Hi,
>
>I looked up the startup command of the old router instance which worked
>correctly:
>
>/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>r-7-VM -t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p
>%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gat
>eway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud
>.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%t
>ype=router%disable_rp_filter=true%dns1=10.193.17.1
>
>The new command (cf. below) doesn't have the parameters eth1ip and
>eth1mask.
>
>Thanks
>
>
>Kambiz
>
>Alena Prokharchyk  wrote:
>> 
>> I don’t think its relevant as the piece we’ve fixed, just eliminated
>> static nat rule programming for non-existing vm. Missing netmask on eth1
>> doesn’t seem related to the problem (although we have to figure out why
>> its missing), as the connection that fails, happening to link local
>>169.x
>> eth0 interface.
>>
>> Edison, can you please tell us how to debug link local connection
>>failure,
>> on KVM agent?
>>
>> Thank you,
>> Alena.
>>
>> On 3/24/14, 1:47 PM, "Kambiz Darabi"  wrote:
>>
>>>Hi,
>>>
>>>thank you, the NullPointerException doesn't occur any more, but there
>>>still seems to be a problem during startup of the router.
>>>
>>>When I start the virtual router, it comes up, but in agent.log, there
>>>are lots of 'Could not connect to 169.254.1.x'  messages.
>>>
>>>Then I logged into the virtual router to find out that the netmask of
>>>eth1 is missing in the interfaces file:
>>>
>>>root@host:~# virsh console r-19-VM
>>>Connected to domain r-19-VM
>>>Escape character is ^]
>>>
>>>Debian GNU/Linux 6.0 r-19-VM ttyS0
>>>
>>>r-19-VM login: root
>>>...
>>>root@r-19-VM:~# cat /etc/network/interfaces
>>>auto lo eth0 eth1 eth2
>>>iface lo inet loopback
>>>
>>>iface  eth0 inet static
>>>  address 169.254.1.59
>>>  netmask 255.255.0.0
>>>iface  eth1 inet static
>>>  address 10.193.17.1
>>>  netmask 
>>>iface  eth2 inet static
>>>  address 10.193.17.190
>>>  netmask 255.255.255.0
>>>
>>>I don't know if it is relevant, but this is the line from agent.log
>>>where the parameters are visible:
>>>
>>>2014-03-24 21:36:17,681 DEBUG [kvm.resource.LibvirtComputingResource]
>>>(agentRequest-Handler-2:null) Executing:
>>>/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>>>r-19-VM -t all -d /var/lib/libvirt/images/r-19-VM-patchdisk -p
>>>%template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
>>>
>>>
>>>Any hint is appreciated.
>>>
>>>Thanks
>>>
>>>
>>>Kambiz
>>>
>>>
>>>Alena Prokharchyk  wrote:
 
>>>> Yes, Kambiz, you followed up right, and vm id=15 is the culprit. If vm
>>>> id=15 is expunged, we have to clear out the reference to it from
>>>> user_ip_address table. Here is the flow:
>>>>
>>>> 1) Save the db dump.
>>>> 2) Run the query to cleanup the reference:
>>>>
>>>> Update user_ip_address set one_to_one_nat=0, instance_id=null where
>>>> id=
>>>>
>>>> Let me know how it works.
>>>>
>>>> -Alena.
>>>>
>>>> On 3/24/14, 10:55 AM, "Kambiz Darabi"  wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>I hope I have understood what you wrote and created the following
>>>>>query correctly:
>>>>>
>>>>>select uip.vm_id, uip.network_id, uip.public_ip_address,
>>>>>   n.state as nic_state, n.removed as nic_removed,
>>>>>   vm.state as vm_state, vm.removed as vm_removed
>>>>>from user_ip_address uip
>>>>> join nics n on uip.vm_id = n.instance_id
>>>>> join vm_instance vm on uip.vm_id = vm.id
>>>>>where uip.id in (Select ip_address_id from firewall_rules fr where
>>>>>fr.network_id=205);
>>>>>
>>>>>+-------+------------+-------------------+-----------+-------------+----------+------------+
>>>>>| vm_id | network_id | public_ip_address | nic_state | nic_removed | vm_state | vm_removed |
>>>>>+-------+------------+-------------------+-----------+-------------+----------+------------+
>>>>>|     6 |        205 | 10.193.17.169     | Allocated | NULL        | Stopped  | NULL       |
>>>>>|    10 |        205 | 10.193.17.136     | Allocated | NULL        | Stopped  | NULL       |
>>>>>|    12 |        205 | 10.193.17.140     | Allocated | NULL        | Stopped  | NULL       |
>>>>>|    13 |        205 | 10.193.17.141     | Allocated | NULL        | Stopped  | NULL       |

Re: Virtual Router doesn't start

2014-03-25 Thread Kambiz Darabi
Hi,

I looked up the startup command of the old router instance which worked
correctly:

/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l r-7-VM 
-t all -d /var/lib/libvirt/images/r-7-VM-patchdisk -p 
%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1

The new command (cf. below) doesn't have the parameters eth1ip and
eth1mask.
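
A quick way to see the difference is to split both parameter strings and
compare the keys. The sketch below assumes the string is a simple
%key=value% list, which is inferred from its appearance and not from how
rundomrpre.sh actually parses it; parse_boot_args and compare_keys are
hypothetical helper names. OLD is the r-7-VM string above and NEW is the
r-19-VM string from the agent.log line quoted below.

```python
def parse_boot_args(arg_string):
    """Split a '%key=value%key=value' string into ordered (key, value) pairs."""
    pairs = []
    for item in arg_string.strip("%").split("%"):
        key, _, value = item.partition("=")
        pairs.append((key, value))
    return pairs


def compare_keys(old, new):
    """Return (keys present in old but not in new, keys repeated in new)."""
    old_keys = [k for k, _ in parse_boot_args(old)]
    new_keys = [k for k, _ in parse_boot_args(new)]
    missing = sorted(set(old_keys) - set(new_keys))
    repeated = sorted({k for k in new_keys if new_keys.count(k) > 1})
    return missing, repeated


OLD = (
    "%template=domP%name=r-7-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0"
    "%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0"
    "%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth1ip=169.254.2.46"
    "%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1"
)
NEW = (
    "%template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0"
    "%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0"
    "%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60"
    "%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1"
)

print(compare_keys(OLD, NEW))
# (['eth1ip', 'eth1mask'], ['eth0ip', 'eth0mask'])
```

Besides the missing eth1ip and eth1mask, this also shows that eth0ip and
eth0mask appear twice in the new string.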

Thanks


Kambiz

Alena Prokharchyk  wrote:
> 
> I don't think it's relevant, as the piece we've fixed just eliminated
> static NAT rule programming for a non-existent VM. The missing netmask on
> eth1 doesn't seem related to the problem (although we have to figure out
> why it's missing), as the connection that fails is to the link-local 169.x
> eth0 interface.
>
> Edison, can you please tell us how to debug a link-local connection failure
> on the KVM agent?
>
> Thank you,
> Alena.
>
> On 3/24/14, 1:47 PM, "Kambiz Darabi"  wrote:
>
>>Hi,
>>
>>thank you, the NullPointerException doesn't occur any more, but there
>>still seems to be a problem during startup of the router.
>>
>>When I start the virtual router, it comes up, but in agent.log, there
>>are lots of 'Could not connect to 169.254.1.x'  messages.
>>
>>Then I logged into the virtual router to find out that the netmask of
>>eth1 is missing in the interfaces file:
>>
>>root@host:~# virsh console r-19-VM
>>Connected to domain r-19-VM
>>Escape character is ^]
>>
>>Debian GNU/Linux 6.0 r-19-VM ttyS0
>>
>>r-19-VM login: root
>>...
>>root@r-19-VM:~# cat /etc/network/interfaces
>>auto lo eth0 eth1 eth2
>>iface lo inet loopback
>>
>>iface  eth0 inet static
>>  address 169.254.1.59
>>  netmask 255.255.0.0
>>iface  eth1 inet static
>>  address 10.193.17.1
>>  netmask 
>>iface  eth2 inet static
>>  address 10.193.17.190
>>  netmask 255.255.255.0
>>
>>I don't know if it is relevant, but this is the line from agent.log
>>where the parameters are visible:
>>
>>2014-03-24 21:36:17,681 DEBUG [kvm.resource.LibvirtComputingResource]
>>(agentRequest-Handler-2:null) Executing:
>>/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l
>>r-19-VM -t all -d /var/lib/libvirt/images/r-19-VM-patchdisk -p
>>%template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%ga
>>teway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6clou
>>d.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60%eth0mask=255.255.0.0%
>>type=router%disable_rp_filter=true%dns1=10.193.17.1
>>
>>
>>Any hint is appreciated.
>>
>>Thanks
>>
>>
>>Kambiz
>>
>>
>>Alena Prokharchyk  wrote:
>>> 
>>> Yes, Kambiz, you followed up right, and vm id=15 is the culprit. If vm
>>> id=15 is expunged, we have to clear out the reference to it from
>>> user_ip_address table. Here is the flow:
>>>
>>> 1) Save the db dump.
>>> 2) Run the query to cleanup the reference:
>>>
>>> Update user_ip_address set one_to_one_nat=0, instance_id=null where
>>> id=
>>>
>>>
>>>
>>> Let me know how it works.
>>>
>>> -Alena.
>>>
>>> On 3/24/14, 10:55 AM, "Kambiz Darabi"  wrote:
>>>
Hi,

I hope I have understood what you wrote and created the following query
correctly:

select uip.vm_id, uip.network_id, uip.public_ip_address,
   n.state as nic_state, n.removed as nic_removed,
   vm.state as vm_state, vm.removed as vm_removed
from user_ip_address uip
 join nics n on uip.vm_id = n.instance_id
 join vm_instance vm on uip.vm_id = vm.id
where uip.id in (Select ip_address_id from firewall_rules fr where
fr.network_id=205);


+---++---+--+---
--
+---++
| vm_id | network_id | public_ip_address | nic_state| nic_removed
| vm_state  | vm_removed |
+---++---+--+---
--
+---++
| 6 |205 | 10.193.17.169 | Allocated| NULL
| Stopped   | NULL   |
|10 |205 | 10.193.17.136 | Allocated| NULL
| Stopped   | NULL   |
|12 |205 | 10.193.17.140 | Allocated| NULL
| Stopped   | NULL   |
|13 |205 | 10.193.17.141 | Allocated| NULL
| Stopped   | NULL   |
|14 |205 | 10.193.17.142 | Allocated| NULL
| Stopped   | NULL   |
|15 |205 | 10.193.17.174 | Deallocating | 2014-03-18
23:00:53 | Expunging | NULL   |
|16 |205 | 10.193.17.103 | Allocated| NULL
| Stopped   | NULL   |
+---++---+--+---
--
+---++

Is VM id 15 what you are looking for?

Thank you


Kambiz


Re: Virtual Router doesn't start

2014-03-24 Thread Alena Prokharchyk
I don't think it's relevant, as the piece we've fixed just eliminated
static NAT rule programming for a non-existing VM. The missing netmask on
eth1 doesn't seem related to the problem (although we have to figure out
why it's missing), as the connection that fails is to the link-local 169.x
address on the eth0 interface.

Edison, can you please tell us how to debug a link-local connection failure
on the KVM agent?

Thank you,
Alena.


Re: Virtual Router doesn't start

2014-03-24 Thread Kambiz Darabi
Hi,

thank you, the NullPointerException doesn't occur any more, but there
still seems to be a problem during startup of the router.

When I start the virtual router, it comes up, but in agent.log, there
are lots of 'Could not connect to 169.254.1.x'  messages.

Then I logged into the virtual router and found that the netmask of
eth1 is missing in the interfaces file:

root@host:~# virsh console r-19-VM
Connected to domain r-19-VM
Escape character is ^]

Debian GNU/Linux 6.0 r-19-VM ttyS0

r-19-VM login: root
...
root@r-19-VM:~# cat /etc/network/interfaces 
auto lo eth0 eth1 eth2
iface lo inet loopback

iface  eth0 inet static
  address 169.254.1.59 
  netmask 255.255.0.0
iface  eth1 inet static
  address 10.193.17.1 
  netmask 
iface  eth2 inet static
  address 10.193.17.190 
  netmask 255.255.255.0
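
A check like this can be scripted. The following is a minimal sketch, not part of CloudStack: the function name `stanzas_missing_option` is hypothetical, and it only understands the simple `iface <name> inet <method>` layout shown above.

```python
def stanzas_missing_option(interfaces_text, option="netmask"):
    """Return names of static 'iface' stanzas where `option` is absent or empty."""
    missing = []
    current = None      # name of the stanza being scanned
    has_value = False   # whether `option` with a value was seen in it

    for line in interfaces_text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "iface":
            # Close the previous stanza before starting a new one
            if current is not None and not has_value:
                missing.append(current)
            current = words[1]
            method = words[3] if len(words) > 3 else ""
            # Only static stanzas are expected to carry the option
            has_value = method != "static"
        elif words[0] == option and current is not None:
            has_value = len(words) > 1

    if current is not None and not has_value:
        missing.append(current)
    return missing


sample = """\
auto lo eth0 eth1 eth2
iface lo inet loopback

iface  eth0 inet static
  address 169.254.1.59
  netmask 255.255.0.0
iface  eth1 inet static
  address 10.193.17.1
  netmask
iface  eth2 inet static
  address 10.193.17.190
  netmask 255.255.255.0
"""

print(stanzas_missing_option(sample))
```

Run against the file above, it reports eth1 as the only stanza without a netmask value.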

I don't know if it is relevant, but this is the line from agent.log
where the parameters are visible:

2014-03-24 21:36:17,681 DEBUG [kvm.resource.LibvirtComputingResource] 
(agentRequest-Handler-2:null) Executing: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/rundomrpre.sh -l r-19-VM 
-t all -d /var/lib/libvirt/images/r-19-VM-patchdisk -p 
%template=domP%name=r-19-VM%eth2ip=10.193.17.190%eth2mask=255.255.255.0%gateway=10.193.17.1%eth0ip=10.124.99.1%eth0mask=255.255.255.0%domain=cs6cloud.internal%dhcprange=10.124.99.1%eth0ip=169.254.1.60%eth0mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=10.193.17.1
 


Any hint is appreciated.

Thanks


Kambiz



Re: Virtual Router doesn't start

2014-03-24 Thread Alena Prokharchyk
Yes, Kambiz, you followed it up correctly, and VM id=15 is the culprit. If VM
id=15 is expunged, we have to clear out the reference to it from the
user_ip_address table. Here is the flow:

1) Save a DB dump.
2) Run the query to clean up the reference:

Update user_ip_address set one_to_one_nat=0, instance_id=null where
id=



Let me know how it works.

-Alena.


Re: Virtual Router doesn't start

2014-03-24 Thread Kambiz Darabi
Hi,

I hope I have understood what you wrote and created the following query
correctly:

select uip.vm_id, uip.network_id, uip.public_ip_address,
   n.state as nic_state, n.removed as nic_removed,
   vm.state as vm_state, vm.removed as vm_removed
from user_ip_address uip
 join nics n on uip.vm_id = n.instance_id
 join vm_instance vm on uip.vm_id = vm.id
where uip.id in (Select ip_address_id from firewall_rules fr where 
fr.network_id=205);


+-------+------------+-------------------+--------------+---------------------+-----------+------------+
| vm_id | network_id | public_ip_address | nic_state    | nic_removed         | vm_state  | vm_removed |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+
|     6 |        205 | 10.193.17.169     | Allocated    | NULL                | Stopped   | NULL       |
|    10 |        205 | 10.193.17.136     | Allocated    | NULL                | Stopped   | NULL       |
|    12 |        205 | 10.193.17.140     | Allocated    | NULL                | Stopped   | NULL       |
|    13 |        205 | 10.193.17.141     | Allocated    | NULL                | Stopped   | NULL       |
|    14 |        205 | 10.193.17.142     | Allocated    | NULL                | Stopped   | NULL       |
|    15 |        205 | 10.193.17.174     | Deallocating | 2014-03-18 23:00:53 | Expunging | NULL       |
|    16 |        205 | 10.193.17.103     | Allocated    | NULL                | Stopped   | NULL       |
+-------+------------+-------------------+--------------+---------------------+-----------+------------+

Is VM id 15 what you are looking for?

Thank you


Kambiz


Re: Virtual Router doesn't start

2014-03-24 Thread Alena Prokharchyk
Kambiz, can you please try one more thing.

1) Locate all the firewall rules for your guest network (205, right?)

Select id, ip_address_id from firewall_rules where network_id=205;

2) Now get all static NAT-enabled IP addresses for those rules:

Select vm_id, network_id from user_ip_address where id in (Select
ip_address_id from firewall_rules where network_id=205);

For each vmId/networkId combo, check whether there is a non-removed NIC and
a non-expunged VM. There might be an incorrect static NAT IP/VM reference
to a VM that has already been removed. If you find any, let me know and
I will tell you how to clean it up.

-Alena.
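
The per-VM check described above can be folded into a single query. The sketch below runs it against an in-memory SQLite database as a stand-in for the MySQL cloud DB. It is only an illustration: the schemas are reduced to the columns named in this thread, the `user_ip_address.id` and `firewall_rules` values are invented, and only two of the VM rows (14 and 15) are reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_ip_address (id INTEGER PRIMARY KEY, vm_id INTEGER,
                              network_id INTEGER, public_ip_address TEXT);
CREATE TABLE firewall_rules  (id INTEGER PRIMARY KEY, ip_address_id INTEGER,
                              network_id INTEGER);
CREATE TABLE nics            (id INTEGER PRIMARY KEY, instance_id INTEGER,
                              state TEXT, removed TEXT);
CREATE TABLE vm_instance     (id INTEGER PRIMARY KEY, state TEXT, removed TEXT);

-- ids 101/102 and the firewall rule ids are hypothetical
INSERT INTO user_ip_address VALUES (101, 14, 205, '10.193.17.142');
INSERT INTO user_ip_address VALUES (102, 15, 205, '10.193.17.174');
INSERT INTO firewall_rules  VALUES (1, 101, 205);
INSERT INTO firewall_rules  VALUES (2, 102, 205);
INSERT INTO nics            VALUES (1, 14, 'Allocated', NULL);
INSERT INTO nics            VALUES (2, 15, 'Deallocating', '2014-03-18 23:00:53');
INSERT INTO vm_instance     VALUES (14, 'Stopped', NULL);
INSERT INTO vm_instance     VALUES (15, 'Expunging', NULL);
""")

# IPs with firewall rules in network 205 whose VM has a removed NIC
# or is itself expunging/removed
stale = conn.execute("""
    SELECT uip.vm_id, uip.public_ip_address
    FROM user_ip_address uip
    JOIN nics n         ON uip.vm_id = n.instance_id
    JOIN vm_instance vm ON uip.vm_id = vm.id
    WHERE uip.id IN (SELECT ip_address_id FROM firewall_rules
                     WHERE network_id = ?)
      AND (n.removed IS NOT NULL
           OR vm.removed IS NOT NULL
           OR vm.state = 'Expunging')
    ORDER BY uip.vm_id
""", (205,)).fetchall()

print(stale)
```

Note that the subquery selects a single column (`ip_address_id`), since an `IN` subquery that returns two columns is rejected by MySQL.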


Re: Virtual Router doesn't start

2014-03-22 Thread Kambiz Darabi
Hi Alena,

thank you for your help.

The query returns no rows, i.e. nics.removed was not null, but I removed
the row anyway to see what happens: a new virtual router was created,
which also couldn't be started due to the same NPE. I reverted the
change by restoring from the dump.

I have to mention that prior to the restart, r-7-VM was the router which
was used by my instances. I deleted the router using the UI after the first
occurrence of the NPE, because a post with a similar problem suggested
that the deleted router would be recreated again (and this procedure
solved the problem).

Below I have attached the state of the two tables.

Anything else I can try?

Thank you


Kambiz

mysql> select n.id, n.removed, n.ip4_address, n.netmask, n.gateway, n.ip_type, 
n.reserver_name, n.network_id, i.id as instance_id, i.name, i.state, i.type 
from vm_instance i join nics n on n.instance_id = i.id where i.type = 
'DomainRouter';
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
| id | removed             | ip4_address   | netmask       | gateway     | ip_type | reserver_name            | network_id | instance_id | name    | state     | type         |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+
|  9 | 2014-03-17 11:27:58 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        204 |           4 | r-4-VM  | Expunging | DomainRouter |
| 10 | 2014-03-17 11:27:58 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           4 | r-4-VM  | Expunging | DomainRouter |
| 11 | 2014-03-17 11:27:58 | 10.193.17.139 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           4 | r-4-VM  | Expunging | DomainRouter |
| 14 | 2014-03-17 11:27:52 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |           7 | r-7-VM  | Expunging | DomainRouter |
| 15 | 2014-03-17 11:27:52 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |           7 | r-7-VM  | Expunging | DomainRouter |
| 16 | 2014-03-17 11:27:52 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |           7 | r-7-VM  | Expunging | DomainRouter |
| 26 | 2014-03-18 08:11:16 | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          18 | r-18-VM | Expunging | DomainRouter |
| 27 | 2014-03-18 08:11:16 | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          18 | r-18-VM | Expunging | DomainRouter |
| 28 | 2014-03-18 08:11:16 | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          18 | r-18-VM | Expunging | DomainRouter |
| 29 | NULL                | 10.124.99.1   | 255.255.255.0 | NULL        | NULL    | ExternalGuestNetworkGuru |        205 |          19 | r-19-VM | Stopped   | DomainRouter |
| 30 | NULL                | NULL          | NULL          | NULL        | NULL    | ControlNetworkGuru       |        202 |          19 | r-19-VM | Stopped   | DomainRouter |
| 31 | NULL                | 10.193.17.190 | 255.255.255.0 | 10.193.17.1 | NULL    | PublicNetworkGuru        |        200 |          19 | r-19-VM | Stopped   | DomainRouter |
+----+---------------------+---------------+---------------+-------------+---------+--------------------------+------------+-------------+---------+-----------+--------------+

mysql> select * from router_network_ref;
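+----+-----------+------------+------------+
| id | router_id | network_id | guest_type |
+----+-----------+------------+------------+
|  1 |         4 |        204 | Isolated   |
|  2 |         7 |        205 | Isolated   |
|  3 |        18 |        205 | Isolated   |
|  4 |        19 |        205 | Isolated   |
+----+-----------+------------+------------+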



Re: Virtual Router doesn't start

2014-03-21 Thread Alena Prokharchyk
The error happens not because the IP is null, but because the NIC in a certain
network can't be found. It looks like there is a bug in the VPC NIC
plug/unplug process for guest networks.

Kambiz, please do the following to fix it:

1) Stop the MS.
2) Take a DB dump of the cloud DB in case you have to revert.
3) Run the query:

select * from router_network_ref where router_id=<id of your VR> and
network_id not in (select network_id from nics where instance_id=<id of
your VR> and removed is null);

It will give you the list of network refs that somehow weren't cleaned up
during the NIC detach. Remove the returned entries from the router_network_ref
table.

Let me know how it works.

-Alena.
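
The query in step 3 lost its placeholders in the archive. The sketch below assumes it was meant to list router_network_ref rows for which the router has no live (non-removed) NIC in the referenced network. It runs that reading against an in-memory SQLite stand-in, using the rows for routers 18 and 19 that Kambiz posted on 2014-03-22. The function name `stale_network_refs` and the reduced schemas are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE router_network_ref (id INTEGER PRIMARY KEY, router_id INTEGER,
                                 network_id INTEGER, guest_type TEXT);
CREATE TABLE nics (id INTEGER PRIMARY KEY, removed TEXT,
                   network_id INTEGER, instance_id INTEGER);

INSERT INTO router_network_ref VALUES (3, 18, 205, 'Isolated');
INSERT INTO router_network_ref VALUES (4, 19, 205, 'Isolated');

-- NICs of r-18-VM (all removed) and r-19-VM (all live)
INSERT INTO nics VALUES (26, '2014-03-18 08:11:16', 205, 18);
INSERT INTO nics VALUES (27, '2014-03-18 08:11:16', 202, 18);
INSERT INTO nics VALUES (28, '2014-03-18 08:11:16', 200, 18);
INSERT INTO nics VALUES (29, NULL, 205, 19);
INSERT INTO nics VALUES (30, NULL, 202, 19);
INSERT INTO nics VALUES (31, NULL, 200, 19);
""")


def stale_network_refs(connection, router_id):
    """Refs of `router_id` whose network has no non-removed NIC on that router."""
    return connection.execute("""
        SELECT * FROM router_network_ref
        WHERE router_id = ?
          AND network_id NOT IN (SELECT network_id FROM nics
                                 WHERE instance_id = ? AND removed IS NULL)
    """, (router_id, router_id)).fetchall()


print(stale_network_refs(conn, 19))
print(stale_network_refs(conn, 18))
```

With this data the query returns nothing for router 19, which matches Kambiz's report that the query returned no rows, while the expunged router 18 still has a leftover ref to network 205.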


On 3/21/14, 3:36 PM, "Kambiz Darabi"  wrote:

>Hello,
>
>as this is my first post to the list, I would like to thank all
>contributors for Cloudstack which I use since last fall without any
>problems. I run 4.1.1 with KVM and advanced networking.
>
>After a restart of the management server (stopping and starting the java
>process), the virtual domain router doesn't start and
>management-server.log shows a NullPointerException in
>NetworkModelImpl.getIpInNetwork (cf. stack trace below).
>
>By putting the server in debug mode and remote debugging, I found out
>that the reason is a row in the table nics which has NULL in ip (cf. row
>with id 30 in the result of the select statement below).
>
>What can I do to quickly solve this problem? Any pointers or suggestions
>are appreciated as the system is currently unusable.
>
>Thank you for your help
>
>
>Kambiz
>
>
>management-server.log:
>
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking VirtualRouter to prepare for
>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking Ovs to prepare for
>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for
>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for
>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 WARN  [network.element.VpcVirtualRouterElement]
>(Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with
>any VPC
>2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking NiciraNvp to prepare for
>Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
>2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement]
>(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service
>Connectivity on network net1
>2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl]
>(Job-Executor-1:job-176) Service SecurityGroup is not supported in the
>network id=205
>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of
>network implement
>2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Network id=202 is already implemented
>2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Lock is released for network id 202 as a part of
>network implement
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking VirtualRouter to prepare for
>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking Ovs to prepare for
>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for
>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for
>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,187 WARN  [network.element.VpcVirtualRouterElement]
>(Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated
>with any VPC
>2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl]
>(Job-Executor-1:job-176) Asking NiciraNvp to prepare for
>Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
>2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement]
>(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service
>Connectivity on network null
>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl]
>(Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for
>VM[DomainRouter|r-19-VM]
>2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl]
>(Job-Executor-1:job-176) No need to recreate the volume:
>Vol[24|vm=19|ROOT], since it already has a pool assigned: 200, adding
>disk to VM

Virtual Router doesn't start

2014-03-21 Thread Kambiz Darabi
Hello,

as this is my first post to the list, I would like to thank all
contributors for CloudStack, which I have been using since last fall
without any problems. I run 4.1.1 with KVM and advanced networking.

After a restart of the management server (stopping and starting the Java
process), the virtual domain router no longer starts, and
management-server.log shows a NullPointerException in
NetworkModelImpl.getIpInNetwork (cf. stack trace below).

By putting the server in debug mode and remote debugging, I found out
that the reason is a row in the table nics which has NULL in ip (cf. row
with id 30 in the result of the select statement below).

What can I do to quickly solve this problem? Any pointers or suggestions
are appreciated as the system is currently unusable.

Thank you for your help


Kambiz


management-server.log:

2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VirtualRouter to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking Ovs to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 WARN  [network.element.VpcVirtualRouterElement] 
(Job-Executor-1:job-176) Network Ntwk[205|Guest|8] is not associated with any 
VPC
2014-03-18 10:03:27,151 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking NiciraNvp to prepare for 
Nic[29-19-30e229ba-21bd-4ab5-8570-9f495bce5019-10.124.99.1]
2014-03-18 10:03:27,151 DEBUG [network.element.NiciraNvpElement] 
(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service 
Connectivity on network net1
2014-03-18 10:03:27,153 DEBUG [cloud.network.NetworkModelImpl] 
(Job-Executor-1:job-176) Service SecurityGroup is not supported in the network 
id=205
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Lock is acquired for network id 202 as a part of 
network implement
2014-03-18 10:03:27,156 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Network id=202 is already implemented
2014-03-18 10:03:27,157 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Lock is released for network id 202 as a part of 
network implement
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VirtualRouter to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking Ovs to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking SecurityGroupProvider to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking VpcVirtualRouter to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,187 WARN  [network.element.VpcVirtualRouterElement] 
(Job-Executor-1:job-176) Network Ntwk[202|Control|3] is not associated with any 
VPC
2014-03-18 10:03:27,188 DEBUG [cloud.network.NetworkManagerImpl] 
(Job-Executor-1:job-176) Asking NiciraNvp to prepare for 
Nic[30-19-30e229ba-21bd-4ab5-8570-9f495bce5019-169.254.3.99]
2014-03-18 10:03:27,188 DEBUG [network.element.NiciraNvpElement] 
(Job-Executor-1:job-176) Checking if NiciraNvpElement can handle service 
Connectivity on network null
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] 
(Job-Executor-1:job-176) Checking if we need to prepare 1 volumes for 
VM[DomainRouter|r-19-VM]
2014-03-18 10:03:27,190 DEBUG [cloud.storage.StorageManagerImpl] 
(Job-Executor-1:job-176) No need to recreate the volume: Vol[24|vm=19|ROOT], 
since it already has a pool assigned: 200, adding disk to VM
2014-03-18 10:03:27,224 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) 
Boot Args for VM[DomainRouter|r-19-VM]:  template=domP name=r-19-VM 
eth2ip=10.193.17.190 eth2mask=255.255.255.0 gateway=10.193.17.1 
eth0ip=10.124.99.1 eth0mask=255.255.255.0 domain=cs6cloud.internal 
dhcprange=10.124.99.1 eth0ip=169.254.3.99 eth0mask=255.255.0.0 type=router 
disable_rp_filter=true dns1=10.193.17.1
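
In the Boot Args entry above, eth0ip and eth0mask each appear twice with
different values (10.124.99.1 and 169.254.3.99), while there is no eth1ip at
all. This might mean that two nics ended up on the same device slot, but that
reading is an assumption and not something the log states. A minimal sketch
for spotting such repeats, assuming the boot args are whitespace-separated
key=value tokens; the helper name duplicate_boot_args is hypothetical and not
part of CloudStack:

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_boot_args(boot_args: str) -> Dict[str, List[str]]:
    """Return every key that occurs more than once, with all its values.

    Assumes whitespace-separated key=value tokens; tokens without '='
    are ignored.
    """
    values = defaultdict(list)
    for token in boot_args.split():
        key, sep, value = token.partition("=")
        if sep:  # only keep tokens that actually contain '='
            values[key].append(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Boot args copied from the log entry above.
boot_args = (
    "template=domP name=r-19-VM "
    "eth2ip=10.193.17.190 eth2mask=255.255.255.0 gateway=10.193.17.1 "
    "eth0ip=10.124.99.1 eth0mask=255.255.255.0 domain=cs6cloud.internal "
    "dhcprange=10.124.99.1 eth0ip=169.254.3.99 eth0mask=255.255.0.0 "
    "type=router disable_rp_filter=true dns1=10.193.17.1"
)

for key, vals in duplicate_boot_args(boot_args).items():
    print(key, vals)
```

For this entry the sketch reports eth0ip and eth0mask as repeated keys; it
only flags the repetition and does not say which value is the intended one.
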
2014-03-18 10:03:27,343 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-1:job-176) 
Found 8 ip(s) to apply as a part of domR VM[DomainRouter|r-19-VM] start.
2014-03-18 10:03:27,415 DEBUG 
[network.router.VirtualNetworkApplianceM