Re: Question about Basic and Advanced Network

2019-08-12 Thread Jon Marshall
1) NetScaler provides load balancing functions rather than IPs.  For both 
basic and advanced networking you can either assign IPs statically to your VMs 
or you can use DHCP on your virtual routers to provide the IPs.

Public vs private IPs, it doesn't really make any difference.
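On the API side of your first question, individual public IPs can be listed and acquired through the normal API calls; a minimal sketch (untested here), assuming the third-party "cs" Python client with placeholder endpoint, keys and IDs:

# Rough sketch only - endpoint, keys and the zone/network UUIDs are placeholders.
from cs import CloudStack   # third-party client: pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# List the public IPs currently allocated to the account.
for ip in api.listPublicIpAddresses().get("publicipaddress", []):
    print(ip["ipaddress"], ip["state"])

# Acquire a new public IP in a zone/network (an asynchronous job in CloudStack).
job = api.associateIpAddress(zoneid="ZONE_UUID", networkid="NETWORK_UUID")
print(job)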

2) You can set up your CloudStack using Advanced networking with Security Groups, 
which is pretty much basic networking but with multiple subnets/VLANs.

However, if you use Advanced networking without Security Groups then no, you 
cannot apply Security Groups to isolated networks, but Advanced networking does 
support firewalling on isolated and VPC networks.

Jon


From: Francisco Germano 
Sent: 11 August 2019 22:51
To: 'users@cloudstack.apache.org' 
Subject: Question about Basic and Advanced Network

Greetings,

My team and I are working on open-source software and our next step is to 
implement an integration with CloudStack. We are implementing the network 
context and we have some questions. Could you help us?

About Basic Network:
1 - A Citrix NetScaler provides public IPs, right? Is it possible to control the 
Public IPs using just the CloudStack API? If yes, how?

About Advanced Network:
2 - Is it possible to use Security Groups in an Isolated Network?

Best regards,
Francisco Germano


Re: Best use of server NICs.

2019-03-19 Thread Jon Marshall
Hi Dag

Many thanks for that, option 1 it is then.

Jon


From: Dag Sonstebo 
Sent: 19 March 2019 09:29
To: users@cloudstack.apache.org
Subject: Re: Best use of server NICs.

Hi Jon,

In short "it depends...". Going by your hardware spec (only 1GBps NICs) I will 
assume (please correct me if wrong) that this is a smaller environment / lab / 
proof of concept? If so you won't see much of a benefit from option 2 since you 
simply won't have that much secondary storage traffic going through to cause 
noisy neighbour problems - hence my advice would be option 1) to give you 
redundancy.

Option 2) would be at risk of no redundancy for management and storage (bad), 
and would only make sense if you had guest VMs with high network IO. Even if 
you had a lot of secondary storage traffic I would advise against this. If you 
absolutely wanted to run secondary storage traffic separately I would run a 
bond for management and primary storage and a NIC each for secondary and guest 
traffic - but I would still say 1) is the better option.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue


On 18/03/2019, 19:02, "Jon Marshall"  wrote:


I have  4 1Gbps NICs in each compute node and was considering 2 deployment 
options (Advanced network with Security Groups) -

1)  2 NICs bonded together and used for all storage and management and the 
other 2 NICs bonded together and used for guest VM traffic.

2)  1 NIC for management and primary storage, 1 NIC for secondary storage 
and the remaining 2 NICs bonded together for guest VM traffic.

Option 1 would give more redundancy but is there any benefit to separating 
storage that would outweigh this ?

Or is there a better option I have overlooked.

Any advice much appreciated





dag.sonst...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue





Best use of server NICs.

2019-03-18 Thread Jon Marshall

I have  4 1Gbps NICs in each compute node and was considering 2 deployment 
options (Advanced network with Security Groups) -

1)  2 NICs bonded together and used for all storage and management and the 
other 2 NICs bonded together and used for guest VM traffic.

2)  1 NIC for management and primary storage, 1 NIC for secondary storage and 
the remaining 2 NICs bonded together for guest VM traffic.

Option 1 would give more redundancy but is there any benefit to separating 
storage that would outweigh this ?

Or is there a better option I have overlooked.

Any advice much appreciated




KVM Host HA and power lost to host.

2019-03-04 Thread Jon Marshall

I have KVM Host HA enabled and power is lost to one of the compute nodes.   The 
host has its state marked as Alert and the HA states go through Degraded to 
Suspect to Fencing.

The problem is that the host is never fenced because there is no power to it, so 
none of the OOBM commands work, which means the VMs are never migrated.

 From the management server logs -

2019-03-04 11:02:48,288 WARN  [o.a.c.h.t.BaseHATask] (pool-6-thread-9:null) 
(logid:d0a19f20) Exception occurred while running FenceTask on a resource: 
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not 
configured or enabled for this host dcp-cscn2.local
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not 
configured or enabled for this host dcp-cscn2.local
at 
org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
at 
org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
at 
org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band 
Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) failed 
with error: Get Auth Capabilities error
Error issuing Get Channel Authentication Capabilities request
Error: Unable to establish IPMI v2 / RMCP+ session

at 
org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
... 21 more


which raises the question of how this is meant to work for a host whose power 
has failed.


If I turn off KVM Host HA and change the ping interval to 30 and the ping timeout 
to 2 then the VMs fail over to another host within 5 minutes.

I understand what Host HA is meant for, but for a host that has lost power it 
doesn't seem to work.
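For what it's worth, a quick way to confirm it is purely the OOBM path failing when the node loses power is to drive the power action directly through the API; a rough sketch, assuming the third-party "cs" Python client and a placeholder host UUID (issueOutOfBandManagementPowerAction is the standard call):

# Rough sketch - host UUID, endpoint and keys are placeholders.
from cs import CloudStack   # pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Ask the OOBM driver (e.g. IPMI) for the host's power status; if the BMC has
# also lost power this fails much like the fence task in the log above.
resp = api.issueOutOfBandManagementPowerAction(hostid="HOST_UUID", action="STATUS")
print(resp)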

Jon


Re: Not able to access the vm from outside network

2019-03-01 Thread Jon Marshall
ip6tables -A i-2-40-VM -j DROP
2019-03-01 10:46:23,739 - Programmed default rules for vm i-2-40-VM
2019-03-01 10:46:24,255 - Executing command: add_network_rules
2019-03-01 10:46:24,259 - programming network rules for IP:
172.20.109.167 vmname=i-2-40-VM
2019-03-01 10:46:24,260 - iptables -F i-2-40-VM
2019-03-01 10:46:24,273 - ip6tables -F i-2-40-VM
2019-03-01 10:46:24,287 - iptables -F i-2-40-VM-eg
2019-03-01 10:46:24,298 - ip6tables -F i-2-40-VM-eg
2019-03-01 10:46:24,312 - iptables -I i-2-40-VM -p tcp -m tcp --dport
0:12000 -m state --state NEW -s 0.0.0.0/24 -j ACCEPT
2019-03-01 10:46:24,325 - iptables -I i-2-40-VM-eg -p tcp -m tcp --dport
0:12000 -m state --state NEW -d 0.0.0.0/24 -j RETURN
2019-03-01 10:46:24,339 - iptables -A i-2-40-VM-eg -j DROP
2019-03-01 10:46:24,351 - ip6tables -A i-2-40-VM-eg -j RETURN
2019-03-01 10:46:24,364 - iptables -A i-2-40-VM -j DROP
2019-03-01 10:46:24,376 - ip6tables -A i-2-40-VM -j DROP
2019-03-01 10:46:24,389 - Writing log to /var/run/cloud/i-2-40-VM.log
2019-03-01 10:46:31,575 - Executing command: get_rule_logs_for_vms
2019-03-01 10:47:31,513 - Executing command: get_rule_logs_for_vms
2019-03-01 10:48:31,515 - Executing command: get_rule_logs_for_vms
2019-03-01 10:49:31,517 - Executing command: get_rule_logs_for_vms
2019-03-01 10:50:31,520 - Executing command: get_rule_logs_for_vms
2019-03-01 10:51:31,522 - Executing command: get_rule_logs_for_vms
2019-03-01 10:52:31,527 - Executing command: get_rule_logs_for_vms
2019-03-01 10:53:31,528 - Executing command: get_rule_logs_for_vms
2019-03-01 10:54:31,529 - Executing command: get_rule_logs_for_vms
2019-03-01 10:55:31,581 - Executing command: get_rule_logs_for_vms
Regards
Soundar

On Fri, Mar 1, 2019 at 1:12 AM Jon Marshall  wrote:

> Is this after you migrated the VM to another compute node ?
>
> It looks suspiciously like the issue I saw, i.e. I was using advanced
> networking with security groups and the security policy for the VM was not
> migrated to the new compute node.
>
> There is a bug filed for it and a workaround -
>
> https://github.com/apache/cloudstack/issues/3088
>
> the fix is in the comments but basically you need to edit this
> file - "/usr/share/cloudstack-common/scripts/vm/network/security_group.py"
>
> and change line 490 from -
>
>  if ips[0] == "0":
>
> to -
>
> if len(ips) == 0 or ips[0] == "0":
>
> and that should fix it.
>
> The fix will be included in CS v4.11.3
>
> Jon
>
>
> 
> From: soundar rajan 
> Sent: 28 February 2019 13:52
> To: d...@cloudstack.apache.org; users@cloudstack.apache.org
> Subject: Not able to access the vm from outside network
>
> Hi,
>
> VM outbound is working fine. Inbound access from the outside network is not
> working.
>
> Error Log
> 2019-02-28 18:12:25,112 - Failed to network rule !
> Traceback (most recent call last):
>   File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py",
> line 995, in add_network_rules
> default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname,
> sec_ips)
>   File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py",
> line 490, in default_network_rules
> if ips[0] == "0":
> IndexError: list index out of range
> 2019-02-28 18:13:16,635 - Executing command: cleanup_rules
> 2019-02-28 18:13:16,645 -  Vms on the host : ['i-2-40-VM', 'i-2-90-VM',
> 'i-2-112-VM']
> 2019-02-28 18:13:16,645 - iptables-save | grep -P '^:(?!.*-(def|eg))' | awk
> '{sub(/^:/, "", $1) ; print $1}' | sort | uniq
> 2019-02-28 18:13:16,671 -  iptables chains in the host :['BF-cloudbr0',
> 'BF-cloudbr0-IN', 'BF-cloudbr0-OUT', 'FORWARD', 'i-2-112-VM', 'i-2-40-VM',
> 'i-2-90-VM', 'INPUT', 'OUTPUT', 'POSTROUTING', 'PREROUTING', '']
> 2019-02-28 18:13:16,672 - grep -E '^ebtable_' /proc/modules | cut -f1 -d' '
> | sed s/ebtable_//
> 2019-02-28 18:13:16,693 - ebtables -t nat -L | awk '/chain:/ {
> gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq
> 2019-02-28 18:13:16,716 - ebtables -t filter -L | awk '/chain:/ {
> gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq
> 2019-02-28 18:13:16,738 -  ebtables chains in the host: ['FORWARD,',
> 'INPUT,', 'OUTPUT,', '']
> 2019-02-28 18:13:16,739 - Cleaned up rules for 0 chains
> 2019-02-28 18:13:23,959 - Executing command: get_rule_logs_for_vms
>
> It happens to a particular VM
>
> Please help..
>


Re: Not able to access the vm from outside network

2019-02-28 Thread Jon Marshall
Is this after you migrated the VM to another compute node ?

It looks suspiciously like the issue I saw, i.e. I was using advanced networking 
with security groups and the security policy for the VM was not migrated to the 
new compute node.

There is a bug filed for it and a workaround -

https://github.com/apache/cloudstack/issues/3088

the fix is in the comments but basically you need to edit this file - 
"/usr/share/cloudstack-common/scripts/vm/network/security_group.py"

and change line 490 from -

 if ips[0] == "0":

to -

if len(ips) == 0 or ips[0] == "0":

and that should fix it.

The fix will be included in CS v4.11.3

Jon



From: soundar rajan 
Sent: 28 February 2019 13:52
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Not able to access the vm from outside network

Hi,

VM outbound is working fine. Inbound access from the outside network is not
working.

Error Log
2019-02-28 18:12:25,112 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py",
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname,
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py",
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range
2019-02-28 18:13:16,635 - Executing command: cleanup_rules
2019-02-28 18:13:16,645 -  Vms on the host : ['i-2-40-VM', 'i-2-90-VM',
'i-2-112-VM']
2019-02-28 18:13:16,645 - iptables-save | grep -P '^:(?!.*-(def|eg))' | awk
'{sub(/^:/, "", $1) ; print $1}' | sort | uniq
2019-02-28 18:13:16,671 -  iptables chains in the host :['BF-cloudbr0',
'BF-cloudbr0-IN', 'BF-cloudbr0-OUT', 'FORWARD', 'i-2-112-VM', 'i-2-40-VM',
'i-2-90-VM', 'INPUT', 'OUTPUT', 'POSTROUTING', 'PREROUTING', '']
2019-02-28 18:13:16,672 - grep -E '^ebtable_' /proc/modules | cut -f1 -d' '
| sed s/ebtable_//
2019-02-28 18:13:16,693 - ebtables -t nat -L | awk '/chain:/ {
gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq
2019-02-28 18:13:16,716 - ebtables -t filter -L | awk '/chain:/ {
gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq
2019-02-28 18:13:16,738 -  ebtables chains in the host: ['FORWARD,',
'INPUT,', 'OUTPUT,', '']
2019-02-28 18:13:16,739 - Cleaned up rules for 0 chains
2019-02-28 18:13:23,959 - Executing command: get_rule_logs_for_vms

It happens to a particular VM

Please help..


Re: Possible bug fix - sanity check please

2019-01-25 Thread Jon Marshall
Hi Yiping

It is related to this bug -


https://github.com/apache/cloudstack/issues/3088

Have a look at the comments but to summarise you need to replace this line 
(line 490) -



if ips[0] == "0":

with -


if len(ips) == 0 or ips[0] == "0":

I have tested it and it fixes the problem I was seeing.

The fix will be included in 4.11.3 apparently.

Jon


From: Yiping Zhang 
Sent: 24 January 2019 23:18
To: users@cloudstack.apache.org
Subject: Re: Possible bug fix - sanity check please

Hi, Jon:

Would you please describe this bug a little more? How do I reproduce it?  Is 
there a Jira or Github issue number for it?

It sounds like a bug in 4.11.2.0 affecting VM live migration.  I am in the 
middle of upgrading to 4.11.2.0, and on my lab system I see that line 488 
of the file /usr/share/cloudstack-common/scripts/vm/network/security_group.py does 
have a ";" instead of a ":".

Thanks,

Yiping


On 1/24/19, 12:54 AM, "Jon Marshall"  wrote:

Please ignore, it has already been fixed but it is not included in the 
4.11.2 release (due in the 4.11.3 one).

    ____
From: Jon Marshall 
Sent: 23 January 2019 15:30
To: users@cloudstack.apache.org
Subject: Possible bug fix - sanity check please

The following issue was seen using  CS 4.11.2 in advanced mode with 
security group isolation.

VM (internal name i-2-29-VM)  - is created and works fine with default 
security group allowing inbound SSH and ICMP echo request.

Migrate the VM to another of the compute nodes; the VM migrates and from 
the proxy console the VM can connect out, but the default security group inbound 
rules are not copied across to the new compute node.   The 
/var/log/cloudstack/agent/security_group.log on the compute node the VM 
has migrated to shows -

2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out
2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips
2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips
2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,741 - iptables -N i-2-29-VM
2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM
2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg
2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg
2019-01-18 14:54:25,758 - iptables -N i-2-29-def
2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def
2019-01-18 14:54:25,767 - Creating ipset chain  i-2-29-VM
2019-01-18 14:54:25,768 - ipset -F i-2-29-VM
2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM
2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet
2019-01-18 14:54:25,777 - vm ip 172.30.6.60
2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60
2019-01-18 14:54:25,782 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, 
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range

 Added a few lines to debug the script security_group.py and it would 
appear this line (line 487) is the culprit -

ips = sec_ips.split(';')

as far as I can tell the separator should be a colon (':') and not a 
semicolon, at least on my setup.  Once changed to -

ips = sec_ips.split(':')

the iptables rules were updated correctly on the host the VM was migrated 
to.

I don't know if this is the right change to make as the script is over 
1000 lines long and imports other modules, so I would appreciate any input as this 
seems to be a key function of Advanced with security groups.

Thanks

Jon






Re: Possible bug fix - sanity check please

2019-01-24 Thread Jon Marshall
Please ignore, it has already been fixed but it is not included in the 4.11.2 
release (due in the 4.11.3 one).


From: Jon Marshall 
Sent: 23 January 2019 15:30
To: users@cloudstack.apache.org
Subject: Possible bug fix - sanity check please

The following issue was seen using  CS 4.11.2 in advanced mode with security 
group isolation.

VM (internal name i-2-29-VM)  - is created and works fine with default security 
group allowing inbound SSH and ICMP echo request.

Migrate the VM to another of the compute nodes; the VM migrates and from the 
proxy console the VM can connect out, but the default security group inbound 
rules are not copied across to the new compute node.   The 
/var/log/cloudstack/agent/security_group.log on the compute node the VM 
has migrated to shows -

2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out
2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips
2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips
2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,741 - iptables -N i-2-29-VM
2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM
2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg
2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg
2019-01-18 14:54:25,758 - iptables -N i-2-29-def
2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def
2019-01-18 14:54:25,767 - Creating ipset chain  i-2-29-VM
2019-01-18 14:54:25,768 - ipset -F i-2-29-VM
2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM
2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet
2019-01-18 14:54:25,777 - vm ip 172.30.6.60
2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60
2019-01-18 14:54:25,782 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, 
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range

 Added a few lines to debug the script security_group.py and it would appear 
this line (line 487) is the culprit -

ips = sec_ips.split(';')

as far as I can tell the separator should be a colon (':') and not a semicolon, 
at least on my setup.  Once changed to -

ips = sec_ips.split(':')

the iptables rules were updated correctly on the host the VM was migrated to.

I don't know if this is the right change to make as the script is over 1000 
lines long and imports other modules, so I would appreciate any input as this 
seems to be a key function of Advanced with security groups.

Thanks

Jon




Possible bug fix - sanity check please

2019-01-23 Thread Jon Marshall
The following issue was seen using  CS 4.11.2 in advanced mode with security 
group isolation.

VM (internal name i-2-29-VM)  - is created and works fine with default security 
group allowing inbound SSH and ICMP echo request.

Migrate the VM to another of the compute nodes; the VM migrates and from the 
proxy console the VM can connect out, but the default security group inbound 
rules are not copied across to the new compute node.   The 
/var/log/cloudstack/agent/security_group.log on the compute node the VM 
has migrated to shows -

2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out
2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips
2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips
2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,741 - iptables -N i-2-29-VM
2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM
2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg
2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg
2019-01-18 14:54:25,758 - iptables -N i-2-29-def
2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def
2019-01-18 14:54:25,767 - Creating ipset chain  i-2-29-VM
2019-01-18 14:54:25,768 - ipset -F i-2-29-VM
2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM
2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet
2019-01-18 14:54:25,777 - vm ip 172.30.6.60
2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60
2019-01-18 14:54:25,782 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, 
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range

 Added a few lines to debug the script security_group.py and it would appear 
this line (line 487) is the culprit -

ips = sec_ips.split(';')

as far as I can tell the separator should be a colon (':') and not a semicolon, 
at least on my setup.  Once changed to -

ips = sec_ips.split(':')

the iptables rules were updated correctly on the host the VM was migrated to.

I don't know if this is the right change to make as the script is over 1000 
lines long and imports other modules, so I would appreciate any input as this 
seems to be a key function of Advanced with security groups.
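To make the change I'm describing concrete, here is a rough standalone sketch of the guard (paraphrased for illustration; it is not the upstream code, and the helper name is made up):

# Paraphrased sketch of the secondary-IP parsing guard - not the upstream code.
def parse_secondary_ips(sec_ips):
    # On this setup the value arrives colon-separated, e.g. "172.30.6.61:172.30.6.62";
    # "0" means the VM has no secondary IPs.
    ips = sec_ips.split(':')
    if len(ips) == 0 or ips[0] == "0":   # the guard that went into 4.11.3
        return []
    return [ip for ip in ips if ip]

# A VM without secondary IPs no longer trips an IndexError:
print(parse_secondary_ips("0"))                        # []
print(parse_secondary_ips("172.30.6.61:172.30.6.62"))  # ['172.30.6.61', '172.30.6.62']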

Thanks

Jon




Possible bug in migrating VMs with advanced using security groups ?

2019-01-18 Thread Jon Marshall
Don't know whether this is a bug or to do with my setup -

CS 4.11.2

1 x manager, 3 x compute nodes running Advanced with security groups.

VM (internal name i-2-29-VM)  - is created and works fine with default security 
group allowing inbound SSH and ICMP echo request.

Migrate the VM to another of the compute nodes; the VM migrates and from the 
proxy console the VM can connect out, but the default security group inbound 
rules are not copied across to the new compute node.   The 
/var/log/cloudstack/agent/security_group.log on the compute node the VM 
has migrated to shows -

2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out
2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips
2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips
2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,741 - iptables -N i-2-29-VM
2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM
2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg
2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg
2019-01-18 14:54:25,758 - iptables -N i-2-29-def
2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def
2019-01-18 14:54:25,767 - Creating ipset chain  i-2-29-VM
2019-01-18 14:54:25,768 - ipset -F i-2-29-VM
2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM
2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet
2019-01-18 14:54:25,777 - vm ip 172.30.6.60
2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60
2019-01-18 14:54:25,782 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, 
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range


Jon


Re: VR DHCP server does not lease secondary IPs to guests

2018-09-22 Thread Jon Marshall
If you allocate a secondary IP to a VM you don't want the VR to offer that IP 
to another VM otherwise you could end up with
two VMs trying to use the same IP.

If you remove the secondary IP from the VM then the VR can allocate that IP to 
another VM.
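If it helps, the secondary IP is reserved against the NIC through the API; a minimal sketch, assuming the third-party "cs" Python client and a placeholder NIC UUID (addIpToNic is the standard call). The address still has to be configured inside the guest OS by hand:

# Rough sketch - NIC UUID, endpoint and keys are placeholders.
from cs import CloudStack   # pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Reserve a secondary IP on the VM's NIC; CloudStack records it so the VR's
# DHCP will not hand the address to another VM, but it will not push it to the
# guest - you still add it manually (or via your own scripts) inside the VM.
job = api.addIpToNic(nicid="NIC_UUID")
print(job)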


From: Fariborz Navidan 
Sent: 21 September 2018 19:03
To: users@cloudstack.apache.org
Subject: VR DHCP server does not lease secondary IPs to guests

Hello folks,

When I add secondary IPs to a VM, DHCP server on virtual router does not
offer those to dhcp client. Do I need to modify OS templates to do this? If
yes, how should I get secondary IPs when they are not available in the
dhclient's lease files?

Thanks for any advise!


Re: Basic vs advanced networking

2018-08-09 Thread Jon Marshall
Hi Dag


Makes a lot of sense, thanks for that.


Jon


From: Dag Sonstebo 
Sent: 09 August 2018 10:13
To: users@cloudstack.apache.org
Subject: Re: Basic vs advanced networking

Hi Jon,

In short you are right – advanced networking offers a lot more features, and 
the only benefit of basic networking is a simpler setup (no VRs) as well as to 
a certain degree more scalability since you can run relatively large L3 
networks (with the proviso that broadcast traffic may be a limiting factor). As 
security groups rely on access to underlying networking on the hypervisor they 
will also most likely never work on VMware due to the proprietary nature of 
ESXi.

If you look through the user@ / dev@ mailing list you’ll see we have started 
discussions around deprecating basic networks for advanced zone with security 
groups – since the latter offers the same networking functionality as basic 
(security groups, no VRs) but offers the scalability of running multiple of 
these basic type networks (a traditional basic zone can only run one network).

So all in all if you are looking at longer term strategy whilst wanting the 
simplicity of basic networking you should look at this option (looks like you 
might have played with this already).

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 09/08/2018, 07:54, "Jon Marshall"  wrote:

Having looked at both in a lab environment I am wondering what the 
advantages of running basic networking are.


Obviously with basic you can use security groups (although you can with 
advanced if using KVM) but apart from that advanced seems to offer all the 
features of basic plus a whole lot more.


The only downside I have found with advanced is that VRs seems to be the 
most "flaky" aspect of ACS and obviously you end up with a whole lot more of 
them.


Would be interested to hear opinions either way.


Thanks



dag.sonst...@shapeblue.com
www.shapeblue.com
ShapeBlue - The CloudStack Company
ShapeBlue are the largest independent integrator of CloudStack technologies 
globally and are specialists in the design and implementation of IaaS cloud 
infrastructures for both private and public cloud implementations.

53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue





Basic vs advanced networking

2018-08-09 Thread Jon Marshall
Having looked at both in a lab environment I am wondering what the advantages 
of running basic networking are.


Obviously with basic you can use security groups (although you can with 
advanced if using KVM) but apart from that advanced seems to offer all the 
features of basic plus a whole lot more.


The only downside I have found with advanced is that VRs seems to be the most 
"flaky" aspect of ACS and obviously you end up with a whole lot more of them.


Would be interested to hear opinions either way.


Thanks


Tips for troubleshooting

2018-08-06 Thread Jon Marshall
I have a test setup for CS 4.11.1 advanced networking KVM on Centos 7.


One manager node and one compute node with 2 NICs (1 for management/storage and 
1 trunk link for VM traffic).


I create a guest network, an isolated network and a VPC with its own isolated 
network, so 3 VRs, and each network has a VM created.


Every time I reboot both servers I get exactly the same results -


1) the VR for the guest network is up and the VM is up.

2) the VR for the isolated network is up but the VM is stopped

3) the VR for the VPC isolated network is stuck in starting and so are the VMs.


For 2) the solution is simply to start up the VMs


For 3) you cannot do anything until the VR goes into stopped mode which takes 
approx 10 mins. You can then destroy it and simply start one of the stopped VMs 
which recreates the VR.


I asked about the VPC VR problem before and was told to check the management 
server log and the agent log.  The agent log shows nothing. In the management 
server log I traced the entire ctx job(s) and I can see it has enough resources 
and the start job is issued, but then it just says it is pending and eventually 
reports the host as unreachable.


Not asking for someone to fix this for me but can someone tell me how to 
troubleshoot this because the logs are not showing much and  I can reproduce 
this every time I reboot.


Jon




Re: VPC virtual router will not start on reboot

2018-07-23 Thread Jon Marshall
Hi Dag


Sorry I am running 4.11.1 already.


I just created an isolated network with a VM on the same host (ID = 1) and it 
works fine, so I'm not sure it's a host-specific issue.


It seems to only come up with VRs for VPCs.


I'll keep digging :)



From: Dag Sonstebo 
Sent: 23 July 2018 09:19
To: users@cloudstack.apache.org
Subject: Re: VPC virtual router will not start on reboot

Hi Jon,

First of all I would advise you to upgrade to 4.11.1, it comes with a number of 
bug fixes.

Wrt the errors you are seeing they tend to be fairly clear – the KVM host with 
ID=1 in your DB is not checking in, or taking time checking in, and the 
management server can therefore not communicate with it. Check the startup of 
the agent works as expected, and also check the agent logs.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 23/07/2018, 09:11, "Jon Marshall"  wrote:

Cloudstack 4.11.0 - KVM


Created one VPC with 1 isolated network as a test with 2 instances and it 
works as expected.  When doing a reboot of all nodes (compute and management), 
when it comes back up the virtual router will not start. This happens each time 
I reboot.


I have gone through management server logs and it is not a resource issue 
as it reports CPU, memory etc. as okay.  It does report this -


2018-07-23 08:37:09,931 ERROR [c.c.v.VmWorkJobHandlerProxy] 
(Work-Job-Executor-21:ctx-1c696ea5 job-821/job-822 ctx-bef3996c) 
(logid:a19d2179) Invocation exception, caused by: 
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is 
unreachable: Host 1: Unable to start instance due to Unable to start  
VM:390a0aad-9c13-4578-bbbf-4de1323b142e due to error in finalizeStart, not 
retrying

Checking the host table in the database, that same host is running the 2 
system VMs, so I'm not sure how it is unreachable?


Could someone offer any tips/pointers on how to troubleshoot this ?


Jon



dag.sonst...@shapeblue.com
www.shapeblue.com

53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue





VPC virtual router will not start on reboot

2018-07-23 Thread Jon Marshall
Cloudstack 4.11.0 - KVM


Created one VPC with 1 isolated network as a test with 2 instances and it works as 
expected.  When doing a reboot of all nodes (compute and management), when it 
comes back up the virtual router will not start. This happens each time I 
reboot.


I have gone through management server logs and it is not a resource issue as it 
reports CPU, memory etc. as okay.  It does report this -


2018-07-23 08:37:09,931 ERROR [c.c.v.VmWorkJobHandlerProxy] 
(Work-Job-Executor-21:ctx-1c696ea5 job-821/job-822 ctx-bef3996c) 
(logid:a19d2179) Invocation exception, caused by: 
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is 
unreachable: Host 1: Unable to start instance due to Unable to start  
VM:390a0aad-9c13-4578-bbbf-4de1323b142e due to error in finalizeStart, not 
retrying

Checking the host table in the database, that same host is running the 2 system 
VMs, so I'm not sure how it is unreachable?


Could someone offer any tips/pointers on how to troubleshoot this ?


Jon


Re: VPC ACLs SRC and DST

2018-07-18 Thread Jon Marshall
Hi Andrija


Following on from that, if you are using an isolated guest network and static NAT 
from a public IP to a VM's private IP, is there any way in the IP address firewall 
configuration to deny certain traffic as well as permit traffic?


Jon



From: Andrija Panic 
Sent: 18 July 2018 16:17
To: users
Subject: Re: VPC ACLs SRC and DST

Hi Adam,

unless something has changed in the most recent version (I doubt that) - no, you
can only define one CIDR in each ACL rule. If creating an
egress/outbound rule it is considered the destination IP/CIDR to which you
allow/deny access from your VPC network, or if using an ingress (inbound) rule,
then this CIDR represents the SOURCE from which access is allowed/denied to
your VPC network (the whole VPC network in both cases - i.e. it's not granular
at the single IP/VM level - for that you need to use a local firewall if really
needed)
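As an illustration, creating such rules through the API looks roughly like this (a sketch assuming the third-party "cs" Python client and a placeholder ACL list UUID; createNetworkACL is the standard call, the CIDRs and ports are just examples):

# Rough sketch - ACL list UUID, endpoint and keys are placeholders.
from cs import CloudStack   # pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Egress rule: the single CIDR is treated as the DESTINATION the VPC tier may reach.
api.createNetworkACL(aclid="ACL_LIST_UUID", protocol="TCP",
                     startport=443, endport=443,
                     cidrlist="203.0.113.0/24",
                     traffictype="Egress", action="Allow")

# Ingress rule: here the same cidrlist field is the SOURCE allowed in.
api.createNetworkACL(aclid="ACL_LIST_UUID", protocol="TCP",
                     startport=22, endport=22,
                     cidrlist="198.51.100.0/24",
                     traffictype="Ingress", action="Allow")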

Hope that answers your question.


Andrija

On Wed, 18 Jul 2018 at 17:07, Adam Witwicki  wrote:

> Hello
>
> Is there a way we can add the DST IP to the ACL lists in a VPC as well as
> the SRC IP (outbound)
>
> Thanks
>
> Adam
>
>
>
> Disclaimer Notice:
> This email has been sent by Oakford Technology Limited, while we have
> checked this e-mail and any attachments for viruses, we can not guarantee
> that they are virus-free. You must therefore take full responsibility for
> virus checking.
> This message and any attachments are confidential and should only be read
> by those to whom they are addressed. If you are not the intended recipient,
> please contact us, delete the message from your computer and destroy any
> copies. Any distribution or copying without our prior permission is
> prohibited.
> Internet communications are not always secure and therefore Oakford
> Technology Limited does not accept legal responsibility for this message.
> The recipient is responsible for verifying its authenticity before acting
> on the contents. Any views or opinions presented are solely those of the
> author and do not necessarily represent those of Oakford Technology Limited.
> Registered address: Oakford Technology Limited, 10 Prince Maurice Court,
> Devizes, Wiltshire. SN10 2RT.
> Registered in England and Wales No. 5971519
>
>

--

Andrija Panić


Re: VPC virtual router stuck in starting

2018-07-18 Thread Jon Marshall
The virtual router for the VPC finally went to Stopped; I did a restart of the VPC 
with clean up and the VR restarted. I could then restart the VMs.





From: Jon Marshall 
Sent: 17 July 2018 13:46
To: users@cloudstack.apache.org
Subject: RE: VPC virtual router stuck in starting

Hi Jon,

Is it possible to connect directly to the VR via the KVM console? (virsh console 
r-XXX-VM)
If yes, please check cloud.log; state "Starting" from CS doesn't mean it's not 
okay from KVM's side.

The cloud-agent log on KVM host could be useful as well.


Best regards,
N.B


-----Original Message-----
From: Jon Marshall [mailto:jms@hotmail.co.uk]
Sent: Tuesday, 17 July 2018 12:28
To: users@cloudstack.apache.org
Subject: VPC virtual router stuck in starting

Testing with advanced networking v4.11 using KVM.


I setup some isolated networks (2) and then a VPC which all worked fine. I then 
rebooted compute nodes (x3) and manager and when it all came back the VPC 
virtual router is stuck in starting as are the VMs in the VPC.


I have checked the management server logs and I see a lot of -


com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying
com.cloud.exception.AgentUnavailableException: Resource [Host:4] is 
unreachable: Host 4: Unable to start instance due to Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying Caused by: com.cloud.utils.exception.ExecutionException: Unable to 
start  VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, 
not retrying
com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying

it says Host 4 is not reachable but I have another virtual router for one of 
the isolated networks and some guest VMs running on the same host.


1) Does anyone have any suggestions as to how to troubleshoot this beyond 
looking through the logs ?


2) How can I stop the virtual router, destroy it and recreate it, as while it is 
starting you cannot do anything with it?


thanks


Re: VPC virtual router stuck in starting

2018-07-18 Thread Jon Marshall
Hi Nicolas


Sorry to have to ask, but when using the "virsh console ..." command what is the 
username/password I should enter?


I can never seem to find answers to these sort of questions no matter how hard 
I look 


By the way I can reproduce this problem ie. I deleted the VPC, created a new 
one with some isolated networks and VMs, rebooted everything and same issue - 
VR for VPC stuck in starting.


Jon



From: Nicolas Bouige 
Sent: 17 July 2018 13:46
To: users@cloudstack.apache.org
Subject: RE: VPC virtual router stuck in starting

Hi Jon,

Is it possible to connect directly to the VR via the KVM console? (virsh console 
r-XXX-VM)
If yes, please check cloud.log; state "Starting" from CS doesn't mean it's not 
okay from KVM's side.

The cloud-agent log on KVM host could be useful as well.


Best regards,
N.B


-----Original Message-----
From: Jon Marshall [mailto:jms@hotmail.co.uk]
Sent: Tuesday, 17 July 2018 12:28
To: users@cloudstack.apache.org
Subject: VPC virtual router stuck in starting

Testing with advanced networking v4.11 using KVM.


I setup some isolated networks (2) and then a VPC which all worked fine. I then 
rebooted compute nodes (x3) and manager and when it all came back the VPC 
virtual router is stuck in starting as are the VMs in the VPC.


I have checked the management server logs and I see a lot of -


com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying
com.cloud.exception.AgentUnavailableException: Resource [Host:4] is 
unreachable: Host 4: Unable to start instance due to Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying Caused by: com.cloud.utils.exception.ExecutionException: Unable to 
start  VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, 
not retrying
com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying

it says Host 4 is not reachable but I have another virtual router for one of 
the isolated networks and some guest VMs running on the same host.


1) Does anyone have any suggestions as to how to troubleshoot this beyond 
looking through the logs ?


2) How can I stop the virtual router, destroy it and recreate it, as while it is 
starting you cannot do anything with it?


thanks


VPC virtual router stuck in starting

2018-07-17 Thread Jon Marshall
Testing with advanced networking v4.11 using KVM.


I setup some isolated networks (2) and then a VPC which all worked fine. I then 
rebooted compute nodes (x3) and manager and when it all came back the VPC 
virtual router is stuck in starting as are the VMs in the VPC.


I have checked the management server logs and I see a lot of -


com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying
com.cloud.exception.AgentUnavailableException: Resource [Host:4] is 
unreachable: Host 4: Unable to start instance due to Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying
com.cloud.utils.exception.ExecutionException: Unable to start  
VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not 
retrying

it says Host 4 is not reachable but I have another virtual router for one of 
the isolated networks and some guest VMs running on the same host.


1) Does anyone have any suggestions as to how to troubleshoot this beyond 
looking through the logs ?


2) How can I stop the virtual router, destroy it and recreate it, as while it is 
starting you cannot do anything with it?


thanks


Re: Adding secondary IP to VM

2018-07-11 Thread Jon Marshall
Did a bit of digging in the database and there is a table called 
"nic_secondary_ips" -



mysql> select * from nic_secondary_ips;
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
| id | uuid                                 | vmId | nicId | ip4_address | ip6_address | network_id | created             | account_id | domain_id |
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
|  2 | 57921029-893a-4400-b6ac-50d4fd006b74 |    9 |    15 | 172.30.4.80 | NULL        |        204 | 2018-07-11 09:23:06 |          2 |         1 |
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
1 row in set (0.00 sec)

mysql>

so that's where it's stored.
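The same information is also visible through the API without touching the database; a minimal sketch, assuming the third-party "cs" Python client and a placeholder VM UUID (listNics should include any secondary IPs per NIC):

# Rough sketch - VM UUID, endpoint and keys are placeholders.
from cs import CloudStack   # pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Print each NIC's primary address and any secondary IPs recorded for it.
for nic in api.listNics(virtualmachineid="VM_UUID").get("nic", []):
    print(nic["ipaddress"], nic.get("secondaryip", []))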

Jon


From: Andrija Panic 
Sent: 11 July 2018 10:40
To: users
Subject: Re: Adding secondary IP to VM

ACS doesn't handle this in any way (except that it might reserve the IP, so
it's not possible to add the same IP to another VM/NIC in the same network).

You need to manually configure the secondary IP on the VM - this is the case at
least in the 4.8 release, per my experience so far.

Cheers.

On Wed, 11 Jul 2018 at 11:23, Jon Marshall  wrote:

> I am trying to work out how CS handles additional IPs assigned to a VM.
>
>
> So using DHCP for the VMs if I log onto the virtual router in the
> "dhcphosts.txt" can see the VM maping to it's IP.
>
>
> If I then acquire a secondary IP for the VM a couple of questions -
>
>
> 1) where does the virtual router store the information because it is not
> in the DHCP file which makes sense but it must record it somewhere because
> it won't hand out that same IP to another VM (I tested it). Is it in the
> DBase somewhere
>
>
> 2) How do others handle multiple IPs on a VM ie. do you DHCP for the main
> interface and then configure static IPs for the sub interfaces or do you
> turn off DHCP altogether ?
>
>
> Many thanks
>
>
> Jon
>


--

Andrija Panić


Adding secondary IP to VM

2018-07-11 Thread Jon Marshall
I am trying to work out how CS handles additional IPs assigned to a VM.


So using DHCP for the VMs, if I log onto the virtual router, in the 
"dhcphosts.txt" file I can see the VM mapping to its IP.


If I then acquire a secondary IP for the VM a couple of questions -


1) Where does the virtual router store the information? It is not in the 
DHCP file, which makes sense, but it must record it somewhere because it won't 
hand out that same IP to another VM (I tested it). Is it in the database somewhere?


2) How do others handle multiple IPs on a VM, i.e. do you use DHCP for the main 
interface and then configure static IPs for the sub-interfaces, or do you turn 
off DHCP altogether?


Many thanks


Jon


Re: Isolated network and ingress rules

2018-07-06 Thread Jon Marshall
Hi Dag


Many thanks


Jon



From: Dag Sonstebo 
Sent: 06 July 2018 13:01
To: users@cloudstack.apache.org
Subject: Re: Isolated network and ingress rules

Hi Jon,

For normal isolated networks the ingress rules are on the firewall 
configuration option under each individual public IP address – as opposed to 
egress rules which apply to the whole network.
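To illustrate, the static NAT and its ingress firewall rules are both created against the public IP itself; a rough sketch, assuming the third-party "cs" Python client and placeholder UUIDs (enableStaticNat and createFirewallRule are the standard calls):

# Rough sketch - UUIDs, endpoint and keys are placeholders.
from cs import CloudStack   # pip install cs

api = CloudStack(endpoint="http://mgmt-server:8080/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Point the acquired public IP at the VM...
api.enableStaticNat(ipaddressid="PUBLIC_IP_UUID", virtualmachineid="VM_UUID")

# ...then open the inbound traffic you want on that same public IP.
api.createFirewallRule(ipaddressid="PUBLIC_IP_UUID", protocol="TCP",
                       startport=22, endport=22, cidrlist="0.0.0.0/0")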

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/07/2018, 12:17, "Jon Marshall"  wrote:

Quick update re question 2) - where


I created a VPC and added a static NAT and it worked as expected. I think 
this may well be because with VPCs you can configure both ingress and egress 
rules whereas with a guest isolated network I don't seem to have the ingress 
option.





    From: Jon Marshall 
Sent: 06 July 2018 09:26
To: users@cloudstack.apache.org
Subject: Isolated network and ingress rules

Have setup advanced network 4.11 KVM and it seems to be a lot more 
intuitive than basic networking (at least to me )


Just a couple of quick questions -


1) when I add a new isolated network with source NAT  through the UI no 
matter what I enter in the Guest gateway and Guest netmask boxes it just uses 
the initial CIDR block I specified when building the zone. And it reuses this 
for every new isolated network.


Is this normal behaviour ?


2) I tried to add a static NAT for one of the VMs in an isolated network. I 
know the mapping works because a "curl icanhazip.com" returns the static IP 
rather than the one used by all the other VMs but I cannot connect to the 
statically mapped VM from outside.


When I go to the Network details in the UI I have egress rules I can edit 
but no ingress rules tab.


Again is this to be expected and if it is any pointers on how to get it 
working.


Thanks



dag.sonst...@shapeblue.com
www.shapeblue.com

53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue





Re: Isolated network and ingress rules

2018-07-06 Thread Jon Marshall
Quick update re question 2) - where


I created a VPC and added a static NAT and it worked as expected. I think this 
may well be because with VPCs you can configure both ingress and egress rules 
whereas with a guest isolated network I don't seem to have the ingress option.





From: Jon Marshall 
Sent: 06 July 2018 09:26
To: users@cloudstack.apache.org
Subject: Isolated network and ingress rules

Have setup advanced network 4.11 KVM and it seems to be a lot more intuitive 
than basic networking (at least to me )


Just a couple of quick questions -


1) when I add a new isolated network with source NAT  through the UI no matter 
what I enter in the Guest gateway and Guest netmask boxes it just uses the 
initial CIDR block I specified when building the zone. And it reuses this for 
every new isolated network.


Is this normal behaviour ?


2) I tried to add a static NAT for one of the VMs in an isolated network. I 
know the mapping works because a "curl icanhazip.com" returns the static IP 
rather than the one used by all the other VMs but I cannot connect to the 
statically mapped VM from outside.


When I go to the Network details in the UI I have egress rules I can edit but 
no ingress rules tab.


Again is this to be expected and if it is any pointers on how to get it working.


Thanks


Isolated network and ingress rules

2018-07-06 Thread Jon Marshall
Have setup advanced network 4.11 KVM and it seems to be a lot more intuitive 
than basic networking (at least to me )


Just a couple of quick questions -


1) when I add a new isolated network with source NAT  through the UI no matter 
what I enter in the Guest gateway and Guest netmask boxes it just uses the 
initial CIDR block I specified when building the zone. And it reuses this for 
every new isolated network.


Is this normal behaviour ?


2) I tried to add a static NAT for one of the VMs in an isolated network. I 
know the mapping works because a "curl icanhazip.com" returns the static IP 
rather than the one used by all the other VMs but I cannot connect to the 
statically mapped VM from outside.


When I go to the Network details in the UI I have egress rules I can edit but 
no ingress rules tab.


Again is this to be expected and if it is any pointers on how to get it working.


Thanks


Re: Advanced networking - physical NICs.

2018-07-03 Thread Jon Marshall
Chris


Many thanks for that.


Jon



From: Christoffer Pedersen 
Sent: 03 July 2018 12:21
To: users@cloudstack.apache.org
Subject: Re: Advanced networking - physical NICs.

Hi Jon,

I would suppose that several people/providers run guest and public networks 
together. I was also confused in the start about the cloudstack networking.

1. I guess you can, the traffic will be separated by VLAN’s.
2. When defining a public range, in my experience you have to assign a VLAN to 
that range. Then just put in the VLAN ID where your respective public range 
resides.
3. You can allocate vlan ranges for guest networks. You can for example use 
500-549 as a range. Just bind that to your cloudbr. Cloudstack will manage the 
sub-bridge for the vlan.
4. You would have a trunk running from the switch to your network port on the 
server. You would add that port to your cloudbr1 like:

auto eth1
iface eth1 inet manual

auto cloudbr1
iface cloudbr1 inet manual
  bridge_ports eth1

Please correct me if I'm wrong, I am using Open vSwitch so my config is 
different. CloudStack will handle the tagging if you specify a VLAN for your public 
or guest networks.

Chris

Sent from my iPhone

> On 3. Jul 2018, at 12:55, Jon Marshall  wrote:
>
> I come from a Cisco background so I understand vlans, tagging and how to 
> configure switches for trunks and I also understand how to configure tagging 
> on CentOS.
>
>
> The bit that is just not clicking with me is how to configure the NIC with CS 
> using KVM and advanced networking.
>
>
> The management/storage NIC is easy as I just assign an IP directly the bridge 
> configuration file (cloudbr0) as there is no vlan tagging here.
>
>
> The second NIC I want to run guest and public traffic across and I am using 
> another bridge - cloudbr1.
>
>
> Questions -
>
>
> 1) Is it okay to run guest and public traffic on the same NIC ?
>
>
> 2) do the public IPs only live on the VR ie. do I need a cloudbr1. 
> for the public IP range ?
>
>
> 3) whenever I add a new guest network once setup do I first need to setup the 
> cloudbr1. for that guest network or does cloudstack do this 
> automatically ?
>
>
> 4) Assuming it is okay to run guest and public on same NIC what would the 
> initial configuration of cloudbr1 look like ?
>
>
>
> Apologies for all the questions but I am just getting completely stuck on this
>


Re: Advanced networking - physical NICs.

2018-07-03 Thread Jon Marshall
Paul


Many thanks, will give it a go.


Jon



From: Paul Angus 
Sent: 03 July 2018 12:13
To: users@cloudstack.apache.org
Subject: RE: Advanced networking - physical NICs.

Hi Jon,

1. Yes
2. You tell CloudStack what VLAN the public IPs are on, CloudStack will add the 
VLAN tags
3. CloudStack will do it automatically
4. 'something' like this:

Ifcfg-eth1

DEVICE=eth1
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
BRIDGE=cloudbr1
NM_CONTROLLED=no



Ifcfg-cloudbr1

DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
STP=off

NOTE the public/guest interface (eth1 in this case) will then need to be 
connected to a trunk interface on your switch which allows all of the VLANs 
that you need for public and private networks



Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com

53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue




-Original Message-----
From: Jon Marshall 
Sent: 03 July 2018 11:55
To: users@cloudstack.apache.org
Subject: Advanced networking - physical NICs.

I come from a Cisco background so I understand vlans, tagging and how to 
configure switches for trunks and I also understand how to configure tagging on 
CentOS.


The bit that is just not clicking with me is how to configure the NIC with CS 
using KVM and advanced networking.


The management/storage NIC is easy as I just assign an IP directly the bridge 
configuration file (cloudbr0) as there is no vlan tagging here.


The second NIC I want to run guest and public traffic across and I am using 
another bridge - cloudbr1.


Questions -


1) Is it okay to run guest and public traffic on the same NIC ?


2) do the public IPs only live on the VR ie. do I need a cloudbr1. for 
the public IP range ?


3) whenever I add a new guest network once setup do I first need to setup the 
cloudbr1. for that guest network or does cloudstack do this 
automatically ?


4) Assuming it is okay to run guest and public on same NIC what would the 
initial configuration of cloudbr1 look like ?



Apologies for all the questions but I am just getting completely stuck on this



Advanced networking - physical NICs.

2018-07-03 Thread Jon Marshall
I come from a Cisco background so I understand vlans, tagging and how to 
configure switches for trunks and I also understand how to configure tagging on 
CentOS.


The bit that is just not clicking with me is how to configure the NIC with CS 
using KVM and advanced networking.


The management/storage NIC is easy as I just assign an IP directly the bridge 
configuration file (cloudbr0) as there is no vlan tagging here.


The second NIC I want to run guest and public traffic across and I am using 
another bridge - cloudbr1.


Questions -


1) Is it okay to run guest and public traffic on the same NIC ?


2) do the public IPs only live on the VR ie. do I need a cloudbr1. for 
the public IP range ?


3) whenever I add a new guest network once setup do I first need to setup the 
cloudbr1. for that guest network or does cloudstack do this 
automatically ?


4) Assuming it is okay to run guest and public on same NIC what would the 
initial configuration of cloudbr1 look like ?



Apologies for all the questions but I am just getting completely stuck on this



Advanced networking adding a host

2018-07-03 Thread Jon Marshall
Trying to setup advanced networking using KVM CS v4.11


When I try to add the first host in the initial setup I get this in the 
management-server log -



local), Ver: v1, Flags: 110, { ReadyAnswer } }
2018-07-03 10:30:37,489 DEBUG [c.c.u.s.SSHCmdHelper] 
(qtp788117692-16:ctx-c7a9deda ctx-9bbb3bea) (logid:2e852372) SSH command: 
cloudstack-setup-agent  -m 172.30.3.2 -z 1 -p 1 -c 1 -g 
9f2b15cb-1b75-321b-bf59-f83e7a5e8efb -a -s  --pubNic=cloudbr1 --prvNic=cloudbr0 
--guestNic=cloudbr1 --hypervisor=kvm
SSH command output:
Usage: cloudstack-setup-agent [options]

cloudstack-setup-agent: error: no such option: -s

2018-07-03 10:30:37,489 INFO  [c.c.h.k.d.LibvirtServerDiscoverer] 
(qtp788117692-16:ctx-c7a9deda ctx-9bbb3bea) (logid:2e852372) cloudstack agent 
setup command failed: cloudstack-setup-agent  -m 172.30.3.2 -z 1 -p 1 -c 1 -g 
9f2b15cb-1b75-321b-bf59-f83e7a5e8efb -a -s  --pubNic=cloudbr1 --prvNic=cloudbr0 
--guestNic=cloudbr1 --hypervisor=kvm

and sure enough there is no "-s" option according to the agent -


 cloudstack-setup-agent -h
Usage: cloudstack-setup-agent [options]

Options:
  -h, --helpshow this help message and exit
  -aauto mode
  -m MGT, --host=MGTManagement server hostname or IP-Address
  -z ZONE, --zone=ZONE  zone id
  -p POD, --pod=POD pod id
  -c CLUSTER, --cluster=CLUSTER
cluster id
  -t HYPERVISOR, --hypervisor=HYPERVISOR
hypervisor type
  -g GUID, --guid=GUID  guid
  --pubNic=PUBNIC   Public traffic interface
  --prvNic=PRVNIC   Private traffic interface
  --guestNic=GUESTNIC   Guest traffic interface

anyone have an idea what the management server thinks the "-s"  option is meant 
to be (storage ??)
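Going by the usage output above, one workaround to try (purely a guess based on the options the installed agent does list) would be to re-run the generated command by hand on the host, dropping the unsupported -s flag:

cloudstack-setup-agent -m 172.30.3.2 -z 1 -p 1 -c 1 \
    -g 9f2b15cb-1b75-321b-bf59-f83e7a5e8efb -a \
    --pubNic=cloudbr1 --prvNic=cloudbr0 --guestNic=cloudbr1 --hypervisor=kvm

Whether the management server then picks the host up cleanly is a separate question; the command above just sticks to the flags the agent script actually accepts.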






Re: Adding a static route to the SSVM for remote NFS server

2018-06-27 Thread Jon Marshall
Hi Sateesh


I was trying to edit the interfaces files on the SSVM itself, I would never 
have thought to use that setting.


Worked perfectly, many thanks for that, much appreciated.


Jon



From: Sateesh Chodapuneedi 
Sent: 27 June 2018 10:58
To: users@cloudstack.apache.org
Subject: Re: Adding a static route to the SSVM for remote NFS server

Hi Jon,

>> Do you know how to add it permanently across reboots ?
Yes, we can update the global configuration setting 
"secstorage.allowed.internal.sites" to achieve that. That is a comma separated 
list of internal CIDRs having the servers hosting the templates that SSVM tries 
to download. We can add the CIDR of NFS server over there, just add the CIDR in 
that comma separated list. Do not overwrite whatever is the current setting, 
just append after a comma.
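For anyone looking for the concrete step: the setting can be edited in the UI under Global Settings, or via the API/CloudMonkey with something along these lines (the existing value below is a placeholder - keep whatever is already configured and append the NFS server's CIDR):

update configuration name=secstorage.allowed.internal.sites value="<existing CIDRs>,172.30.5.0/28"

Depending on the version, the change may only take effect after the SSVM is recreated or the management server restarted.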

Regards,
Sateesh

-Original Message-
From: Jon Marshall 
Reply-To: "users@cloudstack.apache.org" 
Date: Wednesday, 27 June 2018 at 13:45
To: "users@cloudstack.apache.org" 
Subject: Re: Adding a static route to the SSVM for remote NFS server

Hi Sateesh


I can add the route manually but when the SSVM is rebooted it loses that 
route.


I edited the /etc/network/interfaces file and added it there but it still 
gets overwritten on reboot.


Do you know how to add it permanently across reboots ?

Jon

From: Sateesh Chodapuneedi 
Sent: 26 June 2018 16:25
To: users@cloudstack.apache.org
Subject: Re: Adding a static route to the SSVM for remote NFS server

Hi Jon,

>> Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet 
and would this work or is there a better way to do it ?

Yes, that should work. I did it some time back in my environment where NFS 
server is sitting in a separate subnet (in LAN), and public NIC/gateway in the 
SSVM was used to route the packets to/fro NFS server. We have added static 
route via the private/management NIC because the NFS server sits in LAN.
Let us know how it goes.

Regards,
Sateesh

    -Original Message-
From: Jon Marshall 
Reply-To: "users@cloudstack.apache.org" 
Date: Tuesday, 26 June 2018 at 19:36
To: "users@cloudstack.apache.org" 
Subject: Adding a static route to the SSVM for remote NFS server

I am doing basic networking with 2 NICS (one for management/storage and 
the other for Guest traffic).

When you configure the physical NIC/bridges  you can only define one 
default gateway so I do it for the guest traffic which means the routing table 
on the SSVM ends up as  -

root@s-1-VM:/etc# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask          Flags  MSS Window  irtt  Iface
0.0.0.0         172.30.4.1      0.0.0.0          UG     0   0       0     eth2
8.8.4.4         172.30.3.1      255.255.255.255  UGH    0   0       0     eth1
8.8.8.8         172.30.3.1      255.255.255.255  UGH    0   0       0     eth1
169.254.0.0     0.0.0.0         255.255.0.0      U      0   0       0     eth0
172.30.3.0      0.0.0.0         255.255.255.192  U      0   0       0     eth1
172.30.4.0      0.0.0.0         255.255.255.128  U      0   0       0     eth2


where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the 
guest network.

My NFS server has an IP of 172.30.5.2 so it is on a different subnet 
which means the the secondary storage would have to run over the guest NIC if I 
am understanding this properly.

I want storage over the management NIC.  Based on some advice on this 
mailing list I configured storage to use the same bridge (KVM label) as 
management but it won't build ie. it errors on the storage traffic part I 
suspect because the subnet details I enter are not part of the management 
subnet.

Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet 
and would this work or is there a better way to do it ?

On a more general note with basic networking the assumption seems to be 
you run everything over the same NIC and if you don't it seems to cause no end 
of problems :)









Re: Adding a static route to the SSVM for remote NFS server

2018-06-27 Thread Jon Marshall
Hi Sateesh


I can add the route manually but when the SSVM is rebooted it loses that route.


I edited the /etc/network/interfaces file and added it there but it still gets 
overwritten on reboot.


Do you know how to add it permanently across reboots ?

Jon

From: Sateesh Chodapuneedi 
Sent: 26 June 2018 16:25
To: users@cloudstack.apache.org
Subject: Re: Adding a static route to the SSVM for remote NFS server

Hi Jon,

>> Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and 
>> would this work or is there a better way to do it ?

Yes, that should work. I did it some time back in my environment where NFS 
server is sitting in a separate subnet (in LAN), and public NIC/gateway in the 
SSVM was used to route the packets to/fro NFS server. We have added static 
route via the private/management NIC because the NFS server sits in LAN.
Let us know how it goes.

Regards,
Sateesh

-Original Message-----
From: Jon Marshall 
Reply-To: "users@cloudstack.apache.org" 
Date: Tuesday, 26 June 2018 at 19:36
To: "users@cloudstack.apache.org" 
Subject: Adding a static route to the SSVM for remote NFS server

I am doing basic networking with 2 NICS (one for management/storage and the 
other for Guest traffic).

When you configure the physical NIC/bridges  you can only define one 
default gateway so I do it for the guest traffic which means the routing table 
on the SSVM ends up as  -

root@s-1-VM:/etc# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask          Flags  MSS Window  irtt  Iface
0.0.0.0         172.30.4.1      0.0.0.0          UG     0   0       0     eth2
8.8.4.4         172.30.3.1      255.255.255.255  UGH    0   0       0     eth1
8.8.8.8         172.30.3.1      255.255.255.255  UGH    0   0       0     eth1
169.254.0.0     0.0.0.0         255.255.0.0      U      0   0       0     eth0
172.30.3.0      0.0.0.0         255.255.255.192  U      0   0       0     eth1
172.30.4.0      0.0.0.0         255.255.255.128  U      0   0       0     eth2


where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the 
guest network.

My NFS server has an IP of 172.30.5.2 so it is on a different subnet which 
means the the secondary storage would have to run over the guest NIC if I am 
understanding this properly.

I want storage over the management NIC.  Based on some advice on this 
mailing list I configured storage to use the same bridge (KVM label) as 
management but it won't build ie. it errors on the storage traffic part I 
suspect because the subnet details I enter are not part of the management 
subnet.

Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and 
would this work or is there a better way to do it ?

On a more general note with basic networking the assumption seems to be you 
run everything over the same NIC and if you don't it seems to cause no end of 
problems :)







Adding a static route to the SSVM for remote NFS server

2018-06-26 Thread Jon Marshall
I am doing basic networking with 2 NICS (one for management/storage and the 
other for Guest traffic).

When you configure the physical NIC/bridges  you can only define one default 
gateway so I do it for the guest traffic which means the routing table on the 
SSVM ends up as  -

root@s-1-VM:/etc# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags   MSS Window  irtt Iface
0.0.0.0 172.30.4.1  0.0.0.0 UG0 0  0 eth2
8.8.4.4 172.30.3.1  255.255.255.255 UGH   0 0  0 eth1
8.8.8.8 172.30.3.1  255.255.255.255 UGH   0 0  0 eth1
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0  0 eth0
172.30.3.0  0.0.0.0 255.255.255.192 U 0 0  0 eth1
172.30.4.0  0.0.0.0 255.255.255.128 U 0 0  0 eth2


where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the guest 
network.

My NFS server has an IP of 172.30.5.2 so it is on a different subnet which 
means the the secondary storage would have to run over the guest NIC if I am 
understanding this properly.

I want storage over the management NIC.  Based on some advice on this mailing 
list I configured storage to use the same bridge (KVM label) as management but 
it won't build ie. it errors on the storage traffic part I suspect because the 
subnet details I enter are not part of the management subnet.

Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and 
would this work or is there a better way to do it ?
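(For completeness, the manual route in question would be something like the line below, run inside the SSVM - assuming 172.30.3.1 is the management-subnet gateway, as the 8.8.x.x routes above suggest. As covered in the replies above, it does not survive a reboot on its own, which is what the secstorage.allowed.internal.sites setting solves.)

ip route add 172.30.5.0/28 via 172.30.3.1 dev eth1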

On a more general note with basic networking the assumption seems to be you run 
everything over the same NIC and if you don't it seems to cause no end of 
problems :)





Re: Storage traffic clarification.

2018-06-21 Thread Jon Marshall
Ilya


Thanks for the response.


So if I use cloudbr0 for management I should then define that on the storage icon as 
well when setting up a zone.


Is there something else I need to do as well though because when I set it up I 
have cloudbr0 for management and cloudbr1 for guest and in the network 
configuration files I only define a default gateway in the cloudbr1 file.


This is what caught me out originally ie. I defined a default gateway in both 
cloudbrX files and the SSVM chose the management vlan as its default gateway 
so the guest traffic did not work.  If I only set the default gateway in the 
guest subnet everything works, but then the SSVM has its default gateway in the 
guest IP subnet and, as it does not have an interface in the NFS subnet, it 
has to use that default gateway to get to the NFS server.
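To make the gateway point concrete, a rough sketch of the two bridge files on a compute node for this layout (IPs borrowed from elsewhere in the thread, so treat them as placeholders): only the guest-facing bridge carries GATEWAY, the management/storage bridge just has an address.

# /etc/sysconfig/network-scripts/ifcfg-cloudbr0   (management + storage, no default gateway)
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.30.3.3
NETMASK=255.255.255.192
STP=no

# /etc/sysconfig/network-scripts/ifcfg-cloudbr1   (guest traffic, holds the only default gateway)
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.30.4.3
NETMASK=255.255.255.128
GATEWAY=172.30.4.1
STP=no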


Perhaps I am not understanding how cloudstack is doing the routing internally ?


Jon



From: ilya musayev 
Sent: 20 June 2018 21:20
To: users@cloudstack.apache.org
Subject: Re: Storage traffic clarification.

Jon

with Basic Network - it implies you have all in one network for everything.

If you have a storage network that is L3 routable and you don’t want to use
guest network - then when you create a zone - use storage label and define
what bridge will be used to get there.

If it’s not guest bridge you wan to use - then use the management Bridge.

 Regards
Ilya

On Wed, Jun 20, 2018 at 12:25 AM Jon Marshall  wrote:

> I am probably missing something obvious but according to this article (
> https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/)



> by default primary and secondary storage traffic travels across the
> management network.
>
> As an example assume basic networking with 2 NICS, one for management with
> an IP subnet,  the other NIC for guest traffic using a different subnet. A
> physical host should only have one default gateway and this would have to
> be from the guest VM subnet.
>
> I setup two tests  -
>
> 1) the NFS server had an IP address from the management subnet
>
> 2) the NFS server was on a completely different IP subnet ie. not the
> management or the guest IP subnets.
>
> Both worked but in test 2 I can't see how the storage traffic could be
> using the management NIC because there is no default gateway on the compute
> nodes for the management subnet and the NFS server is on a remote network.
>
> So is storage traffic in test 2 actually running across the guest NIC ?
>
> And as the recommendation is to have separate storage from guest traffic
> does this mean the NFS server has to be in the management subnet ?
>
> Thanks
>


Re: advanced networking with public IPs direct to VMs

2018-06-20 Thread Jon Marshall
Hi Rafael


Just to let you know I reran the 2 NIC setup and it worked fine this time so it 
must have been something I did in the setup.


Many thanks for all the help


Jon



From: Rafael Weingärtner 
Sent: 15 June 2018 11:40
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Did you notice some problems in the log files when you tested with 2 NICs?
When using NFS cluster wide storage, the behavior should be the same as
with 3 NICs. There might be something in your configuration. The problem
for zone wide storage is what we discussed before though.

1) if I want to run the management/storage traffic over the same NIC the
NFS server needs to be in the management subnet
No. You should be able to setup different network ranges for each one of
them.

2) when I do the initial configuration I need to drag and drop the storage
icon and use the same label as the management traffic
If you are using only two NICs, for sure you need to configure the traffic
labels accordingly. I mean, you have only two NICs, so you need to
configure the labels (cloudbr0 and cloudbr2) in that physical network tab
in the zone configuration.


On Thu, Jun 14, 2018 at 5:03 PM, Jon Marshall  wrote:

> Hi Rafael
>
>
> I did log a bug but when rebuilding I found some slightly different
> behaviour so have temporarily removed it.
>
>
> So using cluster NFS and 3 NICs as already described VM HA works.
>
>
> Because the recommendation for basic network setup seems to be run
> storage/management over the same NIC and guest on another, so 2 NICs in
> total,  I set it up this way using cluster NFS and to my surprise VM HA did
> not work so it is obviously a bit more complicated than it first appeared.
>
>
> My NFS server is on a different subnet than the management server and when
> I set it up in the UI because the storage traffic runs over the management
> NIC by default I did not assign a label to the storage traffic, ie. I only
> assigned labels to management and guest.
>
>
> So two thoughts occur which I can test unless you can see the issue -
>
>
> 1) if I want to run the management/storage traffic over the same NIC the
> NFS server needs to be in the management subnet
>
>
> or
>
>
> 2) when I do the initial configuration I need to drag and drop the storage
> icon and use the same label as the management traffic
>
>
> Personally I can't see how 2) will help ie. the only time I should need to
> assign a label to storage is if I use a different NIC.
>
>
> Apologies for bringing this up again but am happy to run any tests and
> would like to file accurate bug report.
>
>
>
>
>
>
> 
> From: Rafael Weingärtner 
> Sent: 11 June 2018 10:58
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> Well, it seems that you have found a bug. Can you fill out an issue report
> on Github?
>
> Thanks for the hard work on debugging and testing.
>
> On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall 
> wrote:
>
> > So based on Erik's suggestion (thanks Erik) I rebuilt the management
> > server and setup cluster wide primary storage as opposed to zone wide
> which
> > I have been using so far.
> >
> >
> > Still using 3 NICs (management/Guest/storage) and basic networking.
> >
> >
> > And VM HA now works. In addition it failed over quicker than it did when
> I
> > had zone wide NFS storage on a single NIC.
> >
> >
> > Still a bit confused about this output where it is still showing the
> > storage_ip_addresses as 172.30.3.x IPs which is the management subnet but
> > maybe I am reading it incorrectly.
> >
> >
> >
> > mysql> select * from cloud.host;
> > [very wide table output - wrapped and truncated in the archive]

Storage traffic clarification.

2018-06-20 Thread Jon Marshall
I am probably missing something obvious but according to this article 
(https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/)
  by default primary and secondary storage traffic travels across the 
management network.

As an example assume basic networking with 2 NICS, one for management with an 
IP subnet,  the other NIC for guest traffic using a different subnet. A 
physical host should only have one default gateway and this would have to be 
from the guest VM subnet.

I setup two tests  -

1) the NFS server had an IP address from the management subnet

2) the NFS server was on a completely different IP subnet ie. not the 
management or the guest IP subnets.

Both worked but in test 2 I can't see how the storage traffic could be using 
the management NIC because there is no default gateway on the compute nodes for 
the management subnet and the NFS server is on a remote network.

So is storage traffic in test 2 actually running across the guest NIC ?

And as the recommendation is to have separate storage from guest traffic does 
this mean the NFS server has to be in the management subnet ?

Thanks
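One quick way to answer the "which NIC is the NFS traffic actually using" question on a compute node (or inside the SSVM) is to ask the kernel directly:

ip route get 172.30.5.2
# illustrative output when only the guest bridge holds a default gateway:
# 172.30.5.2 via 172.30.4.1 dev cloudbr1 src 172.30.4.3

If the reply shows the guest bridge and the guest-subnet gateway, the storage traffic is indeed leaving via the guest NIC regardless of what the traffic labels say. (The output shown is illustrative, not captured from this environment.)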


Re: advanced networking with public IPs direct to VMs

2018-06-15 Thread Jon Marshall
I did a quick run through and it looked like the same messages as I got with 
zone wide NFS when it didn't work.


I am going to do some more tests and capture full management logs so I can do a 
comparison to see if there are any differences and once I have done that I will 
redo the bug report.


Just to clarify the second point about labels.



When you use the manual setup in the UI with basic networking and configure the 
physical network, the "Management" and "Guest" icons are already under the 
physical network part and the "Storage" icon is under "Traffic types".


For both the 2 and 3 NIC setup I configure Management as cloudbr0 and Guest as 
cloudbr1.


For the 2 NIC setup that is all I do because by default storage runs across 
management so I assume I don't need to do anything else.


For the 3 NIC setup I then drag and drop the Storage icon onto the physical 
network part and configure it as cloudbr2.


Just wanted to make that clear in case I am doing it wrong.


Will let you know results of tests next week.




From: Rafael Weingärtner 
Sent: 15 June 2018 11:40
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Did you notice some problems in the log files when you tested with 2 NICs?
When using NFS cluster wide storage, the behavior should be the same as
with 3 NICs. There might be something in your configuration. The problem
for zone wide storage is what we discussed before though.

1) if I want to run the management/storage traffic over the same NIC the
NFS server needs to be in the management subnet
No. You should be able to setup different network ranges for each one of
them.

2) when I do the initial configuration I need to drag and drop the storage
icon and use the same label as the management traffic
If you are using only two NICs, for sure you need to configure the traffic
labels accordingly. I mean, you have only two NICs, so you need to
configure the labels (cloudbr0 and cloudbr2) in that physical network tab
in the zone configuration.


On Thu, Jun 14, 2018 at 5:03 PM, Jon Marshall  wrote:

> Hi Rafael
>
>
> I did log a bug but when rebuilding I found some slightly different
> behaviour so have temporarily removed it.
>
>
> So using cluster NFS and 3 NICs as already described VM HA works.
>
>
> Because the recommendation for basic network setup seems to be run
> storage/management over the same NIC and guest on another, so 2 NICs in
> total,  I set it up this way using cluster NFS and to my surprise VM HA did
> not work so it is obviously a bit more complicated than it first appeared.
>
>
> My NFS server is on a different subnet than the management server and when
> I set it up in the UI because the storage traffic runs over the management
> NIC by default I did not assign a label to the storage traffic, ie. I only
> assigned labels to management and guest.
>
>
> So two thoughts occur which I can test unless you can see the issue -
>
>
> 1) if I want to run the management/storage traffic over the same NIC the
> NFS server needs to be in the management subnet
>
>
> or
>
>
> 2) when I do the initial configuration I need to drag and drop the storage
> icon and use the same label as the management traffic
>
>
> Personally I can't see how 2) will help ie. the only time I should need to
> assign a label to storage is if I use a different NIC.
>
>
> Apologies for bringing this up again but am happy to run any tests and
> would like to file accurate bug report.
>
>
>
>
>
>
> 
> From: Rafael Weingärtner 
> Sent: 11 June 2018 10:58
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> Well, it seems that you have found a bug. Can you fill out an issue report
> on Github?
>
> Thanks for the hard work on debugging and testing.
>
> On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall 
> wrote:
>
> > So based on Erik's suggestion (thanks Erik) I rebuilt the management
> > server and setup cluster wide primary storage as opposed to zone wide
> which
> > I have been using so far.
> >
> >
> > Still using 3 NICs (management/Guest/storage) and basic networking.
> >
> >
> > And VM HA now works. In addition it failed over quicker than it did when
> I
> &

Re: advanced networking with public IPs direct to VMs

2018-06-14 Thread Jon Marshall
Hi Rafael


I did log a bug but when rebuilding I found some slightly different behaviour 
so have temporarily removed it.


So using cluster NFS and 3 NICs as already described VM HA works.


Because the recommendation for basic network setup seems to be run 
storage/management over the same NIC and guest on another, so 2 NICs in total,  
I set it up this way using cluster NFS and to my surprise VM HA did not work so 
it is obviously a bit more complicated than it first appeared.


My NFS server is on a different subnet than the management server and when I 
set it up in the UI because the storage traffic runs over the management NIC by 
default I did not assign a label to the storage traffic, ie. I only assigned 
labels to management and guest.


So two thoughts occur which I can test unless you can see the issue -


1) if I want to run the management/storage traffic over the same NIC the NFS 
server needs to be in the management subnet


or


2) when I do the initial configuration I need to drag and drop the storage icon 
and use the same label as the management traffic


Personally I can't see how 2) will help ie. the only time I should need to 
assign a label to storage is if I use a different NIC.


Apologies for bringing this up again but am happy to run any tests and would 
like to file accurate bug report.







From: Rafael Weingärtner 
Sent: 11 June 2018 10:58
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Well, it seems that you have found a bug. Can you fill out an issue report
on Github?

Thanks for the hard work on debugging and testing.

On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall  wrote:

> So based on Erik's suggestion (thanks Erik) I rebuilt the management
> server and setup cluster wide primary storage as opposed to zone wide which
> I have been using so far.
>
>
> Still using 3 NICs (management/Guest/storage) and basic networking.
>
>
> And VM HA now works. In addition it failed over quicker than it did when I
> had zone wide NFS storage on a single NIC.
>
>
> Still a bit confused about this output where it is still showing the
> storage_ip_addresses as 172.30.3.x IPs which is the management subnet but
> maybe I am reading it incorrectly.
>
>
>
> mysql> select * from cloud.host;
> [very wide table output - wrapped and truncated in the archive; the readable rows appear in the 2018-06-08 message further down]

Re: 4.11 without Host-HA framework

2018-06-11 Thread Jon Marshall
Hi Parth


Just in case you have not seen my other thread, it turns out that all this time 
it has been a bug.


Using multiple NICs with basic networking and using zone wide NFS VM HA just 
does not work. If you change to cluster wide NFS then it works fine (and quite 
quickly as well :))
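For anyone recreating this, cluster-scoped primary storage can be added via the UI (Infrastructure -> Primary Storage -> Add, scope Cluster) or via the API/CloudMonkey roughly as below - the IDs are placeholders and the parameter names should be double-checked against the createStoragePool API for your version:

create storagepool name=ds1-cluster scope=cluster \
    zoneid=<zone-uuid> podid=<pod-uuid> clusterid=<cluster-uuid> \
    url=nfs://172.30.5.2/export/primary

A cluster-scoped pool then shows up in cloud.storage_pool with scope = 'CLUSTER' and a non-NULL cluster_id, which is consistent with the NFS-pool-per-cluster check discussed further down the thread.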


I am now going to set up Host HA and make sure that all works as well using 
cluster NFS.


Got there in the end :)


Jon






From: Parth Patel 
Sent: 24 May 2018 06:52
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon and Angus,

I did not shutdown the VMs as Yiping Zhang said, but I have confirmed this
and discussed earlier in the users list that my HA-enabled VMs got started
on another suitable available host in the cluster even when I didn't have
IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I
simply pulled the ethernet cable connecting the host to entire network (I
did use just one NIC) and according to the value set in ping timeout event,
the HA-enabled VMs were restarted on another available host. I tested the
scenario using both the scenarios: the echo command as well as good old
plugging out the NIC from the host. My VMs were successfully started on
another available host after CS manager confirmed they were not reachable.
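For reference, the "echo command" way of simulating a host crash (as opposed to a clean shutdown) is presumably the magic SysRq trigger - destructive, so only on a test host, and it needs kernel.sysrq enabled:

# on the compute node being tested - this panics the kernel immediately
echo c > /proc/sysrq-trigger

Unlike shutting the VM or the agent down cleanly, this gives the agent no chance to report anything back to the management server, which is exactly the scenario being debated here.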

I too want to understand how the failover mechanism in CloudStack actually
works. I used ACS 4.11 packages available here:
http://cloudstack.apt-get.eu/centos/7/4.11/

Regards,
Parth Patel


On Thu, 24 May 2018 at 10:53 Paul Angus  wrote:

> I'm afraid that is not a host crash.  When shutting down the guest OS, the
> CloudStack agent on the host is still able to report to the management
> server that the VM has stopped.
>
> This is my point. VM-HA relies on the management sever communication with
> the host agent.
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Yiping Zhang 
> Sent: 24 May 2018 00:44
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> I can say for fact that VM's using a HA enabled service offering will be
> restarted by CS on another host, assuming there are enough
> capacity/resources in the cluster, when their original host crashes,
> regardless that host comes back or not.
>
> The simplest way to test VM HA feature with a VM instance using HA enabled
> service offering is to issue shutdown command in guest OS, and watching it
> gets restarted by CS manager.
>
> On 5/23/18, 1:23 PM, "Paul Angus"  wrote:
>
> Hi Jon,
>
> Don't worry, TBH I'm dubious about those claiming to have VM-HA
> working when a host crashes (but doesn't restart).
> I'll check in with the guys that set values for host-ha when testing,
> to see which ones they change and what they set them to.
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Jon Marshall 
> Sent: 23 May 2018 21:10
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> Rohit / Paul
>
>
> Thanks again for answering.
>
>
> I am a Cisco guy with an ex Unix background but no virtualisation
> experience and I can honestly say I have never felt this stupid before 
>
>
> I have Cloudstack working but failover is killing me.
>
>
> When you say VM HA relies on the host telling CS the VM is down how
> does that work because if you crash the host how does it tell CS anything ?
> And when you say tell CS do you mean the CS manager  ?
>
>
> I guess I am just not understanding all the moving parts. I have had
> HOST HA working (to an extent) although it takes a long time to failover
> even after tweaking the timers but the fact that I keep finding references
> to people saying even without HOST HA it should failover (and mine doesn't)
> makes me think I have configured it incorrectly somewhere along the line.
>
>
> I have configured a compute offering with HA and I am crashing the
> host with the echo command as suggested but still nothing.
>
>
> I understand what you are saying Paul about it not being a good idea
> to rely on VM HA so I will go back to Host HA and try to speed up failover
> times.
>
>
> Can I ask, from your experiences, what is a realistic fail over time
> for CS ie. if a host fails for example ?
>
>
> Jon
>
>
>
>
> 
> Fro

Re: advanced networking with public IPs direct to VMs

2018-06-11 Thread Jon Marshall
Hi Rafael


I don't have a github account but can setup one up and do a report sometime 
this week if that is okay ?


No problem with the testing and thanks for the help.


Before I leave this if I use NFS cluster mode couple of questions -


1) if I run management and storage over same interface the NFS server can still 
be on a different subnet than the management subnet ie. the NFS server does not 
have to have IP from the management subnet ?


2) If I add another cluster can I just create a different NFS share from the 
same server ?
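As a side note on question 2: nothing stops a single NFS server exporting one directory per cluster. A hedged sketch of /etc/exports with hypothetical paths and a placeholder client CIDR:

/export/primary-cluster1   172.30.0.0/16(rw,async,no_root_squash,no_subtree_check)
/export/primary-cluster2   172.30.0.0/16(rw,async,no_root_squash,no_subtree_check)

followed by exportfs -a, with each export then added as that cluster's primary storage.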


Finally many thanks to you and the others for the help provided.


From: Rafael Weingärtner 
Sent: 11 June 2018 10:58
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Well, it seems that you have found a bug. Can you fill out an issue report
on Github?

Thanks for the hard work on debugging and testing.

On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall  wrote:

> So based on Erik's suggestion (thanks Erik) I rebuilt the management
> server and setup cluster wide primary storage as opposed to zone wide which
> I have been using so far.
>
>
> Still using 3 NICs (management/Guest/storage) and basic networking.
>
>
> And VM HA now works. In addition it failed over quicker than it did when I
> had zone wide NFS storage on a single NIC.
>
>
> Still a bit confused about this output where it is still showing the
> storage_ip_addresses as 172.30.3.x IPs which is the management subnet but
> maybe I am reading it incorrectly.
>
>
>
> mysql> select * from cloud.host;
> [very wide table output, wrapped and truncated in the archive; the one readable row is host dcp-cscn1.local: status Up, type Routing, private_ip_address 172.30.3.3, storage_ip_address 172.30.3.3, cluster_id 1, public_ip_address 172.30.4.3]

Re: advanced networking with public IPs direct to VMs

2018-06-08 Thread Jon Marshall
[the start of this message was cut off in the archive; the wrapped select * from cloud.host output resolves to:
  - a console proxy system VM row (public_ip_address 172.30.4.62)
  - id 4, dcp-cscn2.local, status Down, private/storage IP 172.30.3.4, public IP 172.30.4.4, cluster_id 1
  - id 5, dcp-cscn3.local, status Up, private/storage IP 172.30.3.5, public IP 172.30.4.5, cluster_id 1]
5 rows in set (0.00 sec)

mysql>

So some sort of bug maybe ?


From: Erik Weber 
Sent: 08 June 2018 10:15
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

While someone ponders about the zone wide storage, you could try adding a
cluster wide nfs storage and see if it the rest works in that setup.

Erik

On Thu, Jun 7, 2018 at 11:49 AM Jon Marshall  wrote:

> Yes, all basic. I read a Shapeblue doc that recommended splitting traffic
> across multiple NICs even in basic networking mode so that is what I am
> trying to do.
>
>
> With single NIC you do not get the NFS storage message.
>
>
> I have the entire management server logs for both scenarios after I pulled
> the power to one of the compute nodes but from the single NIC setup these
> seem to be the relevant lines -
>
>
> 2018-06-04 10:17:10,972 DEBUG [c.c.n.NetworkUsageManagerImpl]
> (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Disconnected called on 4
> with status Down
> 2018-06-04 10:17:10,972 DEBUG [c.c.h.Status]
> (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Transition:[Resource state
> = Enabled, Agent event = HostDown, Host id = 4, name = dcp-cscn2.local]
> 2018-06-04 10:17:10,981 WARN  [o.a.c.alerts]
> (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) AlertType:: 7 |
> dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Host is down,
> name: dcp-cscn2.local (id:4), availability zone: dcpz1, pod: dcp1
> 2018-06-04 10:17:11,000 DEBUG [c.c.h.CheckOnAgentInvestigator]
> (HA-Worker-1:ctx-f763f12f work-17) (logid:77c56778) Unable to reach the
> agent for VM[User|i-2-6-VM]: Resource [Host:4] is unreachable: Host 4: Host
> with specified id is not in the right state: Down
> 2018-06-04 1

Re: advanced networking with public IPs direct to VMs

2018-06-07 Thread Jon Marshall
Yes, all basic. I read a Shapeblue doc that recommended splitting traffic 
across multiple NICs even in basic networking mode so that is what I am trying 
to do.


With single NIC you do not get the NFS storage message.


I have the entire management server logs for both scenarios after I pulled the 
power to one of the compute nodes but from the single NIC setup these seem to 
be the relevant lines -


2018-06-04 10:17:10,972 DEBUG [c.c.n.NetworkUsageManagerImpl] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Disconnected called on 4 with 
status Down
2018-06-04 10:17:10,972 DEBUG [c.c.h.Status] (AgentTaskPool-3:ctx-8627b348) 
(logid:ef7b8230) Transition:[Resource state = Enabled, Agent event = HostDown, 
Host id = 4, name = dcp-cscn2.local]
2018-06-04 10:17:10,981 WARN  [o.a.c.alerts] (AgentTaskPool-3:ctx-8627b348) 
(logid:ef7b8230) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: 
null | message:: Host is down, name: dcp-cscn2.local (id:4), availability zone: 
dcpz1, pod: dcp1
2018-06-04 10:17:11,000 DEBUG [c.c.h.CheckOnAgentInvestigator] 
(HA-Worker-1:ctx-f763f12f work-17) (logid:77c56778) Unable to reach the agent 
for VM[User|i-2-6-VM]: Resource [Host:4] is unreachable: Host 4: Host with 
specified id is not in the right state: Down
2018-06-04 10:17:11,006 DEBUG [c.c.h.KVMInvestigator] 
(AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) Neighbouring host:5 returned 
status:Down for the investigated host:4
2018-06-04 10:17:11,006 DEBUG [c.c.h.KVMInvestigator] 
(AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) HA: HOST is ineligible legacy 
state Down for host 4
2018-06-04 10:17:11,006 DEBUG [c.c.h.HighAvailabilityManagerImpl] 
(AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) KVMInvestigator was able to 
determine host 4 is in Down
2018-06-04 10:17:11,006 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) The agent from host 4 state 
determined is Down
2018-06-04 10:17:11,006 ERROR [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) Host is down: 
4-dcp-cscn2.local. Starting HA on the VMs

At the moment I only need to assign public IPs direct to VMs rather than using 
NAT with the virtual router but would be happy to go with advanced networking 
if it would make things easier :)


From: Rafael Weingärtner 
Sent: 07 June 2018 10:35
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Ah so, it is not an advanced setup; even when you use multiple NICs.
Can you confirm that the message ""Agent investigation was requested on
host, but host does not support investigation because it has no NFS
storage. Skipping investigation." does not appear when you use a single
NIC? Can you check other log entries that might appear when the host is
marked as "down"?

On Thu, Jun 7, 2018 at 6:30 AM, Jon Marshall  wrote:

> It is all basic networking at the moment for all the setups.
>
>
> If you want me to I can setup a single NIC solution again and run any
> commands you need me to do.
>
>
> FYI when I setup single NIC I use the guided installation option in the UI
> rather than manual setup which I do for the multiple NIC scenario.
>
>
> Happy to set it up if it helps.
>
>
>
>
> 
> From: Rafael Weingärtner 
> Sent: 07 June 2018 10:23
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> Ok, so that explains the log message. This is looking like a bug to me. It
> seems that in Zone wide the host state (when disconnected) is not being
> properly identified due to this NFS thing, and as a consequence it has a
> side effect in VM HA.
>
> We would need some inputs from guys that have advanced networking
> deployments and Zone wide storage.
>
> I do not see how the all in one NIC deployment scenario is working though.
> This method "com.cloud.ha.KVMInvestigator.isAgentAlive(Host)" is dead
> simple, if there is no NFS in the cluster (NFS storage pools found for a
> host's cluster), KVM hosts will be detected as "disconnected" and not down
> with that warning message you noticed.
>
> When you say "all in one NIC", is it an advanced network deployment where
> you put all traffic in a single network, or is it a basic networking that
> you are doing?
>
> On Thu, Jun 7, 2018 at 6:06 AM, Jon Marshall 
> wrote:
>
> > zone wide.
> >
> >
> > 
> > From: Rafael Weingärtner 
> > Sent: 07 June 2018 10:04
> > To: users
> > Subject: Re: advanced networking with public IPs direct to VMs
> >
> > What type of storage are you using? Zone wide? Or cluste

Re: advanced networking with public IPs direct to VMs

2018-06-07 Thread Jon Marshall
It is all basic networking at the moment for all the setups.


If you want me to I can setup a single NIC solution again and run any commands 
you need me to do.


FYI when I setup single NIC I use the guided installation option in the UI 
rather than manual setup which I do for the multiple NIC scenario.


Happy to set it up if it helps.





From: Rafael Weingärtner 
Sent: 07 June 2018 10:23
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Ok, so that explains the log message. This is looking like a bug to me. It
seems that in Zone wide the host state (when disconnected) is not being
properly identified due to this NFS thing, and as a consequence it has a
side effect in VM HA.

We would need some inputs from guys that have advanced networking
deployments and Zone wide storage.

I do not see how the all in one NIC deployment scenario is working though.
This method "com.cloud.ha.KVMInvestigator.isAgentAlive(Host)" is dead
simple, if there is no NFS in the cluster (NFS storage pools found for a
host's cluster), KVM hosts will be detected as "disconnected" and not down
with that warning message you noticed.
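A rough way to see what that check would find, using the same cloud database the rest of this thread has been querying (column names as they appear in the outputs above; adjust as needed):

-- which KVM hosts have an NFS primary storage pool scoped to their cluster?
select h.name as host, h.cluster_id,
       p.id as pool_id, p.name as pool, p.scope
from cloud.host h
left join cloud.storage_pool p
       on p.cluster_id = h.cluster_id
      and p.pool_type  = 'NetworkFilesystem'
      and p.removed is null
where h.type = 'Routing'
  and h.removed is null;

With a zone-wide pool the cluster_id column is NULL, so the join returns no pool for any host - which lines up with the "no NFS storage" warning referred to just above.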

When you say "all in one NIC", is it an advanced network deployment where
you put all traffic in a single network, or is it a basic networking that
you are doing?

On Thu, Jun 7, 2018 at 6:06 AM, Jon Marshall  wrote:

> zone wide.
>
>
> 
> From: Rafael Weingärtner 
> Sent: 07 June 2018 10:04
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of storage are you using? Zone wide? Or cluster "wide" storage?
>
> On Thu, Jun 7, 2018 at 4:25 AM, Jon Marshall 
> wrote:
>
> > Rafael
> >
> >
> > Here is the output as requested -
> >
> >
> >
> > mysql> mysql> select * from cloud.storage_pool where removed is null;
> > [wrapped one-row output: id 1, name ds1, pool_type NetworkFilesystem, host_address 172.30.5.2, path /export/primary, scope ZONE, cluster_id NULL, hypervisor KVM]
> > 1 row in set (0.00 sec)
> >
> > mysql>
> >
> > Do you think this problem is related to my NIC/bridge configuration or
> the
> > way I am configuring the zone ?
> >
> > Jon
> > 
> > From: Rafael Weingärtner 
> > Sent: 07 June 2018 06:45
> > To: users
> > Subject: Re: advanced networking with public IPs direct to VMs
> >
> > Can you also post the result of:
> > select * from cloud.storage_pool where removed is null
> >
> > On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo  >
> > wrote:
> >
> > > Hi Jon,
> > >
> > > Still confused where your primary storage pools are – are you sure your
> > > hosts are in cluster 1?
> > >
> > > Quick question just to make sure - assuming management/storage is on
> the
> > > same NIC when I setup basic networking the physical network has the
> > > management and guest icons already there and I just edit the KVM
>

Re: advanced networking with public IPs direct to VMs

2018-06-07 Thread Jon Marshall
zone wide.



From: Rafael Weingärtner 
Sent: 07 June 2018 10:04
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of storage are you using? Zone wide? Or cluster "wide" storage?

On Thu, Jun 7, 2018 at 4:25 AM, Jon Marshall  wrote:

> Rafael
>
>
> Here is the output as requested -
>
>
>
> mysql> mysql> select * from cloud.storage_pool where removed is null;
> [wrapped one-row output: id 1, name ds1, pool_type NetworkFilesystem, host_address 172.30.5.2, path /export/primary, scope ZONE, cluster_id NULL, hypervisor KVM]
> 1 row in set (0.00 sec)
>
> mysql>
>
> Do you think this problem is related to my NIC/bridge configuration or the
> way I am configuring the zone ?
>
> Jon
> 
> From: Rafael Weingärtner 
> Sent: 07 June 2018 06:45
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> Can you also post the result of:
> select * from cloud.storage_pool where removed is null
>
> On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo 
> wrote:
>
> > Hi Jon,
> >
> > Still confused where your primary storage pools are – are you sure your
> > hosts are in cluster 1?
> >
> > Quick question just to make sure - assuming management/storage is on the
> > same NIC when I setup basic networking the physical network has the
> > management and guest icons already there and I just edit the KVM labels.
> If
> > I am running storage over management do I need to drag the storage icon
> to
> > the physical network and use the same KVM label (cloudbr0) as the
> > management or does CS automatically just use the management NIC ie. I
> would
> > only need to drag the storage icon across in basic setup if I wanted it
> on
> > a different NIC/IP subnet ?  (hope that makes sense !)
> >
> > >> I would do both – set up your 2/3 physical networks, name isn’t that
> > important – but then drag the traffic types to the correct one and make
> > sure the labels are correct.
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 06/06/2018, 12:39, "Jon Marshall"  wrote:
> >
> > Dag
> >
> >
> > Do you mean  check the pools with "Infrastructure -> Primary Storage"
> > and "Infrastructure -> Secondary Storage" within the UI ?
> >
> >
> > If so Primary Storage has a state of UP, secondary storage does not
> > show a state as such so not sure where else to check it ?
> >
> >
> > Rerun of the command -
> >
> > mysql> select * from cloud.storage_pool where cluster_id = 1;
> > Empty set (0.00 sec)
> >
> > mysql>
> >
> > I think it is something to do with my zone creation rather than the
> > NIC, bridge setup although I can post those if needed.
> >
> > I may try to setup just the 2 NIC solution you mentioned although as
> I
> > say I had the same issue with that ie. host goes to "Altert" state and
> same
> > error messages.  The only time I can get it to go to "Do

Re: advanced networking with public IPs direct to VMs

2018-06-07 Thread Jon Marshall
Rafael


Here is the output as requested -



mysql> select * from cloud.storage_pool where removed is null;
                   id: 1
                 name: ds1
                 uuid: a234224f-05fb-3f4c-9b0f-c51ebdf9a601
            pool_type: NetworkFilesystem
                 port: 2049
       data_center_id: 1
               pod_id: NULL
           cluster_id: NULL
           used_bytes: 6059720704
       capacity_bytes: 79133933568
         host_address: 172.30.5.2
            user_info: NULL
                 path: /export/primary
              created: 2018-06-05 13:45:01
              removed: NULL
          update_time: NULL
               status: Up
storage_provider_name: DefaultPrimary
                scope: ZONE
           hypervisor: KVM
              managed: 0
        capacity_iops: NULL
1 row in set (0.00 sec)

mysql>

Do you think this problem is related to my NIC/bridge configuration or the way 
I am configuring the zone ?

Jon

From: Rafael Weingärtner 
Sent: 07 June 2018 06:45
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Can you also post the result of:
select * from cloud.storage_pool where removed is null

On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo 
wrote:

> Hi Jon,
>
> Still confused where your primary storage pools are – are you sure your
> hosts are in cluster 1?
>
> Quick question just to make sure - assuming management/storage is on the
> same NIC when I setup basic networking the physical network has the
> management and guest icons already there and I just edit the KVM labels. If
> I am running storage over management do I need to drag the storage icon to
> the physical network and use the same KVM label (cloudbr0) as the
> management or does CS automatically just use the management NIC ie. I would
> only need to drag the storage icon across in basic setup if I wanted it on
> a different NIC/IP subnet ?  (hope that makes sense !)
>
> >> I would do both – set up your 2/3 physical networks, name isn’t that
> important – but then drag the traffic types to the correct one and make
> sure the labels are correct.
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 06/06/2018, 12:39, "Jon Marshall"  wrote:
>
> Dag
>
>
> Do you mean  check the pools with "Infrastructure -> Primary Storage"
> and "Infrastructure -> Secondary Storage" within the UI ?
>
>
> If so Primary Storage has a state of UP, secondary storage does not
> show a state as such so not sure where else to check it ?
>
>
> Rerun of the command -
>
> mysql> select * from cloud.storage_pool where cluster_id = 1;
> Empty set (0.00 sec)
>
> mysql>
>
> I think it is something to do with my zone creation rather than the
> NIC, bridge setup although I can post those if needed.
>
> I may try to setup just the 2 NIC solution you mentioned although as I
> say I had the same issue with that ie. host goes to "Altert" state and same
> error messages.  The only time I can get it to go to "Down" state is when
> it is all on the single NIC.
>
> Quick question just to make sure - assuming management/storage is on
> the same NIC when I setup basic networking the physical network has the
> management and guest icons already there and I just edit the KVM labels. If
> I am running storage over management do I need to drag the storage icon to
> the physical network and use the same KVM label (cloudbr0) as the
> management or does CS automatically just use the management NIC ie. I would
> only need to drag the storage icon across in basic setup if I wanted it on
> a different NIC/IP subnet ?  (hope that makes sense !)
>
> On the plus side I have been at this for so long now and done so many
> rebuilds I could do it in my sleep now 
>
>
> __

Re: advanced networking with public IPs direct to VMs

2018-06-07 Thread Jon Marshall
Dag


Am not an SQL expert by any means but does this not show hosts are in cluster 1 
-


mysql> select name, cluster_id from cloud.host;
+-++
| name| cluster_id |
+-++
| dcp-cscn1.local |  1 |
| v-2-VM  |   NULL |
| s-1-VM  |   NULL |
| dcp-cscn2.local |  1 |
| dcp-cscn3.local |  1 |
+-++
5 rows in set (0.00 sec)

mysql>

I only have one cluster and those are the hosts I am using.


Jon



From: Dag Sonstebo 
Sent: 06 June 2018 19:06
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Jon,

Still confused where your primary storage pools are – are you sure your hosts 
are in cluster 1?

Quick question just to make sure - assuming management/storage is on the same 
NIC when I setup basic networking the physical network has the management and 
guest icons already there and I just edit the KVM labels. If I am running 
storage over management do I need to drag the storage icon to the physical 
network and use the same KVM label (cloudbr0) as the management or does CS 
automatically just use the management NIC ie. I would only need to drag the 
storage icon across in basic setup if I wanted it on a different NIC/IP subnet 
?  (hope that makes sense !)

>> I would do both – set up your 2/3 physical networks, name isn’t that 
>> important – but then drag the traffic types to the correct one and make sure 
>> the labels are correct.
Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 12:39, "Jon Marshall"  wrote:

Dag


Do you mean  check the pools with "Infrastructure -> Primary Storage" and 
"Infrastructure -> Secondary Storage" within the UI ?


If so Primary Storage has a state of UP, secondary storage does not show a 
state as such so not sure where else to check it ?


Rerun of the command -

mysql> select * from cloud.storage_pool where cluster_id = 1;
Empty set (0.00 sec)

mysql>

I think it is something to do with my zone creation rather than the NIC, 
bridge setup although I can post those if needed.

I may try to setup just the 2 NIC solution you mentioned although as I say 
I had the same issue with that ie. host goes to "Alert" state and same error 
messages.  The only time I can get it to go to "Down" state is when it is all 
on the single NIC.

Quick question just to make sure - assuming management/storage is on the 
same NIC when I setup basic networking the physical network has the management 
and guest icons already there and I just edit the KVM labels. If I am running 
storage over management do I need to drag the storage icon to the physical 
network and use the same KVM label (cloudbr0) as the management or does CS 
automatically just use the management NIC ie. I would only need to drag the 
storage icon across in basic setup if I wanted it on a different NIC/IP subnet 
?  (hope that makes sense !)

On the plus side I have been at this for so long now and done so many 
rebuilds I could do it in my sleep now 



From: Dag Sonstebo 
Sent: 06 June 2018 12:28
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Looks OK to me Jon.

The one thing that throws me is your storage pools – can you rerun your 
query: select * from cloud.storage_pool where cluster_id = 1;

Do the pools show up as online in the CloudStack GUI?

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 12:08, "Jon Marshall"  wrote:

Don't know whether this helps or not but I logged into the SSVM and ran 
an ifconfig -


eth0: flags=4163  mtu 1500
inet 169.254.3.35  netmask 255.255.0.0  broadcast 
169.254.255.255
ether 0e:00:a9:fe:03:23  txqueuelen 1000  (Ethernet)
RX packets 141  bytes 20249 (19.7 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 108  bytes 16287 (15.9 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163  mtu 1500
inet 172.30.3.34  netmask 255.255.255.192  broadcast 172.30.3.63
ether 1e:00:3b:00:00:05  txqueuelen 1000  (Ethernet)
RX packets 56722  bytes 4953133 (4.7 MiB)
RX errors 0  dropped 44573  overruns 0  frame 0
TX packets 11224  bytes 1234932 (1.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4163  mtu 1500
inet 172.30.4.86  netmask 255.255.255.128  broadcast 
172.30.4.127
ether 1e:00:d9:00:00:53  txqueuelen 1000  (Ethernet)
RX packets 366191  bytes 43530055

Re: advanced networking with public IPs direct to VMs

2018-06-06 Thread Jon Marshall
Dag


Do you mean  check the pools with "Infrastructure -> Primary Storage" and 
"Infrastructure -> Secondary Storage" within the UI ?


If so Primary Storage has a state of UP, secondary storage does not show a 
state as such so not sure where else to check it ?


Rerun of the command -

mysql> select * from cloud.storage_pool where cluster_id = 1;
Empty set (0.00 sec)

mysql>

I think it is something to do with my zone creation rather than the NIC, bridge 
setup although I can post those if needed.

I may try to set up just the 2 NIC solution you mentioned although, as I say, I 
had the same issue with that ie. host goes to "Alert" state and same error 
messages.  The only time I can get it to go to "Down" state is when it is all 
on the single NIC.

Quick question just to make sure - assuming management/storage is on the same 
NIC when I setup basic networking the physical network has the management and 
guest icons already there and I just edit the KVM labels. If I am running 
storage over management do I need to drag the storage icon to the physical 
network and use the same KVM label (cloudbr0) as the management or does CS 
automatically just use the management NIC ie. I would only need to drag the 
storage icon across in basic setup if I wanted it on a different NIC/IP subnet 
?  (hope that makes sense !)

On the plus side I have been at this for so long now and done so many rebuilds 
I could do it in my sleep now 



From: Dag Sonstebo 
Sent: 06 June 2018 12:28
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Looks OK to me Jon.

The one thing that throws me is your storage pools – can you rerun your query: 
select * from cloud.storage_pool where cluster_id = 1;

Do the pools show up as online in the CloudStack GUI?

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 12:08, "Jon Marshall"  wrote:

Don't know whether this helps or not but I logged into the SSVM and ran an 
ifconfig -


eth0: flags=4163  mtu 1500
inet 169.254.3.35  netmask 255.255.0.0  broadcast 169.254.255.255
ether 0e:00:a9:fe:03:23  txqueuelen 1000  (Ethernet)
RX packets 141  bytes 20249 (19.7 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 108  bytes 16287 (15.9 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163  mtu 1500
inet 172.30.3.34  netmask 255.255.255.192  broadcast 172.30.3.63
ether 1e:00:3b:00:00:05  txqueuelen 1000  (Ethernet)
RX packets 56722  bytes 4953133 (4.7 MiB)
RX errors 0  dropped 44573  overruns 0  frame 0
TX packets 11224  bytes 1234932 (1.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4163  mtu 1500
inet 172.30.4.86  netmask 255.255.255.128  broadcast 172.30.4.127
ether 1e:00:d9:00:00:53  txqueuelen 1000  (Ethernet)
RX packets 366191  bytes 435300557 (415.1 MiB)
RX errors 0  dropped 39456  overruns 0  frame 0
TX packets 145065  bytes 7978602 (7.6 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth3: flags=4163  mtu 1500
inet 172.30.5.14  netmask 255.255.255.240  broadcast 172.30.5.15
ether 1e:00:cb:00:00:1a  txqueuelen 1000  (Ethernet)
RX packets 132440  bytes 426362982 (406.6 MiB)
RX errors 0  dropped 39446  overruns 0  frame 0
TX packets 67443  bytes 423670834 (404.0 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
loop  txqueuelen 1  (Local Loopback)
RX packets 18  bytes 1440 (1.4 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 18  bytes 1440 (1.4 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


so it has interfaces in both the management and the storage subnets (as 
well as guest).
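
For reference, a quick way to sanity-check the SSVM from here is to log into it 
over its link-local address and run its built-in health check. This is only a 
sketch - the key path and port below are the standard ones for KVM system VMs, 
and 169.254.3.35 is simply the eth0 address shown above; adjust to your own 
environment:

# run this on the KVM host that the SSVM is currently running on
ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.3.35

# then, inside the SSVM, run the bundled health check
# (it tests the NFS mount, DNS and the connection back to the management server)
/usr/local/cloud/systemvm/ssvm-check.sh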




From: Jon Marshall 
Sent: 06 June 2018 11:08
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Rafael


Thanks for the help, really appreciate it.


So rerunning that command with all servers up -



mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is 
null;
Empty set (0.00 sec)

mysql>


As for the storage IP no I'm not setting it to be the management IP when I 
setup the zone but the output of the SQL command suggests that is what has 
happened.

As I said to Dag I am using a different subnet for storage ie.

172.30.3.0/26  - management subnet
172.30.4.0/25 -  guest VM subnet
172.30.5.0/28 - storage

the NFS server IP 

Re: advanced networking with public IPs direct to VMs

2018-06-06 Thread Jon Marshall
Don't know whether this helps or not but I logged into the SSVM and ran an 
ifconfig -


eth0: flags=4163  mtu 1500
inet 169.254.3.35  netmask 255.255.0.0  broadcast 169.254.255.255
ether 0e:00:a9:fe:03:23  txqueuelen 1000  (Ethernet)
RX packets 141  bytes 20249 (19.7 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 108  bytes 16287 (15.9 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163  mtu 1500
inet 172.30.3.34  netmask 255.255.255.192  broadcast 172.30.3.63
ether 1e:00:3b:00:00:05  txqueuelen 1000  (Ethernet)
RX packets 56722  bytes 4953133 (4.7 MiB)
RX errors 0  dropped 44573  overruns 0  frame 0
TX packets 11224  bytes 1234932 (1.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4163  mtu 1500
inet 172.30.4.86  netmask 255.255.255.128  broadcast 172.30.4.127
ether 1e:00:d9:00:00:53  txqueuelen 1000  (Ethernet)
RX packets 366191  bytes 435300557 (415.1 MiB)
RX errors 0  dropped 39456  overruns 0  frame 0
TX packets 145065  bytes 7978602 (7.6 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth3: flags=4163  mtu 1500
inet 172.30.5.14  netmask 255.255.255.240  broadcast 172.30.5.15
ether 1e:00:cb:00:00:1a  txqueuelen 1000  (Ethernet)
RX packets 132440  bytes 426362982 (406.6 MiB)
RX errors 0  dropped 39446  overruns 0  frame 0
TX packets 67443  bytes 423670834 (404.0 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
loop  txqueuelen 1  (Local Loopback)
RX packets 18  bytes 1440 (1.4 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 18  bytes 1440 (1.4 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


so it has interfaces in both the management and the storage subnets (as well as 
guest).




From: Jon Marshall 
Sent: 06 June 2018 11:08
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Rafael


Thanks for the help, really appreciate it.


So rerunning that command with all servers up -



mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is 
null;
Empty set (0.00 sec)

mysql>


As for the storage IP no I'm not setting it to be the management IP when I 
setup the zone but the output of the SQL command suggests that is what has 
happened.

As I said to Dag I am using a different subnet for storage ie.

172.30.3.0/26  - management subnet
172.30.4.0/25 -  guest VM subnet
172.30.5.0/28 - storage

the NFS server IP is 172.30.5.2

each compute node has 3 NICs with an IP from each subnet (i am assuming the 
management node only needs an IP in the management network ?)

When I add the zone in the UI I have one physical network with management 
(cloudbr0), guest (cloudbr1) and storage (cloudbr2).
When I fill in the storage traffic page I use the range 172.30.5.10 - 14 as 
free IPs as I exclude the ones already allocated to the compute nodes and the 
NFS server.

I think maybe I am doing something wrong in the UI setup but it is not obvious 
to me what it is.

What I might try today unless you want me to keep the setup I have for more 
outputs is to go back to 2 NICs, one for storage/management and one for guest 
VMs.

I think with the 2 NICs setup the mistake I made last time when adding the zone 
was to assume storage would just run over management so I did not drag and drop 
the storage icon and assign it to cloudbr0 as with the management which I think 
is what I should do ?






From: Rafael Weingärtner 
Sent: 06 June 2018 10:54
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Jon, do not panic, we are here to help you :)
So, I might have mistyped the SQL query. If you use "select * from
cloud.storage_pool where cluster_id = 1 and removed is not null", you are
listing the storage pools that have been removed. Therefore, the right query
would be "select * from cloud.storage_pool where cluster_id = 1 and removed is null"

There is also something else I do not understand. You are setting the
storage IP in the management subnet? I am not sure if you should be doing
like this. Normally, I set all my storages (primary[when working with NFS]
and secondary) to IPs in the storage subnet.

On Wed, Jun 6, 2018 at 6:49 AM, Dag Sonstebo 
wrote:

> Hi John,
>
> I’m late to this thread and have possibly missed some things – but a
> couple of observations:
>
> “When I add the zone and get to the storage web page I exclude the IPs
> already used for the compute node NICs and the NFS server itself. …..”
> “So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10
> -> 172.30.5.14.”

Re: advanced networking with public IPs direct to VMs

2018-06-06 Thread Jon Marshall
Hi Rafael


Thanks for the help, really appreciate it.


So rerunning that command with all servers up -



mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is 
null;
Empty set (0.00 sec)

mysql>


As for the storage IP no I'm not setting it to be the management IP when I 
setup the zone but the output of the SQL command suggests that is what has 
happened.

As I said to Dag I am using a different subnet for storage ie.

172.30.3.0/26  - management subnet
172.30.4.0/25 -  guest VM subnet
172.30.5.0/28 - storage

the NFS server IP is 172.30.5.2

each compute node has 3 NICs with an IP from each subnet (i am assuming the 
management node only needs an IP in the management network ?)

When I add the zone in the UI I have one physical network with management 
(cloudbr0), guest (cloudbr1) and storage (cloudbr2).
When I fill in the storage traffic page I use the range 172.30.5.10 - 14 as 
free IPs as I exclude the ones already allocated to the compute nodes and the 
NFS server.

I think maybe I am doing something wrong in the UI setup but it is not obvious 
to me what it is.

What I might try today unless you want me to keep the setup I have for more 
outputs is to go back to 2 NICs, one for storage/management and one for guest 
VMs.

I think with the 2 NICs setup the mistake I made last time when adding the zone 
was to assume storage would just run over management so I did not drag and drop 
the storage icon and assign it to cloudbr0 as with the management which I think 
is what I should do ?






From: Rafael Weingärtner 
Sent: 06 June 2018 10:54
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Jon, do not panic, we are here to help you :)
So, I might have mistyped the SQL query. If you use "select * from
cloud.storage_pool where cluster_id = 1 and removed is not null", you are
listing the storage pools that have been removed. Therefore, the right query
would be "select * from cloud.storage_pool where cluster_id = 1 and removed is null"

There is also something else I do not understand. You are setting the
storage IP in the management subnet? I am not sure if you should be doing
like this. Normally, I set all my storages (primary[when working with NFS]
and secondary) to IPs in the storage subnet.

On Wed, Jun 6, 2018 at 6:49 AM, Dag Sonstebo 
wrote:

> Hi John,
>
> I’m late to this thread and have possibly missed some things – but a
> couple of observations:
>
> “When I add the zone and get to the storage web page I exclude the IPs
> already used for the compute node NICs and the NFS server itself. …..”
> “So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10
> -> 172.30.5.14.”
>
> I think you may have some confusion around the use of the storage network.
> The important part here is to understand this is for *secondary storage*
> use only – it has nothing to do with primary storage. This means this
> storage network needs to be accessible to the SSVM, to the hypervisors, and
> secondary storage NFS pools needs to be accessible on this network.
>
> The important part – this also means you *can not use the same IP ranges
> for management and storage networks* - doing so means you will have issues
> where effectively both hypervisors and SSVM can see the same subnet on two
> NICs – and you end up in a routing black hole.
>
> So – you need to either:
>
> 1) Use different IP subnets on management and storage, or
> 2) preferably just simplify your setup – stop using a secondary storage
> network altogether and just allow secondary storage to use the management
> network (which is default). Unless you have a very high I/O environment in
> production you are just adding complexity by running separate management
> and storage.
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 06/06/2018, 10:18, "Jon Marshall"  wrote:
>
> I will disconnect the host this morning and test but before I do that
> I ran this command when all hosts are up -
>
>
>
>
>
>  select * from cloud.host;
> [output of the query truncated by the archive - only the table border survived]

Re: advanced networking with public IPs direct to VMs

2018-06-06 Thread Jon Marshall
Hi Dag


Thanks for joining in.


I did use a separate network for management (172.30.3.0/27) and storage 
(172.30.5.0/28) when I configured the zone it is just for some reason it is not 
referencing the 172.30.5.x subnet anywhere in the SQL output.


My compute nodes have 3 NICs, one for management, one for guest VM traffic and 
one for storage, all different subnets and in different vlans on the switch.


I also set it up with two NICs just as you suggested with storage/management on 
one NIC and guest traffic on the other NIC and I got exactly the same result 
ie. host in "Alert" state and this from logs -



2018-06-04 12:53:45,853 WARN  [c.c.h.KVMInvestigator] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent investigation was 
requested on host Host[-2-Routing], but host does not support investigation 
because it has no NFS storage. Skipping investigation.
2018-06-04 12:53:45,854 DEBUG [c.c.h.HighAvailabilityManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) KVMInvestigator was able to 
determine host 2 is in Disconnected
2018-06-04 12:53:45,854 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The agent from host 2 state 
determined is Disconnected
2018-06-04 12:53:45,854 WARN  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent is disconnected but the 
host is still up: 2-dcp-cscn2.local
2018-06-04 12:53:45,854 WARN  [o.a.c.alerts] (AgentTaskPool-3:ctx-0aed2673) 
(logid:32aaef2a) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: 
null | message:: Host disconnected, name: dcp-cscn2.local (id:2), availability 
zone: dcp1, pod: dcpp1


the only difference was when I configured the zone I did not have to configure 
cloudbr2 (for storage) and did not enter any storage traffic IP subnet range.

I know it is something stupid I am doing 



From: Dag Sonstebo 
Sent: 06 June 2018 10:49
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi John,

I’m late to this thread and have possibly missed some things – but a couple of 
observations:

“When I add the zone and get to the storage web page I exclude the IPs already 
used for the compute node NICs and the NFS server itself. …..”
“So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10 -> 
172.30.5.14.”

I think you may have some confusion around the use of the storage network. The 
important part here is to understand this is for *secondary storage* use only – 
it has nothing to do with primary storage. This means this storage network 
needs to be accessible to the SSVM, to the hypervisors, and secondary storage 
NFS pools needs to be accessible on this network.

The important part – this also means you *can not use the same IP ranges for 
management and storage networks* - doing so means you will have issues where 
effectively both hypervisors and SSVM can see the same subnet on two NICs – and 
you end up in a routing black hole.

So – you need to either:

1) Use different IP subnets on management and storage, or
2) preferably just simplify your setup – stop using a secondary storage network 
altogether and just allow secondary storage to use the management network 
(which is default). Unless you have a very high I/O environment in production 
you are just adding complexity by running separate management and storage.
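
As a rough illustration of option 2) on a CentOS/KVM compute node, the bridge 
layout could look something like the sketch below. The device names and 
addresses are just the ones used elsewhere in this thread and are only an 
assumption - adjust them to your own environment:

# /etc/sysconfig/network-scripts/ifcfg-eth0  (management + secondary storage)
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BRIDGE=cloudbr0

# /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.30.3.3
NETMASK=255.255.255.192
STP=no
DELAY=0

# /etc/sysconfig/network-scripts/ifcfg-eth1  (guest traffic)
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BRIDGE=cloudbr1

# /etc/sysconfig/network-scripts/ifcfg-cloudbr1
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.30.4.3
NETMASK=255.255.255.128
GATEWAY=172.30.4.1
STP=no
DELAY=0

In the zone wizard you would then leave management and storage traffic on the 
cloudbr0 label and put guest traffic on the cloudbr1 label.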

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 10:18, "Jon Marshall"  wrote:

I will disconnect the host this morning and test but before I do that I ran 
this command when all hosts are up -





 select * from cloud.host;

[output truncated by the archive - only the header row survived; columns:
id, name, uuid, status, type, private_ip_address, private_netmask,
private_mac_address, storage_ip_address, storage_netmask, storage_mac_address,
storage_ip_address_2, storage_mac_address_2, storage_netmask_2, cluster_id,
public_ip_address, public_netmask, public_mac_address, proxy_port,
data_center_id, pod_id, cpu_sockets, cpus, speed, url, ...]

Re: advanced networking with public IPs direct to VMs

2018-06-06 Thread Jon Marshall
Select * from storage_pool where cluster_id =  and removed
> is not null
>

Can you run that SQL to see its return when your hosts are marked as
disconnected?

On Tue, Jun 5, 2018 at 11:32 AM, Jon Marshall  wrote:

> I reran the tests with the 3 NIC setup. When I configured the zone through
> the UI I used the labels cloudbr0 for management, cloudbr1 for guest
> traffic and cloudbr2 for NFS as per my original response to you.
>
>
> When I pull the power to the node (dcp-cscn2.local) after about 5 mins
> the  host status goes to "Alert" but never to "Down"
>
>
> I get this in the logs -
>
>
> 2018-06-05 15:17:14,382 WARN  [c.c.h.KVMInvestigator]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was
> requested on host Host[-4-Routing], but host does not support investigation
> because it has no NFS storage. Skipping investigation.
> 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to
> determine host 4 is in Disconnected
> 2018-06-05 15:17:14,382 INFO  [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state
> determined is Disconnected
> 2018-06-05 15:17:14,382 WARN  [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but
> the host is still up: 4-dcp-cscn2.local
>
> I don't understand why it thinks there is no NFS storage as each compute
> node has a dedicated storage NIC.
>
>
> I also don't understand why it thinks the host is still up ie. what test
> is it doing to determine that ?
>
>
> Am I just trying to get something working that is not supported ?
>
>
> 
> From: Rafael Weingärtner 
> Sent: 04 June 2018 15:31
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of failover are you talking about?
> What ACS version are you using?
> What hypervisor are you using?
> How are you configuring your NICs in the hypervisor?
> How are you configuring the traffic labels in ACS?
>
> On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall 
> wrote:
>
> > Hi all
> >
> >
> > I am close to giving up on basic networking as I just cannot get failover
> > working with multiple NICs (I am not even sure it is supported).
> >
> >
> > What I would like is to use 3 NICs for management, storage and guest
> > traffic. I would like to assign public IPs direct to the VMs which is
> why I
> > originally chose basic.
> >
> >
> > If I switch to advanced networking do I just configure a guest VM with
> > public IPs on one NIC and not both with the public traffic -
> >
> >
> > would this work ?
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-05 Thread Jon Marshall
No problem.


I am leaving work now but will test first thing tomorrow and get back to you.


I definitely have NFS storage as far as I can tell !



From: Rafael Weingärtner 
Sent: 05 June 2018 16:13
To: users
Subject: Re: advanced networking with public IPs direct to VMs

That is interesting. Let's see the source of all truth...
This is the code that is generating that odd message.

> List clusterPools =
> _storagePoolDao.listPoolsByCluster(agent.getClusterId());
> boolean hasNfs = false;
> for (StoragePoolVO pool : clusterPools) {
> if (pool.getPoolType() == StoragePoolType.NetworkFilesystem) {
> hasNfs = true;
> break;
> }
> }
> if (!hasNfs) {
> s_logger.warn(
> "Agent investigation was requested on host " + agent +
> ", but host does not support investigation because it has no NFS storage.
> Skipping investigation.");
> return Status.Disconnected;
> }
>

There are two possibilities here. You do not have any NFS storage? Is that
the case? Or maybe, for some reason, the call
"_storagePoolDao.listPoolsByCluster(agent.getClusterId())" is not returning
any NFS storage pools. Looking at the "listPoolsByCluster " we will see
that the following SQL is used:

Select * from storage_pool where cluster_id =  and removed
> is not null
>

Can you run that SQL to see its return when your hosts are marked as
disconnected?
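
A quick way to see exactly what that code will find is to run the query with 
the relevant columns, for example (a sketch only - it assumes the standard 4.11 
schema and the usual "cloud" database user, so adjust credentials to your 
install):

# list the primary storage pools with the columns the investigator cares about:
# pool_type must be NetworkFilesystem and cluster_id must match the host's cluster
mysql -u cloud -p cloud -e "SELECT id, name, pool_type, scope, cluster_id, pod_id, removed FROM storage_pool;"

# and the cluster each hypervisor host actually sits in
mysql -u cloud -p cloud -e "SELECT id, name, cluster_id, pod_id, status FROM host WHERE type = 'Routing';"

One thing worth checking in that output is whether the pool was added with 
zone-wide scope, in which case cluster_id is NULL and the per-cluster lookup in 
the code above would come back empty.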

On Tue, Jun 5, 2018 at 11:32 AM, Jon Marshall  wrote:

> I reran the tests with the 3 NIC setup. When I configured the zone through
> the UI I used the labels cloudbr0 for management, cloudbr1 for guest
> traffic and cloudbr2 for NFS as per my original response to you.
>
>
> When I pull the power to the node (dcp-cscn2.local) after about 5 mins
> the  host status goes to "Alert" but never to "Down"
>
>
> I get this in the logs -
>
>
> 2018-06-05 15:17:14,382 WARN  [c.c.h.KVMInvestigator]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was
> requested on host Host[-4-Routing], but host does not support investigation
> because it has no NFS storage. Skipping investigation.
> 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to
> determine host 4 is in Disconnected
> 2018-06-05 15:17:14,382 INFO  [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state
> determined is Disconnected
> 2018-06-05 15:17:14,382 WARN  [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but
> the host is still up: 4-dcp-cscn2.local
>
> I don't understand why it thinks there is no NFS storage as each compute
> node has a dedicated storage NIC.
>
>
> I also don't understand why it thinks the host is still up ie. what test
> is it doing to determine that ?
>
>
> Am I just trying to get something working that is not supported ?
>
>
> 
> From: Rafael Weingärtner 
> Sent: 04 June 2018 15:31
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of failover are you talking about?
> What ACS version are you using?
> What hypervisor are you using?
> How are you configuring your NICs in the hypervisor?
> How are you configuring the traffic labels in ACS?
>
> On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall 
> wrote:
>
> > Hi all
> >
> >
> > I am close to giving up on basic networking as I just cannot get failover
> > working with multiple NICs (I am not even sure it is supported).
> >
> >
> > What I would like is to use 3 NICs for management, storage and guest
> > traffic. I would like to assign public IPs direct to the VMs which is
> why I
> > originally chose basic.
> >
> >
> > If I switch to advanced networking do I just configure a guest VM with
> > public IPs on one NIC and not both with the public traffic -
> >
> >
> > would this work ?
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-05 Thread Jon Marshall
I reran the tests with the 3 NIC setup. When I configured the zone through the 
UI I used the labels cloudbr0 for management, cloudbr1 for guest traffic and 
cloudbr2 for NFS as per my original response to you.


When I pull the power to the node (dcp-cscn2.local) after about 5 mins the  
host status goes to "Alert" but never to "Down"


I get this in the logs -


2018-06-05 15:17:14,382 WARN  [c.c.h.KVMInvestigator] 
(AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was 
requested on host Host[-4-Routing], but host does not support investigation 
because it has no NFS storage. Skipping investigation.
2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl] 
(AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to 
determine host 4 is in Disconnected
2018-06-05 15:17:14,382 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state 
determined is Disconnected
2018-06-05 15:17:14,382 WARN  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but the 
host is still up: 4-dcp-cscn2.local

I don't understand why it thinks there is no NFS storage as each compute node 
has a dedicated storage NIC.


I also don't understand why it thinks the host is still up ie. what test is it 
doing to determine that ?


Am I just trying to get something working that is not supported ?



From: Rafael Weingärtner 
Sent: 04 June 2018 15:31
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of failover are you talking about?
What ACS version are you using?
What hypervisor are you using?
How are you configuring your NICs in the hypervisor?
How are you configuring the traffic labels in ACS?

On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall  wrote:

> Hi all
>
>
> I am close to giving up on basic networking as I just cannot get failover
> working with multiple NICs (I am not even sure it is supported).
>
>
> What I would like is to use 3 NICs for management, storage and guest
> traffic. I would like to assign public IPs direct to the VMs which is why I
> originally chose basic.
>
>
> If I switch to advanced networking do I just configure a guest VM with
> public IPs on one NIC and not both with the public traffic -
>
>
> would this work ?
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-05 Thread Jon Marshall
I think I do know what it means.


Let me build it with 3 separate NICs again and rerun.



From: Rafael Weingärtner 
Sent: 04 June 2018 15:31
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of failover are you talking about?
What ACS version are you using?
What hypervisor are you using?
How are you configuring your NICs in the hypervisor?
How are you configuring the traffic labels in ACS?

On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall  wrote:

> Hi all
>
>
> I am close to giving up on basic networking as I just cannot get failover
> working with multiple NICs (I am not even sure it is supported).
>
>
> What I would like is to use 3 NICs for management, storage and guest
> traffic. I would like to assign public IPs direct to the VMs which is why I
> originally chose basic.
>
>
> If I switch to advanced networking do I just configure a guest VM with
> public IPs on one NIC and not both with the public traffic -
>
>
> would this work ?
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-05 Thread Jon Marshall
Update to this.


I ran the all-on-one-NIC test again and the host does report as "Down" in the UI, as 
opposed to "Alert" when using multiple NICs.


Looking at the management server log this seems to be the key part -


1) from the single NIC logs -


2018-06-04 10:17:10,967 DEBUG [c.c.h.KVMInvestigator] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Neighbouring host:5 returned 
status:Down for the investigated host:4
2018-06-04 10:17:10,967 DEBUG [c.c.h.KVMInvestigator] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) HA: HOST is ineligible legacy 
state Down for host 4
2018-06-04 10:17:10,967 DEBUG [c.c.h.HighAvailabilityManagerImpl] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) KVMInvestigator was able to 
determine host 4 is in Down
2018-06-04 10:17:10,967 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) The agent from host 4 state 
determined is Down
2018-06-04 10:17:10,967 ERROR [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Host is down: 
4-dcp-cscn2.local. Starting HA on the VMs



2) from the setup with 2 NICs (management/storage on one NIC, guest traffic on 
the other) -



2018-06-04 12:53:45,853 WARN  [c.c.h.KVMInvestigator] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent investigation was 
requested on host Host[-2-Routing], but host does not support investigation 
because it has no NFS storage. Skipping investigation.
2018-06-04 12:53:45,854 DEBUG [c.c.h.HighAvailabilityManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) KVMInvestigator was able to 
determine host 2 is in Disconnected
2018-06-04 12:53:45,854 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The agent from host 2 state 
determined is Disconnected
2018-06-04 12:53:45,854 WARN  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent is disconnected but the 
host is still up: 2-dcp-cscn2.local
2018-06-04 12:53:45,854 WARN  [o.a.c.alerts] (AgentTaskPool-3:ctx-0aed2673) 
(logid:32aaef2a) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: 
null | message:: Host disconnected, name: dcp-cscn2.local (id:2), availability 
zone: dcp1, pod: dcpp1
2018-06-04 12:53:45,858 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Host 2 is disconnecting with 
event AgentDisconnected
2018-06-04 12:53:45,858 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The next status of agent 2is 
Alert, current status is Up
2018-06-04 12:53:45,858 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Deregistering link for 2 with 
state Alert
2018-06-04 12:53:45,859 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Remove Agent : 2


I don't know what it means by host has no NFS storage but you can see it never 
marks the failed node as down.


Any ideas ?





From: Rafael Weingärtner 
Sent: 04 June 2018 21:15
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Everything seems to be normal at a first glance. Do you see some sort of
error in the log files?

On Mon, Jun 4, 2018 at 11:39 AM, Jon Marshall  wrote:

> CS version 4.11
>
> VM HA at the moment (not Host HA as yet)
>
> KVM
>
>
> For the management node just one NIC - 172.30.3.2/26 assigned to physical
> NIC.
>
>
> For the compute nodes -
>
>
> 3 NICs so as an example from one compute node -
>
>
> ifcfg-eth0
>
> BRIDGE=cloudbr0
>
>
> ifcfg-eth1
>
> BRIDGE=cloudbr1
>
>
> ifcfg-eth2
>
> BRIDGE=cloudbr2
>
>
> then the 3 bridges -
>
>
> ifcfg-cloudbr0
>
> ip address 172.30.3.3/26<--- management network
>
>
> if-cloudbr1
>
> ip address 172.30.4.3/25  <-- guest traffic
>
> gateway 172.30.4.1
>
>
>
> ifcfg-cloubr2
>
> ip address 172.30.5.3 /28 <-- storage traffic
>
>
> traffic labels would be cloudbr0, cloudbr1, cloudbr2
>
>
> Can only get failover working when I put all traffic on same NIC.
>
>
>
> 
> From: Rafael Weingärtner 
> Sent: 04 June 2018 15:31
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of failover are you talking about?
> What version are you using?
> What hypervisor are you using?
> How are you configuring your NICs in the hypervisor?
> How are you configuring the traffic labels in ACS?
>
> On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall 
> wrote:
>
> > Hi all
> >
> >
> > I am close to giving up on basic networking as I just cannot get failover
> > working with multiple NICs (I am not even sure it is supported).
> >
> >
> > What I would like is to use 3 NICs for management, storage and guest
> >

Re: advanced networking with public IPs direct to VMs

2018-06-05 Thread Jon Marshall
No, watching the management server logs when I pull the power on one of the 
compute nodes it recognises the host is not responding to a ping and eventually 
marks the host status as "Alert" in the UI but it never tries to migrate the 
VMs that was running on the node.


From memory when I put everything on one NIC (management, storage and guest 
traffic) the host status is marked as "Down" not "Alert", which makes me think 
there is something not supported with multiple NICs and failover.


It is almost as though with multiple NICs the manager knows that there is a 
problem with the node but cannot definitely say it is down and so it cannot 
migrate the VM in case it is still running on that node.


I have been at this for well over a month now (off and on) and apart from when 
I used a single NIC VM HA has never worked. If the configuration I have posted 
looks okay then maybe it is just not supported unless of course you know 
differently ?


I did think it may be the default gateway being set to the guest VM subnet but 
if I don't do this then the SSVM has issues with communication.


I am going to do a side by side comparison of the management server logs for 
single NIC vs dual NICs (management/storage on one NIC, the other NIC for guest 
VMs) and see if there is anything obvious that stands out.
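
For that comparison it is easier to diff just the lines that drive the 
host-state decision (a sketch; the path below is the default packaged log 
location):

# capture only the investigator / HA / agent-state lines from this run
grep -E 'KVMInvestigator|HighAvailabilityManagerImpl|AgentManagerImpl' \
  /var/log/cloudstack/management/management-server.log > multi-nic.log

# do the same on the single-NIC build, then compare the two captures
diff single-nic.log multi-nic.log | less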


That aside, if I can't get this working, can I just assign a public IP 
subnet to the guest VMs when setting up advanced networking? And if so, how does 
it then, in effect, bypass the virtual router (in terms of NAT), or do I not need 
to worry about this ?

Thanks

From: Rafael Weingärtner 
Sent: 04 June 2018 21:15
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Everything seems to be normal at a first glance. Do you see some sort of
error in the log files?

On Mon, Jun 4, 2018 at 11:39 AM, Jon Marshall  wrote:

> CS version 4.11
>
> VM HA at the moment (not Host HA as yet)
>
> KVM
>
>
> For the management node just one NIC - 172.30.3.2/26 assigned to physical
> NIC.
>
>
> For the compute nodes -
>
>
> 3 NICs so as an example from one compute node -
>
>
> ifcfg-eth0
>
> BRIDGE=cloudbr0
>
>
> ifcfg-eth1
>
> BRIDGE=cloudbr1
>
>
> ifcfg-eth2
>
> BRIDGE=cloudbr2
>
>
> then the 3 bridges -
>
>
> ifcfg-cloudbr0
>
> ip address 172.30.3.3/26<--- management network
>
>
> if-cloudbr1
>
> ip address 172.30.4.3/25  <-- guest traffic
>
> gateway 172.30.4.1
>
>
>
> ifcfg-cloubr2
>
> ip address 172.30.5.3 /28 <-- storage traffic
>
>
> traffic labels would be cloudbr0, cloudbr1, cloudbr2
>
>
> Can only get failover working when I put all traffic on same NIC.
>
>
>
> 
> From: Rafael Weingärtner 
> Sent: 04 June 2018 15:31
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of failover are you talking about?
> What version are you using?
> What hypervisor are you using?
> How are you configuring your NICs in the hypervisor?
> How are you configuring the traffic labels in ACS?
>
> On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall 
> wrote:
>
> > Hi all
> >
> >
> > I am close to giving up on basic networking as I just cannot get failover
> > working with multiple NICs (I am not even sure it is supported).
> >
> >
> > What I would like is to use 3 NICs for management, storage and guest
> > traffic. I would like to assign public IPs direct to the VMs which is
> why I
> > originally chose basic.
> >
> >
> > If I switch to advanced networking do I just configure a guest VM with
> > public IPs on one NIC and not both with the public traffic -
> >
> >
> > would this work ?
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-04 Thread Jon Marshall
CS version 4.11

VM HA at the moment (not Host HA as yet)

KVM


For the management node just one NIC - 172.30.3.2/26 assigned to physical NIC.


For the compute nodes -


3 NICs so as an example from one compute node -


ifcfg-eth0

BRIDGE=cloudbr0


ifcfg-eth1

BRIDGE=cloudbr1


ifcfg-eth2

BRIDGE=cloudbr2


then the 3 bridges -


ifcfg-cloudbr0

ip address 172.30.3.3/26<--- management network


if-cloudbr1

ip address 172.30.4.3/25  <-- guest traffic

gateway 172.30.4.1



ifcfg-cloubr2

ip address 172.30.5.3 /28 <-- storage traffic


traffic labels would be cloudbr0, cloudbr1, cloudbr2


Can only get failover working when I put all traffic on same NIC.
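
To double-check that the bridges and labels line up on each compute node, 
something like the following helps (a sketch; the agent.properties keys shown 
are the usual ones for the KVM agent - verify the file on your own install):

# confirm each bridge exists and has the right physical NIC enslaved
brctl show
ip -br addr | grep cloudbr

# confirm the CloudStack KVM agent points at the same bridges and management server
grep -E 'network.device|^host=' /etc/cloudstack/agent/agent.properties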




From: Rafael Weingärtner 
Sent: 04 June 2018 15:31
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of failover are you talking about?
What version are you using?
What hypervisor are you using?
How are you configuring your NICs in the hypervisor?
How are you configuring the traffic labels in ACS?

On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall  wrote:

> Hi all
>
>
> I am close to giving up on basic networking as I just cannot get failover
> working with multiple NICs (I am not even sure it is supported).
>
>
> What I would like is to use 3 NICs for management, storage and guest
> traffic. I would like to assign public IPs direct to the VMs which is why I
> originally chose basic.
>
>
> If I switch to advanced networking do I just configure a guest VM with
> public IPs on one NIC and not both with the public traffic -
>
>
> would this work ?
>



--
Rafael Weingärtner


Re: advanced networking with public IPs direct to VMs

2018-06-04 Thread Jon Marshall
Sorry that should say "not bother with the public traffic"


____
From: Jon Marshall 
Sent: 04 June 2018 15:29
To: users@cloudstack.apache.org
Subject: advanced networking with public IPs direct to VMs

Hi all


I am close to giving up on basic networking as I just cannot get failover 
working with multiple NICs (I am not even sure it is supported).


What I would like is to use 3 NICs for management, storage and guest traffic. I 
would like to assign public IPs direct to the VMs which is why I originally 
chose basic.


If I switch to advanced networking do I just configure a guest VM with public 
IPs on one NIC and not both with the public traffic -


would this work ?


advanced networking with public IPs direct to VMs

2018-06-04 Thread Jon Marshall
Hi all


I am close to giving up on basic networking as I just cannot get failover 
working with multiple NICs (I am not even sure it is supported).


What I would like is to use 3 NICs for management, storage and guest traffic. I 
would like to assign public IPs direct to the VMs which is why I originally 
chose basic.


If I switch to advanced networking do I just configure a guest VM with public 
IPs on one NIC and not both with the public traffic -


would this work ?


Re: 4.11 without Host-HA framework

2018-06-04 Thread Jon Marshall
As mentioned if I use just the one NIC for all traffic then VM HA works.


I have been using this document to understand CS network concepts -


https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/


I have been assuming that the manager node only needs an interface in the 
management network, and that it is only on the compute nodes that I have split 
the traffic across 3 NICs as per the above doc.


Does the manager need NICs in the other networks as well ?


Jon




From: Paul Angus 
Sent: 25 May 2018 07:37
To: users@cloudstack.apache.org
Subject: RE: 4.11 without Host-HA framework

I'm on leave next week, but I'll pick this up again when I'm back ...

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall 
Sent: 24 May 2018 11:20
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Parth


I remember you saying this worked for you in a previous thread.


I am beginning to wonder if the fact that I have used 3 separate NICs - one for 
management, one for the VM traffic and a third for storage - is why I am not 
seeing the behaviour you saw.


That is why I, too, would like to understand exactly what is talking to what and 
doing checks, for both non Host-HA and Host-HA.


I did get failover working in some scenarios with Host-HA and OOBM using IPMI 
but it was slow even after tweaking the timers eg. for a crashed host the best 
time I got was around 8 minutes which seems a long time but perhaps that is an 
acceptable time for CS, I just don't know.


Not expecting it to be instantaneous as it needs to do checks etc.


Jon



From: Parth Patel 
Sent: 24 May 2018 06:52
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon and Angus,

I did not shutdown the VMs as Yiping Zhang said, but I have confirmed this and 
discussed earlier in the users list that my HA-enabled VMs got started on 
another suitable available host in the cluster even when I didn't have 
IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I simply 
pulled the ethernet cable connecting the host to entire network (I did use just 
one NIC) and according to the value set in ping timeout event, the HA-enabled 
VMs were restarted on another available host. I tested the scenario using both 
the scenarios: the echo command as well as good old plugging out the NIC from 
the host. My VMs were successfully started on another available host after CS 
manager confirmed they were not reachable.

I too want to understand how the failover mechanism in CloudStack actually 
works. I used ACS 4.11 packages available here:
http://cloudstack.apt-get.eu/centos/7/4.11/

Regards,
Parth Patel


On Thu, 24 May 2018 at 10:53 Paul Angus  wrote:

> I'm afraid that is not a host crash.  When shutting down the guest OS,
> the CloudStack agent on the host is still able to report to the
> management server that the VM has stopped.
>
> This is my point. VM-HA relies on the management sever communication
> with the host agent.
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue
>
>
>
>
> -Original Message-
> From: Yiping Zhang 
> Sent: 24 May 2018 00:44
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> I can say for fact that VM's using a HA enabled service offering will be
> restarted by CS on another host, assuming there are enough
> capacity/resources in the cluster, when their original host crashes,
> regardless that host comes back or not.
>
> The simplest way to test VM HA feature with a VM instance using HA enabled
> service offering is to issue shutdown command in guest OS, and watching it
> gets restarted by CS manager.
>
> On 5/23/18, 1:23 PM, "Paul Angus"  wrote:
>
> Hi Jon,
>
> Don't worry, TBH I'm dubious about those claiming to have VM-HA
> working when a host crashes (but doesn't restart).
> I'll check in with the guys that set values for host-ha when testing,
> to see which ones they change and what they set them to.
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Jon Marshall 
> Sent: 23 May 2018 21:10
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> Rohit / Paul
>
>
> Thanks again for answering.
>
>
> I am a Cisco guy with an ex Unix background but no virtualisation
> e

Re: 4.11 without Host-HA framework

2018-06-01 Thread Jon Marshall
Update on this.


I put everything (management, storage and guest VMs) on a single NIC, so all in the 
same subnet, and VM HA failover worked. It took about 6 1/2 minutes with default 
timers before the VM was responding to a ping after being migrated.


So it looks like it is something with the network setup I am doing.



The manager node has just a single NIC in the management subnet - 
172.30.3.0/27 and the IP is assigned directly to the NIC.


Each compute node has -


1)  a NIC in the management subnet - 172.30.3.0/27

2) a NIC in the guest VM subnet - 172.30.4.0/25

3) a NIC in the storage subnet - 172.30.5.0/28  (NFS server is also in this 
subnet)


None of the NICs are vlan aware but the ports they connect to on the switch are 
in different vlans.


3 bridges are used on each node - cloudbr0 for management, cloudbr1 for guest 
VMs and cloudbr2 for storage.  Only the ifcfg-cloudbr1 configuration references 
a default gateway because I read somewhere that is what should be used and I 
seemed to remember I had trouble with SSVM until I did this.


When setting up the cloud I exclude the already used management IPs on the 
nodes from the range you enter as I had issues with the system VMs picking up 
IPs already in use.


Same reasoning behind storage ie. I exclude IPs already used for NFS server and 
compute nodes.


Can anyone see where any of the above could be causing an issue ?


Many thanks for any help given




From: Rohit Yadav 
Sent: 23 May 2018 10:45
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Jon,


In the VM's compute offering, make sure that HA is ticked/enabled. Then use 
that HA-enabled VM offering while deploying a VM. Around testing - it depends 
how you're crashing. In case of KVM, you can try to cause host crash (example: 
echo c > /proc/sysrq-trigger) and see if HA-enabled VMs gets started on a 
different host.
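
Spelled out a little more, a test run could look like this (a sketch - run the 
crash on the host under test and watch the log from the management server; the 
log path is the default packaged location):

# on the KVM host under test: force an immediate kernel crash, so the agent
# gets no chance to report anything back to the management server
echo c > /proc/sysrq-trigger

# on the management server: watch the investigation and HA decisions being made
tail -f /var/log/cloudstack/management/management-server.log | grep -E 'Investigator|HighAvailability|HA on the VMs'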


- Rohit







________
From: Jon Marshall 
Sent: Tuesday, May 22, 2018 8:28:06 PM
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Rohit


Thanks for responding.


I have not had much luck with HA at all.  I crash a server and nothing happens  
in terms of VMs migrating to another host. Monitoring the management log file 
it seems the management server recognises the host has stopped responding to 
pings but doesn't think it has to do anything.


I am currently running v4.11 with basic network but 3 separate NICs, one for 
management, one for storage and one for VMs themselves.


Should it make it any difference ie. would it be worth trying to run management 
and storage over the same NIC ?


I am just lost as to why I see no failover at all whereas others are reporting 
it works fine.


Jon



From: Rohit Yadav 
Sent: 22 May 2018 12:12
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon,


Yes, Host-HA is different from VM-HA and without Host HA enabled a HA enabled 
VM should be recovered/run on a different host when it crashes. Historically 
the term 'HA' in CloudStack is used around high availability of a VM.


Host HA as the name tries to imply is around HA of a physical hypervisor host 
by means of out-of-band management technologies such as ipmi and currently 
supporting ipmi as OOBM and KVM hosts with NFS storage.


- Rohit







____
From: Jon Marshall 
Sent: Monday, May 21, 2018 8:36:04 PM
To: users@cloudstack.apache.org
Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in 
blogs etc.

If I run 4.11 without enabling Host HA framework should HA still work if I 
crash a compute node because my understanding was the new framework was added 
for certain cases only.

It doesn't work for me but I can find a number of people saying you don't need 
to enable the new framework for it to work.

Thanks

Jon

rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue





Re: 4.11 without Host-HA framework

2018-06-01 Thread Jon Marshall

Hi Rohit


In an attempt to make things simpler I am now running the management and 
storage (NFS) across the same NIC with a separate NIC for the guest VMs. So 
basic networking, one subnet for management/storage and a different one for 
guest VMs which means two bridges.


I am also just testing VM HA (not Host HA at present)


1 manager and 3 compute nodes. I crash a compute node or pull the power on the 
node and monitor the management server log.


It reports the ping timeouts and then, after the ping interval * ping 
timeout time, it marks the host state as Alert in the UI. So far so good.
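
For reference, the values behind that calculation are global settings, so they 
can be checked (and tuned) either in the UI under Global Settings or straight 
from the database, for example (a sketch, assuming the usual "cloud" database 
and user):

mysql -u cloud -p cloud -e "SELECT name, value FROM configuration WHERE name IN ('ping.interval', 'ping.timeout', 'alert.wait');"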


But it never tries to migrate the VM running on the crashed node.  Not a single 
message about attempting to restart, nothing.


The VM has been setup with a compute offering with HA enabled.


Any thoughts as to why it is not trying to restart the VM on another of the 
nodes (there is capacity as one of the nodes has no VMs on it).

The only other thing I can try is to use just one NIC for everything and see if 
I get anywhere with that.


Jon




From: Rohit Yadav 
Sent: 23 May 2018 10:45
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Jon,


In the VM's compute offering, make sure that HA is ticked/enabled. Then use 
that HA-enabled VM offering while deploying a VM. Around testing - it depends 
how you're crashing. In case of KVM, you can try to cause host crash (example: 
echo c > /proc/sysrq-trigger) and see if HA-enabled VMs gets started on a 
different host.


- Rohit







________
From: Jon Marshall 
Sent: Tuesday, May 22, 2018 8:28:06 PM
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Rohit


Thanks for responding.


I have not had much luck with HA at all.  I crash a server and nothing happens  
in terms of VMs migrating to another host. Monitoring the management log file 
it seems the management server recognises the host has stopped responding to 
pings but doesn't think it has to do anything.


I am currently running v4.11 with basic network but 3 separate NICs, one for 
management, one for storage and one for VMs themselves.


Should it make it any difference ie. would it be worth trying to run management 
and storage over the same NIC ?


I am just lost as to why I see no failover at all whereas others are reporting 
it works fine.


Jon



From: Rohit Yadav 
Sent: 22 May 2018 12:12
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon,


Yes, Host-HA is different from VM-HA and without Host HA enabled a HA enabled 
VM should be recovered/run on a different host when it crashes. Historically 
the term 'HA' in CloudStack is used around high availability of a VM.


Host HA as the name tries to imply is around HA of a physical hypervisor host 
by means of out-of-band management technologies such as ipmi and currently 
supporting ipmi as OOBM and KVM hosts with NFS storage.


- Rohit







____
From: Jon Marshall 
Sent: Monday, May 21, 2018 8:36:04 PM
To: users@cloudstack.apache.org
Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in 
blogs etc.

If I run 4.11 without enabling Host HA framework should HA still work if I 
crash a compute node because my understanding was the new framework was added 
for certain cases only.

It doesn't work for me but I can find a number of people saying you don't need 
to enable the new framework for it to work.

Thanks

Jon

rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue





Re: Basic networking setup

2018-05-29 Thread Jon Marshall
So everything on one subnet/vlan except guest traffic, which has its own.


Many thanks for that.



From: Ivan Kudryavtsev 
Sent: 29 May 2018 10:49
To: users
Subject: Re: Basic networking setup

Hello, Jon,

Basically following schema is used for a basic zone:
1. system VMs and hardware servers (heads, secondary storages, hypervisors)
use a fake net like 10.0.0.0/16 (I also do NAT all those nodes thru heads
to avoid public IPs, or separate security appliance can be used);
2. guest network - separate CIDR used;

I still think that the sentence you cite is correct. Every pod has a
dedicated CIDR (pt 2) which is assigned to guest VMs; the same (actually)
is true for management, but that is another CIDR (pt 1).

Some people also suggest using a separate network for storage, but I don't
see advantages for small and medium deployments.

Cheers.

2018-05-29 16:12 GMT+07:00 Jon Marshall :

> From the 4.11 documentation -
>
>
> "When basic networking is used, CloudStack will assign IP addresses in the
> CIDR of the pod to the guests in that pod. The administrator must add a
> Direct IP range on the pod for this purpose. These IPs are in the same VLAN
> as the hosts."
>
>
> It may be the way it is written but the above suggests that the IP subnet
> used for guest VM traffic is the same IP subnet used for the actual hosts
> themselves.
>
>
> But in the same documentation it says it recommends the use of separate
> NICs for management and guest traffic.
>
>
> I have setup CS using separate subnets for management, Guest VMs and
> storage so 3 separate NICs each in a different vlan using a different IP
> subnet. (the NICs are not vlan aware, just connecting to ports in different
> vlans on the switch).
>
>
> Should I be using just the one IP subnet for all NICs and simply
> connecting them all to the same bridge instead ?
>
>
> Jon
>
>
>


--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>





Basic networking setup

2018-05-29 Thread Jon Marshall
From the 4.11 documentation -


"When basic networking is used, CloudStack will assign IP addresses in the CIDR 
of the pod to the guests in that pod. The administrator must add a Direct IP 
range on the pod for this purpose. These IPs are in the same VLAN as the hosts."


It may be the way it is written but the above suggests that the IP subnet used 
for guest VM traffic is the same IP subnet used for the actual hosts themselves.


But in the same documentation it says it recommends the use of separate NICs 
for management and guest traffic.


I have setup CS using separate subnets for management, Guest VMs and storage so 
3 separate NICs each in a different vlan using a different IP subnet. (the NICs 
are not vlan aware, just connecting to ports in different vlans on the switch).


Should I be using just the one IP subnet for all NICs and simply connecting 
them all to the same bridge instead ?


Jon




Re: 4.11 without Host-HA framework

2018-05-24 Thread Jon Marshall
Hi Parth


I remember you saying this worked for you in a previous thread.


I am beginning to wonder if it is the fact I have used 3 separate NICs, one for 
management, one for the VM traffic and the third for storage that I am not 
seeing the behaviour you saw.


That is why, I too would like to understand exactly what is talking to what and 
doing checks for both non Host-HA and Host-HA.


I did get failover working in some scenarios with Host-HA and OOBM using IPMI 
but it was slow even after tweaking the timers eg. for a crashed host the best 
time I got was around 8 minutes which seems a long time but perhaps that is an 
acceptable time for CS, I just don't know.


Not expecting it to be instantaneous as it needs to do checks etc.


Jon



From: Parth Patel <parthpatel2...@gmail.com>
Sent: 24 May 2018 06:52
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon and Angus,

I did not shutdown the VMs as Yiping Zhang said, but I have confirmed this
and discussed earlier in the users list that my HA-enabled VMs got started
on another suitable available host in the cluster even when I didn't have
IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I
simply pulled the ethernet cable connecting the host to entire network (I
did use just one NIC) and according to the value set in ping timeout event,
the HA-enabled VMs were restarted on another available host. I tested the
scenario using both the scenarios: the echo command as well as good old
plugging out the NIC from the host. My VMs were successfully started on
another available host after CS manager confirmed they were not reachable.

I too want to understand how the failover mechanism in CloudStack actually
works. I used ACS 4.11 packages available here:
http://cloudstack.apt-get.eu/centos/7/4.11/

Regards,
Parth Patel


On Thu, 24 May 2018 at 10:53 Paul Angus <paul.an...@shapeblue.com> wrote:

> I'm afraid that is not a host crash.  When shutting down the guest OS, the
> CloudStack agent on the host is still able to report to the management
> server that the VM has stopped.
>
> This is my point. VM-HA relies on the management sever communication with
> the host agent.
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Yiping Zhang <yzh...@marketo.com>
> Sent: 24 May 2018 00:44
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> I can say for fact that VM's using a HA enabled service offering will be
> restarted by CS on another host, assuming there are enough
> capacity/resources in the cluster, when their original host crashes,
> regardless that host comes back or not.
>
> The simplest way to test VM HA feature with a VM instance using HA enabled
> service offering is to issue shutdown command in guest OS, and watching it
> gets restarted by CS manager.
>
> On 5/23/18, 1:23 PM, "Paul Angus" <paul.an...@shapeblue.com> wrote:
>
> Hi Jon,
>
> Don't worry, TBH I'm dubious about those claiming to have VM-HA
> working when a host crashes (but doesn't restart).
> I'll check in with the guys that set values for host-ha when testing,
> to see which ones they change and what they set them to.
>
> paul.an...@shapeblue.com
>     www.shapeblue.com<http://www.shapeblue.com>
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Jon Marshall <jms@hotmail.co.uk>
> Sent: 23 May 2018 21:10
> To: users@cloudstack.apache.org
> Subject: Re: 4.11 without Host-HA framework
>
> Rohit / Paul
>
>
> Thanks again for answering.
>
>
> I am a Cisco guy with an ex Unix background but no virtualisation
> experience and I can honestly say I have never felt this stupid before 
>
>
> I have Cloudstack working but failover is killing me.
>
>
> When you say VM HA relies on the host telling CS the VM is down how
> does that work because if you crash the host how does it tell CS anything ?
> And when you say tell CS do you mean the CS manager  ?
>
>
> I guess I am just not understanding all the moving parts. I have had
> HOST HA working (to an extent) although it takes a long time to failover
> even after tweaking the timers but the fact that I keep finding references
> to people saying even without HOST HA it should failover (and mine doesn't)
> makes me think I have configured it incorrectly somewhere along the line.
>
>
> I have configured a compute offering with HA and I am crashi

Re: 4.11 without Host-HA framework

2018-05-23 Thread Jon Marshall
Rohit / Paul


Thanks again for answering.


I am a Cisco guy with an ex Unix background but no virtualisation experience 
and I can honestly say I have never felt this stupid before 


I have Cloudstack working but failover is killing me.


When you say VM HA relies on the host telling CS the VM is down how does that 
work because if you crash the host how does it tell CS anything ? And when you 
say tell CS do you mean the CS manager  ?


I guess I am just not understanding all the moving parts. I have had HOST HA 
working (to an extent) although it takes a long time to failover even after 
tweaking the timers but the fact that I keep finding references to people 
saying even without HOST HA it should failover (and mine doesn't) makes me 
think I have configured it incorrectly somewhere along the line.


I have configured a compute offering with HA and I am crashing the host with 
the echo command as suggested but still nothing.


I understand what you are saying Paul about it not being a good idea to rely on 
VM HA so I will go back to Host HA and try to speed up failover times.


Can I ask, from your experiences, what is a realistic fail over time for CS ie. 
if a host fails for example ?


Jon





From: Paul Angus <paul.an...@shapeblue.com>
Sent: 23 May 2018 19:55
To: users@cloudstack.apache.org
Subject: RE: 4.11 without Host-HA framework

Jon,

As Rohit says, it is very important to understand the difference between VM HA 
and host HA.
VM HA relies on the HOST telling CloudStack that the VM is down on order for 
CloudStack start it again (wherever that ends up being).
Any sequence of events that ends up with VM HA restarting the VM when 
CloudStack can't contact the host is luck/fluke/unreliable/bad(tm)

The purpose of Host HA was to create a reliable mechanism to determine that a 
host has 'crashed' and that the VMs within it are inoperative. Then take 
appropriate action, including ultimately telling VM HA to restart the VM 
elsewhere.





paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
ShapeBlue are the largest independent integrator of CloudStack technologies 
globally and are specialists in the design and implementation of IaaS cloud 
infrastructures for both private and public cloud implementations.



53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Rohit Yadav <rohit.ya...@shapeblue.com>
Sent: 23 May 2018 10:45
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Jon,


In the VM's compute offering, make sure that HA is ticked/enabled. Then use 
that HA-enabled VM offering while deploying a VM. Around testing - it depends 
how you're crashing. In case of KVM, you can try to cause host crash (example: 
echo c > /proc/sysrq-trigger) and see if HA-enabled VMs gets started on a 
different host.
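
A rough test sequence for this (a sketch only, with hypothetical offering/VM names and
assuming CloudMonkey is configured against the management server):

  # on the management server: confirm the offering really has HA enabled
  cloudmonkey list serviceofferings name=HA-Offering filter=name,offerha
  # on the KVM host under test (WARNING: this genuinely crashes the kernel):
  echo c > /proc/sysrq-trigger
  # back on the management server: watch where the HA-enabled VM comes back up
  cloudmonkey list virtualmachines name=test-vm filter=name,state,hostname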


- Rohit

<https://cloudstack.apache.org>



________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: Tuesday, May 22, 2018 8:28:06 PM
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Rohit


Thanks for responding.


I have not had much luck with HA at all.  I crash a server and nothing happens  
in terms of VMs migrating to another host. Monitoring the management log file 
it seems the management server recognises the host has stopped responding to 
pings but doesn't think it has to do anything.


I am currently running v4.11 with basic network but 3 separate NICs, one for 
management, one for storage and one for VMs themselves.


Should it make it any difference ie. would it be worth trying to run management 
and storage over the same NIC ?


I am just lost as to why I see no failover at all whereas others are reporting 
it works fine.


Jon



From: Rohit Yadav <rohit.ya...@shapeblue.com>
Sent: 22 May 2018 12:12
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon,


Yes, Host-HA is different from VM-HA and without Host HA enabled a HA enabled 
VM should be recovered/run on a different host when it crashes. Historically 
the term 'HA' in CloudStack is used around high availability of a VM.


Host HA as the name tries to imply is around HA of a physical hypervisor host 
by means of out-of-band management technologies such as ipmi and currently 
supporting ipmi as OOBM and KVM hosts with NFS storage.
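
For reference, a sketch of the API calls for switching Host HA on (hypothetical UUIDs,
CloudMonkey assumed; the exact provider name can be checked with the first call):

  # discover the HA provider available for KVM hosts
  cloudmonkey list hosthaproviders hypervisor=KVM
  # point the host at that provider and enable Host HA on it
  cloudmonkey configure haforhost hostid=<host-uuid> provider=kvmhaprovider
  cloudmonkey enable haforhost hostid=<host-uuid>
  # Host HA can also be enabled at cluster scope
  cloudmonkey enable haforcluster clusterid=<cluster-uuid>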


- Rohit

<https://cloudstack.apache.org>







From: Jon Marshall <jms.

Re: 4.11 without Host-HA framework

2018-05-22 Thread Jon Marshall
Hi Rohit


Thanks for responding.


I have not had much luck with HA at all.  I crash a server and nothing happens  
in terms of VMs migrating to another host. Monitoring the management log file 
it seems the management server recognises the host has stopped responding to 
pings but doesn't think it has to do anything.


I am currently running v4.11 with basic network but 3 separate NICs, one for 
management, one for storage and one for VMs themselves.


Should it make it any difference ie. would it be worth trying to run management 
and storage over the same NIC ?


I am just lost as to why I see no failover at all whereas others are reporting 
it works fine.


Jon



From: Rohit Yadav <rohit.ya...@shapeblue.com>
Sent: 22 May 2018 12:12
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Jon,


Yes, Host-HA is different from VM-HA and without Host HA enabled a HA enabled 
VM should be recovered/run on a different host when it crashes. Historically 
the term 'HA' in CloudStack is used around high availability of a VM.


Host HA as the name tries to imply is around HA of a physical hypervisor host 
by means of out-of-band management technologies such as ipmi and currently 
supporting ipmi as OOBM and KVM hosts with NFS storage.


- Rohit

<https://cloudstack.apache.org>






________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: Monday, May 21, 2018 8:36:04 PM
To: users@cloudstack.apache.org
Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in 
blogs etc.

If I run 4.11 without enabling Host HA framework should HA still work if I 
crash a compute node because my understanding was the new framework was added 
for certain cases only.

It doesn't work for me but I can find a number of people saying you don't need 
to enable the new framework for it to work.

Thanks

Jon

rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue





4.11 without Host-HA framework

2018-05-21 Thread Jon Marshall
I keep seeing conflicting information about this in the mailing lists and in 
blogs etc.

If I run 4.11 without enabling Host HA framework should HA still work if I 
crash a compute node because my understanding was the new framework was added 
for certain cases only.

It doesn't work for me but I can find a number of people saying you don't need 
to enable the new framework for it to work.

Thanks

Jon


Re: Failover for VMs

2018-04-03 Thread Jon Marshall
Paul


I did some more testing today and am not sure what some of the states mean.


The first test was the easiest ie. "echo c > /proc/sysrq-trigger" which crashes 
the server.  In my setup the VMs on the crashed node never migrate because the 
server is rebooted and it comes back up before CS tries to migrate any servers.  
It takes approx 4 mins for server to recover.


The next tests were by doing a hard reset on the server and then modifying 
timers -


I did 4 tests and the quickest I got the VMs to failover was approx  5 and half 
minutes (see below for test details).


So I have two questions really from all this -


1) why does it go from Suspect to Degraded and back to Suspect once I started 
changing timers.  According to the docs Degraded means a successful activity 
check but the server was down so it can't have passed. And noticeably without 
modifying any timers it never goes to Degraded at all.


2) what, in your experience, is a sensible/reasonable failover time ?


Thanks for any help you can give.


Tests -


1)  default timers -

0:00 Suspect
9:00 recovery/Fenced
10:15 VM migrated

2)  kvm.ha.activity.check.max.attempts  3 (default = 10)

0:00 Suspect
2:00 Degraded
7:00 Suspect
9:00 Recovery/Fenced
10:20 VM migrated

3)  kvm.ha.activity.check.max.attempts 3  (default = 10)
 kvm.ha.degraded.max.period 120 seconds (default = 300)

0:00 Suspect
2:00 Degraded
4:00 Suspect
6:00 Checking/Fenced
7:21 VM migrated

4)   kvm.ha.activity.check.max.attempts 3  (default = 10)
  kvm.ha.degraded.max.period 120 seconds (default = 300)
  kvm.ha.activity.check.interval 30 seconds (default = 60)

0:00 Suspect
1:10 Degraded
3:10 Suspect
4:20 Recovering/Fenced
5:30 VM migrated
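
(For anyone repeating these tests: the kvm.ha.* values above are exposed as settings
that can be scoped to a cluster, so a sketch of changing them outside the UI, with a
hypothetical cluster UUID and CloudMonkey assumed, would be:)

  cloudmonkey update configuration clusterid=<cluster-uuid> \
      name=kvm.ha.activity.check.max.attempts value=3
  cloudmonkey update configuration clusterid=<cluster-uuid> \
      name=kvm.ha.degraded.max.period value=120
  cloudmonkey update configuration clusterid=<cluster-uuid> \
      name=kvm.ha.activity.check.interval value=30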


________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 29 March 2018 09:40
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Hi Paul


I did make some progress with this and seem to remember that after it said 
Recovered it then went back to Suspect and finally Fenced.


I am going to rerun a lot of the tests after changing some of the kvm_ha_ 
timers to try and speed things up a bit.


Will update here after I have run tests to check if that is what I should be 
seeing.


Many thanks


Jon



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 28 March 2018 20:01
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Ah.

Did you wait after the node said recovered?

That message is spurious. I've seen it also. It should say 'recovering' at that time.

________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: Tuesday, 27 March 2018 10:42 am
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) -


I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was 
no VM failover.  Instead HA reported suspect and then IPMI rebooted the 
machine, it came back online and the VM started responding to pings again.  
IPMI is out of band so that seems to be reasonable behaviour but no use in 
testing HA.


Next I just pulled all 3 NIC cables  from the same compute node and again HA 
reported suspect.  Again IPMI rebooted but then HA state changed to "Recovered" 
which I don't understand as the NIC cables were still disconnected so VM was 
not reachable and no failover.


I don't understand how it can think the node is recovered as apart from the 
IPMI out of band connection there are no network connections to this server.


Finally pulled power lead and this time HA went from suspect to Fencing and 
then stayed that way. Again no VM failover.   This makes sense as no power 
means IPMI cannot reboot server so it never moves to Fenced I assume. Again no 
failover.


I am wondering if it is to do with out of band IPMI or the way I have the NICs 
setup.  The management node only has one NIC in the management network but I 
assume this is okay.


I may try reloading with CS v4.9 and just try failover without the new HA KVM 
to see if I see anything different.



Jon



From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:10
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs


Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>

Re: Failover for VMs

2018-03-29 Thread Jon Marshall
Hi Paul


I did make some progress with this and seem to remember that after it said 
Recovered it then went back to Suspect and finally Fenced.


I am going to rerun a lot of the tests after changing some of the kvm_ha_ 
timers to try and speed things up a bit.


Will update here after I have run tests to check if that is what I should be 
seeing.


Many thanks


Jon



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 28 March 2018 20:01
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Ah.

Did you wait after the node said recovered?

That message is spurious. I've seen it also. It should say 'recovering' at that time.


From: Jon Marshall <jms@hotmail.co.uk>
Sent: Tuesday, 27 March 2018 10:42 am
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) -


I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was 
no VM failover.  Instead HA reported suspect and then IPMI rebooted the 
machine, it came back online and the VM started responding to pings again.  
IPMI is out of band so that seems to be reasonable behaviour but no use in 
testing HA.


Next I just pulled all 3 NIC cables  from the same compute node and again HA 
reported suspect.  Again IPMI rebooted but then HA state changed to "Recovered" 
which I don't understand as the NIC cables were still disconnected so VM was 
not reachable and no failover.


I don't understand how it can think the node is recovered as apart from the 
IPMI out of band connection there are no network connections to this server.


Finally pulled power lead and this time HA went from suspect to Fencing and 
then stayed that way. Again no VM failover.   This makes sense as no power 
means IPMI cannot reboot server so it never moves to Fenced I assume. Again no 
failover.


I am wondering if it is to do with out of band IPMI or the way I have the NICs 
setup.  The management node only has one NIC in the management network but I 
assume this is okay.


I may try reloading with CS v4.9 and just try failover without the new HA KVM 
to see if I see anything different.



Jon


________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:10
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs


Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...






53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 09:19
To: users@cloudstack.apache.org
Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with just a single NIC connection in the management network and 
a separate NFS server.


I have tried with and without the new Host HA KVM in CS v4.11 as from what I 
have read even without enabling the new Host HA KVM when you power off or 
reboot a compute node 

Re: Failover for VMs

2018-03-27 Thread Jon Marshall
Ok, significant progress made with this and have got Host HA KVM failover 
working for a number of different scenarios.


Will update this thread with tests run etc. and pick up after Easter as 
suggested by Paul.



From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 11:24
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

I am just updating as I continue testing -


When I pulled the power lead as discussed below it goes from Suspect to Fencing 
but never gets to Fenced.  But when I put the power lead back in to the server 
CS almost immediately puts that server into maintenance mode and then does 
migrate the VM.

Not sure of the logic but at least I got to see a VM failover :)

________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:42
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) -


I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was 
no VM failover.  Instead HA reported suspect and then IPMI rebooted the 
machine, it came back online and the VM started responding to pings again.  IPMI 
is out of band so that seems to be reasonable behaviour but no use in testing HA.


Next I just pulled all 3 NIC cables  from the same compute node and again HA 
reported suspect.  Again IPMI rebooted but then HA state changed to "Recovered" 
which I don't understand as the NIC cables were still disconnected so VM was 
not reachable and no failover.


I don't understand how it can think the node is recovered as apart from the 
IPMI out of band connection there are no network connections to this server.


Finally pulled power lead and this time HA went from suspect to Fencing and 
then stayed that way. Again no VM failover.   This makes sense as no power 
means IPMI cannot reboot server so it never moves to Fenced I assume. Again no 
failover.


I am wondering if it is to do with out of band IPMI or the way I have the NICs 
setup.  The management node only has one NIC in the management network but I 
assume this is okay.


I may try reloading with CS v4.9 and just try failover without the new HA KVM 
to see if I see anything different.



Jon


________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:10
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs


Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...






53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 09:19
To: users@cloudstack.apache.org
Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with jus

Re: Failover for VMs

2018-03-27 Thread Jon Marshall
I am just updating as I continue testing -


When I pulled the power lead as discussed below it goes from Suspect to Fencing 
but never gets to Fenced.  But when I put the power lead back in to the server 
CS almost immediately puts that server into maintenance mode and then does 
migrate the VM.


Not sure of the logic but at least I got to see a VM failover :)



From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:42
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) -


I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was 
no VM failover.  Instead HA reported suspect and then IPMI rebooted the 
machine, it came back online and the VM started responding to pings again.  IPMI 
is out of band so that seems to be reasonable behaviour but no use in testing HA.


Next I just pulled all 3 NIC cables  from the same compute node and again HA 
reported suspect.  Again IPMI rebooted but then HA state changed to "Recovered" 
which I don't understand as the NIC cables were still disconnected so VM was 
not reachable and no failover.


I don't understand how it can think the node is recovered as apart from the 
IPMI out of band connection there are no network connections to this server.


Finally pulled power lead and this time HA went from suspect to Fencing and 
then stayed that way. Again no VM failover.   This makes sense as no power 
means IPMI cannot reboot server so it never moves to Fenced I assume. Again no 
failover.


I am wondering if it is to do with out of band IPMI or the way I have the NICs 
setup.  The management node only has one NIC in the management network but I 
assume this is okay.


I may try reloading with CS v4.9 and just try failover without the new HA KVM 
to see if I see anything different.



Jon


________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:10
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs


Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...






53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 09:19
To: users@cloudstack.apache.org
Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with just a single NIC connection in the management network and 
a separate NFS server.


I have tried with and without the new Host HA KVM in CS v4.11 as from what I 
have read even without enabling the new Host HA KVM when you power off or 
reboot a compute node your VMs should still migrate.


I have tried powering off a compute node, pulling the power lead, removing the 
management and NFS network cables and the management server just seems to carry 
on as if nothing has happened.


Could someone explain exactly how HA is meant to work so I can look at where it 
is going wrong.


Re: Failover for VMs

2018-03-27 Thread Jon Marshall
Just as an update to this before I forget what I did :) -


I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was 
no VM failover.  Instead HA reported suspect and then IPMI rebooted the 
machine, it came back online and the VM started responding to pings again.  
IPMI is out of band so that seems to be reasonable behaviour but no use in 
testing HA.


Next I just pulled all 3 NIC cables  from the same compute node and again HA 
reported suspect.  Again IPMI rebooted but then HA state changed to "Recovered" 
which I don't understand as the NIC cables were still disconnected so VM was 
not reachable and no failover.


I don't understand how it can think the node is recovered as apart from the 
IPMI out of band connection there are no network connections to this server.


Finally pulled power lead and this time HA went from suspect to Fencing and 
then stayed that way. Again no VM failover.   This makes sense as no power 
means IPMI cannot reboot server so it never moves to Fenced I assume. Again no 
failover.


I am wondering if it is to do with out of band IPMI or the way I have the NICs 
setup.  The management node only has one NIC in the management network but I 
assume this is okay.


I may try reloading with CS v4.9 and just try failover without the new HA KVM 
to see if I see anything different.



Jon


________
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 10:10
To: users@cloudstack.apache.org
Subject: Re: Failover for VMs


Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...






53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 09:19
To: users@cloudstack.apache.org
Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with just a single NIC connection in the management network and 
a separate NFS server.


I have tried with and without the new Host HA KVM in CS v4.11 as from what I 
have read even without enabling the new Host HA KVM when you power off or 
reboot a compute node your VMs should still migrate.


I have tried powering off a compute node, pulling the power lead, removing the 
management and NFS network cables and the management server just seems to carry 
on as if nothing has happened.


Could someone explain exactly how HA is meant to work so I can look at where it 
is going wrong.


Re: Failover for VMs

2018-03-27 Thread Jon Marshall

Thanks Paul, will pick up after Easter break.

Doing some more testing with HA KVM at the moment so any progress will update 
this thread


i

From: Paul Angus <paul.an...@shapeblue.com
Sent: 27 March 2018 10:07
To: users@cloudstack.apache.org
Subject: RE: Failover for VMs
Jon,

I've been updating the Ansible to move our physical hosts from Centos6 to 
Centos7, now that's done I'll run through an HA setup and post answers 
(probably after easter break).

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...



53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-----
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 27 March 2018 09:19
To: users@cloudstack.apache.org
Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with just a single NIC connection in the management network and 
a separate NFS server.


I have tried with and without the new Host HA KVM in CS v4.11 as from what I 
have read even without enabling the new Host HA KVM when you power off or 
reboot a compute node your VMs should still migrate.


I have tried powering off a compute node, pulling the power lead, removing the 
management and NFS network cables and the management server just seems to carry 
on as if nothing has happened.


Could someone explain exactly how HA is meant to work so I can look at where it 
is going wrong.


Failover for VMs

2018-03-27 Thread Jon Marshall
After 3 weeks of trying multiple different setups I still have not managed to 
get a VM to failover between compute nodes and am just running out of ideas.


I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one 
management node with just a single NIC connection in the management network and 
a separate NFS server.


I have tried with and without the new Host HA KVM in CS v4.11 as from what I 
have read even without enabling the new Host HA KVM when you power off or 
reboot a compute node your VMs should still migrate.


I have tried powering off a compute node, pulling the power lead, removing the 
management and NFS network cables and the management server just seems to carry 
on as if nothing has happened.


Could someone explain exactly how HA is meant to work so I can look at where it 
is going wrong.


Re: KVM HostHA

2018-03-15 Thread Jon Marshall
> > > On 14 Mar 2018, at 14:51, Andrija Panic <andrija.pa...@gmail.com>
> wrote:
> > >
> > > Hi Boris,
> > >
> > > ok thanks for the explanation - that makes sense, and covers my
> > "exception
> > > case" that I have.
> > >
> > > This is atm only available for NFS as I could read (KVM on NFS) ?
> > >
> > > Cheers
> > >
> > > On 14 March 2018 at 13:02, Boris Stoyanov <
> boris.stoya...@shapeblue.com>
> > > wrote:
> > >
> > >> Hi Andrija,
> > >>
> > >> There’s two types of checks Host-HA is doing to determine if host if
> > >> healthy.
> > >>
> > >> 1. Health checks - pings the host as soon as there’s connection issues
> > >> with the agent
> > >>
> > >> If that fails,
> > >>
> > >> 2. Activity checks - checks if there are any writing operations on the
> > >> Disks of the VMs that are running on the hosts. This is to determine
> if
> > the
> > >> VMs are actually alive and executing processes. Only if no disk
> > operations
> > >> are executed on the shared storage, only then it’s trying to Recover
> the
> > >> host with IPMI call, if that eventually fails, it migrates the VMs to
> a
> > >> healthy host and Fences the faulty one.
> > >>
> > >> Hope that explains your case.
> > >>
> > >> Boris.
> > >>
> > >>
> > >> boris.stoya...@shapeblue.com
> > >> www.shapeblue.com<http://www.shapeblue.com>
> > >> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > >> @shapeblue
> > >>
> > >>
> > >>
> > >>> On 14 Mar 2018, at 13:53, Andrija Panic <andrija.pa...@gmail.com>
> > wrote:
> > >>>
> > >>> Hi Paul,
> > >>>
> > >>> sorry to bump in the middle of the thread, but just curious about the
> > >> idea
> > >>> behing host-HA and why it behaves the way you exlained above:
> > >>>
> > >>>
> > >>> Would it be more sense (or not?), that when MGMT detects agents is
> > >>> unreachable or host unreachable (or after unsuccessful i.e. agent
> > >> restart,
> > >>> etc...,to be defined), to actually use IPMI to STONITH the node, thus
> > >>> making sure no VMS running and then to really start all HA-enabled
> VMs
> > on
> > >>> other hosts ?
> > >>>
> > >>> I'm just trying to make parallel to the corosync/pacemaker as
> > clustering
> > >>> suite/services in Linux (RHEL and others), where when majority of
> nodes
> > >>> detect that one node is down, a common thing (especially for shared
> > >>> storage) is to STONITH that node, make sure it;s down, then move
> > >> "resource"
> > >>> (in our case VMs) to other cluster nodes ?
> > >>>
> > >>> I see it's  actually much broader setup per
> > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA but
> > >> again -
> > >>> whole idea (in my head at least...) is when host get's down, we make
> > sure
> > >>> it's down (avoid VM corruption, by doint STONITH to that node) and
> then
> > >>> start HA VMs on ohter hosts.
> > >>>
> > >>> I understand there might be exceptions as I have right now (4.8) -
> > >> libvirt
> > >>> get stuck (librbd exception or similar) so agent get's disconnected,
> > but
> > >>> VMs are still running fine... (except DB get messed up, all NICs
> loose
> > >>> isolation_uri, VR's loose MAC addresses and other IP addresses
> etc...)
> > >>>
> > >>>
> > >>> Thanks
> > >>> Andrija
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On 14 March 2018 at 10:57, Jon Marshall <jms@hotmail.co.uk>
> wrote:
> > >>>
> >

Re: KVM HostHA

2018-03-15 Thread Jon Marshall
hat one node is down, a common thing (especially for shared
> >>> storage) is to STONITH that node, make sure it;s down, then move
> >> "resource"
> >>> (in our case VMs) to other cluster nodes ?
> >>>
> >>> I see it's  actually much broader setup per
> >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA but
> >> again -
> >>> whole idea (in my head at least...) is when host get's down, we make
> sure
> >>> it's down (avoid VM corruption, by doint STONITH to that node) and then
> >>> start HA VMs on ohter hosts.
> >>>
> >>> I understand there might be exceptions as I have right now (4.8) -
> >> libvirt
> >>> get stuck (librbd exception or similar) so agent get's disconnected,
> but
> >>> VMs are still running fine... (except DB get messed up, all NICs loose
> >>> isolation_uri, VR's loose MAC addresses and other IP addresses etc...)
> >>>
> >>>
> >>> Thanks
> >>> Andrija
> >>>
> >>>
> >>>
> >>>
> >>> On 14 March 2018 at 10:57, Jon Marshall <jms@hotmail.co.uk> wrote:
> >>>
> >>>> That would make sense.
> >>>>
> >>>>
> >>>> I have another server being used for something else at the moment so I
> >>>> will add that in and update this thread when I have tested
> >>>>
> >>>>
> >>>> Jon
> >>>>
> >>>>
> >>>> 
> >>>> From: Paul Angus <paul.an...@shapeblue.com>
> >>>> Sent: 14 March 2018 09:16
> >>>> To: users@cloudstack.apache.org
> >>>> Subject: RE: KVM HostHA
> >>>>
> >>>> I'd need to do some testing, but I suspect that your problem is that
> you
> >>>> only have two hosts.  At the point that one host is deemed out of
> >> service,
> >>>> you only have one host left.  With only one host, CloudStack will show
> >> the
> >>>> cluster as ineligible.
> >>>>
> >>>> It is extremely common for any system working as a cluster to require
> a
> >>>> minimum starting point of 3 nodes to be able to function.
> >>>>
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> Paul Angus
> >>>>
> >>>> paul.an...@shapeblue.com
> >>>> www.shapeblue.com<http://www.shapeblue.com>
> >>>> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<
> >>>> http://www.shapeblue.com/>
> >>>>
> >>>> Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
> >>>> www.shapeblue.com<http://www.shapeblue.com>
> >>>> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge
> >> is a
> >>>> framework developed by ShapeBlue to deliver the rapid deployment of a
> >>>> standardised ...
> >>>>
> >>>>
> >>>>
> >>>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> >>>> @shapeblue
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -Original Message-
> >>>> From: Jon Marshall <jms@hotmail.co.uk>
> >>>> Sent: 14 March 2018 08:36
> >>>> To: users@cloudstack.apache.org
> >>>> Subject: Re: KVM HostHA
> >>>>
> >>>> Hi Paul
> >>>>
> >>>>
> >>>> My testing does indeed end up with the failed host in maintenance mode
> >> but
> >>>> the VMs are never migrated. As I posted earlier the management server
> >> seems
> >>>> to be saying there is no other host that the VM can be migrated to.
> >>>>
> >>>>
> >>>> Couple of questions if you have the time to respond -
> >>>>
> >>>>
> >>>> 1) this article seems to suggest a reboot or powering off a host will
> >> end
> >>>> result in the VMs being migrated and this was on CS v 4.2.1 back in
> >> 2013 so
> >>>> does Host HA do something different
> >>>>
> >>>>
> >>>> 2) Whenever one of my two nodes is taken down in testing the active
> >>>> compute nodes HA status goes from Available to Ineligib

Re: KVM HostHA

2018-03-14 Thread Jon Marshall
That would make sense.


I have another server being used for something else at the moment so I will add 
that in and update this thread when I have tested


Jon



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 14 March 2018 09:16
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

I'd need to do some testing, but I suspect that your problem is that you only 
have two hosts.  At the point that one host is deemed out of service, you only 
have one host left.  With only one host, CloudStack will show the cluster as 
ineligible.

It is extremely common for any system working as a cluster to require a minimum 
starting point of 3 nodes to be able to function.


Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...



53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 14 March 2018 08:36
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

Hi Paul


My testing does indeed end up with the failed host in maintenance mode but the 
VMs are never migrated. As I posted earlier the management server seems to be 
saying there is no other host that the VM can be migrated to.


Couple of questions if you have the time to respond -


1) this article seems to suggest a reboot or powering off a host will end 
result in the VMs being migrated and this was on CS v 4.2.1 back in 2013 so 
does Host HA do something different


2) Whenever one of my two nodes is taken down in testing the active compute 
nodes HA status goes from Available to Ineligible. Should this happen ie. is it 
going to Ineligible stopping the manager from migrating the VMs.


Apologies for all the questions but I just can't get this to work at the 
moment. If I do eventually get it working I will do a write up for others with 
same issue :)



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 14 March 2018 07:45
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

Hi Parth,

To answer your questions, VM-HA does not restart VMs on an alternate host if 
the original host goes down.  The management server (without host-HA) cannot 
tell what happened to the host.  It cannot tell if there was a failure in the 
agent, loss of connectivity to the management NIC or if the host is truly down. 
 In the first two scenarios, the guest VMs can still be running perfectly well, 
and to restart them elsewhere would be very dangerous.  Therefore, the correct 
thing to do is - nothing but alert the operator.  These scenarios are what 
Host-HA was introduced for.

Wrt to STONITH, if no disk activity is detected on the host, host-HA will try 
to restart (via IPMI) the host. If, after a configurable number of attempts, 
the host agent still does not check in, then host-HA will shut down the host 
(via IPMA), trigger VM-HA and mark the host as in-maintenance.



paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com<http://www.shapeblue.com>
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...



53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue




-Original Message-
From: Parth Patel <parthpatel2...@gmail.com>
Sent: 14 March 2018 05:05
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

Hi Paul,

Thanks for the clarification. I currently don't have an ipmi enabled hardware 
(in test environment), but it will be beneficial if you can help me clear out 
some basic concepts of it:
- If HA-enabled VMs are autostarted on another host when current host goes 
down, what is the need or purpose of HA-host? (other than management server 
able to remotely control its power interfaces)
- I understood the "Shoot-the-other-node-in-the-head" (STONITH) approach ACS 
uses to fence the host, but I couldn't find what mechanism or events trigger 
this?

Thanks and regards,
Parth Patel

On Wed, 14 Mar 2018 at 02:22 Paul Angus <paul.an...@shapeblue.com> wrote:

> The management server doesn't ping the host through IPMI.   However if
> IPMI is not available, you will not be able to use Host HA, as there
> is no way for CloudStack to 'fence' the host - that is shut it down to
> be sure that a VM cannot start again on that host.
>
> I can explai

Re: KVM HostHA

2018-03-14 Thread Jon Marshall
Hi Paul


My testing does indeed end up with the failed host in maintenance mode but the 
VMs are never migrated. As I posted earlier the management server seems to be 
saying there is no other host that the VM can be migrated to.


Couple of questions if you have the time to respond -


1) this article seems to suggest a reboot or powering off a host will end 
result in the VMs being migrated and this was on CS v 4.2.1 back in 2013 so 
does Host HA do something different


2) Whenever one of my two nodes is taken down in testing the active compute 
nodes HA status goes from Available to Ineligible. Should this happen ie. is it 
going to Ineligible stopping the manager from migrating the VMs.


Apologies for all the questions but I just can't get this to work at the 
moment. If I do eventually get it working I will do a write up for others with 
same issue :)



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 14 March 2018 07:45
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

Hi Parth,

To answer your questions, VM-HA does not restart VMs on an alternate host if 
the original host goes down.  The management server (without host-HA) cannot 
tell what happened to the host.  It cannot tell if there was a failure in the 
agent, loss of connectivity to the management NIC or if the host is truly down. 
 In the first two scenarios, the guest VMs can still be running perfectly well, 
and to restart them elsewhere would be very dangerous.  Therefore, the correct 
thing to do is - nothing but alert the operator.  These scenarios are what 
Host-HA was introduced for.

Wrt to STONITH, if no disk activity is detected on the host, host-HA will try 
to restart (via IPMI) the host. If, after a configurable number of attempts, 
the host agent still does not check in, then host-HA will shut down the host 
(via IPMI), trigger VM-HA and mark the host as in-maintenance.



paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...



53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Parth Patel <parthpatel2...@gmail.com>
Sent: 14 March 2018 05:05
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

Hi Paul,

Thanks for the clarification. I currently don't have an ipmi enabled hardware 
(in test environment), but it will be beneficial if you can help me clear out 
some basic concepts of it:
- If HA-enabled VMs are autostarted on another host when current host goes 
down, what is the need or purpose of HA-host? (other than management server 
able to remotely control its power interfaces)
- I understood the "Shoot-the-other-node-in-the-head" (STONITH) approach ACS 
uses to fence the host, but I couldn't find what mechanism or events trigger 
this?

Thanks and regards,
Parth Patel

On Wed, 14 Mar 2018 at 02:22 Paul Angus <paul.an...@shapeblue.com> wrote:

> The management server doesn't ping the host through IPMI.   However if
> IPMI is not available, you will not be able to use Host HA, as there
> is no way for CloudStack to 'fence' the host - that is shut it down to
> be sure that a VM cannot start again on that host.
>
> I can explain why that is necessary if you wish.
>
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>
[http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/>

Shapeblue - The CloudStack Company<http://www.shapeblue.com/>
www.shapeblue.com
Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a 
framework developed by ShapeBlue to deliver the rapid deployment of a 
standardised ...



> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue
>
>
>
>
> -Original Message-
> From: Parth Patel <parthpatel2...@gmail.com>
> Sent: 13 March 2018 16:57
> To: users@cloudstack.apache.org
> Cc: Jon Marshall <jms@hotmail.co.uk>
> Subject: Re: KVM HostHA
>
> Hi Jon and Victor,
>
> I think the management server pings your host using ipmi (I really don't
> hope this is the case).
> In my case, I did not have OOBM enabled at all (my hardware didn't support
> it)
> I think you could disable OOBM and/or HA-Host and give that a try :)
>
> On Tue, 13 Mar 2018 at 20:40 victor <vic...@ihnetworks.com> wrote:
>
> > Hello Guys,
> >
> > I have tried the following two cases.
> >
> > 1, "echo c > /proc/sysrq-trigger"
> >
> > 2, Pull

Re: KVM HostHA

2018-03-13 Thread Jon Marshall
Update on below.


I pulled the NICs for both management and storage from cnode 1.


1) The UI immediately showed the power state as Unknown but the state was Up.

2) The HA state on cnode1 showed as suspect. The HA state on cnode2 showed as 
available.

3) After about 4 mins the state on cnode1 went from Up to Alert

4) The HA state on cnode1 showed as Fencing and the HA state on cnode2 showed 
as Ineligible.


The HA enabled VMs on cnode1 never switched over to the working node cnode2.


Any ideas ?



From: Jon Marshall <jms@hotmail.co.uk>
Sent: 13 March 2018 10:50
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

I tried "echo c > /proc/sysrq-trigger" which stopped me getting into the server 
but it did not stop the server responding to an ipmitool request on the manager 
eg -


"ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status"


from the management server got an answer saying the chassis power was on so CS 
never registered the compute node as down.


I am obviously doing something wrong but cannot work it out.


The management server has one NIC - 172.16.7.4


Each compute node has 3 NICs -


                       cnode1          cnode2

management NIC         172.16.7.5      172.16.7.6
vm NIC                 172.16.6.130    172.16.6.131
storage NIC            172.16.250.4    172.16.250.5
Dell LOM (for iDRAC)   172.16.7.29     172.16.7.30


the dell LOM IPs are the ones used to configure OOBM  in the UI
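
(The same OOBM settings can also be pushed through the API instead of the UI; a sketch
using the values above, assuming CloudMonkey and the standard ipmitool driver:)

  cloudmonkey configure outofbandmanagement hostid=<cnode1-uuid> driver=ipmitool \
      address=172.16.7.29 port=623 username=admin3 password=letmein
  cloudmonkey enable outofbandmanagementforhost hostid=<cnode1-uuid>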



If I pull the storage NIC, presumably nothing will happen as the ipmitool check 
is running across the management NIC, so do I need to pull both?

My understanding of host HA was that the management server monitored the compute 
nodes using ipmitool and, if it did not get a response because the host was down, 
it would fence off that host and move the VMs to an active compute node.

This is obviously too simplistic, so could someone explain how it is meant to 
work and what it is protecting against?


From: Paul Angus <paul.an...@shapeblue.com>
Sent: 13 March 2018 07:01
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

Hi all,

One small note, unplugging the management NIC will only cause an HA event if 
the storage is running over that NIC also.

If the storage is over a separate NIC then the guest VMs will continue to run 
when the mgmt. NIC is unplugged; Host HA will detect the disk activity and 
conclude that there is nothing it can do, as the VMs are still running, other 
than mark the host as degraded.
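
A quick way to check which NIC actually carries the storage traffic on a 
compute node - a minimal sketch, with placeholders for your NFS and management 
server IPs:

ip route get <nfs-server-ip>          # the "dev" shown here is the NIC carrying storage traffic
ip route get <management-server-ip>   # the NIC carrying management/agent traffic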


Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Parth Patel <parthpatel2...@gmail.com>
Sent: 12 March 2018 17:35
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

>
> Hi Jon,
>
> As I said, in my case, making the host HA didn't work but by just
> having a HA VM running on host and executing - (WARNING) "echo c >
> /proc/sysrq-trigger" to simulate a kernel crash on host, the
> management server registered it as down and started the VM on another
> host. I know I've suggested this before but I insist you give this a
> try. Also, you don't need to completely power off the machine manually
> but just plugging out the network cable works fine. The cloudstack
> agent after losing connection to management server auto reboots
> because of KVM heartbeat check shell script mentioned by Rohit Yadav
> to one of my earlier queries in other thread.

Re: KVM HostHA

2018-03-13 Thread Jon Marshall
I tried "echo c > /proc/sysrq-trigger" which stopped me getting into the server 
but it did not stop the server responding to an ipmitool request on the manager 
eg -


"ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status"


from the management server got an answer saying the chassis power was on so CS 
never registered the compute node as down.
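
One observation, offered as my reading rather than anything authoritative: 
"chassis status" only reports what the BMC/iDRAC sees - whether the chassis has 
power - and says nothing about whether the OS is alive, so a crashed kernel 
still shows "power is on". Roughly:

# on the compute node (DANGEROUS - crashes the kernel immediately):
echo c > /proc/sysrq-trigger

# from the management server - the BMC still answers and still reports power on:
ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status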


I am obviously doing something wrong but cannot work it out.


The management server has one NIC - 172.16.7.4


Each compute node has 3 NICs -


                        cnode1          cnode2

management NIC          172.16.7.5      172.16.7.6
vm NIC                  172.16.6.130    172.16.6.131
storage NIC             172.16.250.4    172.16.250.5

Dell LOM (for iDRAC)    172.16.7.29     172.16.7.30


The Dell LOM IPs are the ones used to configure OOBM in the UI.



If I pull the storage NIC, presumably nothing will happen as the ipmitool check 
is running across the management NIC, so do I need to pull both?

My understanding of host HA was that the management server monitored the compute 
nodes using ipmitool and, if it did not get a response because the host was down, 
it would fence off that host and move the VMs to an active compute node.

This is obviously too simplistic, so could someone explain how it is meant to 
work and what it is protecting against?


From: Paul Angus <paul.an...@shapeblue.com>
Sent: 13 March 2018 07:01
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

Hi all,

One small note, unplugging the management NIC will only cause an HA event if 
the storage is running over that NIC also.

If the storage is over a separate NIC then the guest VMs will continue to run 
when the mgmt. NIC is unplugged; Host HA will detect the disk activity and 
conclude that there is nothing it can do, as the VMs are still running, other 
than mark the host as degraded.


Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Parth Patel <parthpatel2...@gmail.com>
Sent: 12 March 2018 17:35
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

>
> Hi Jon,
>
> As I said, in my case, making the host HA didn't work but by just
> having a HA VM running on host and executing - (WARNING) "echo c >
> /proc/sysrq-trigger" to simulate a kernel crash on host, the
> management server registered it as down and started the VM on another
> host. I know I've suggested this before but I insist you give this a
> try. Also, you don't need to completely power off the machine manually
> but just plugging out the network cable works fine. The cloudstack
> agent after losing connection to management server auto reboots
> because of KVM heartbeat check shell script mentioned by Rohit Yadav
> to one of my earlier queries in other thread.
>
> On Mon 12 Mar, 2018, 21:23 Jon Marshall, <jms@hotmail.co.uk> wrote:
> Hi Paul
>
>
> Thanks for the response.
>
>
> I think I am not understanding how it was meant to work then. My
> understanding was that the manager used ipmitool to just keep querying
> the compute nodes as to their status so I assumed it didn't matter how
> you shut the node down, once it was down the manager would get no
> response and mark it as down (which it does).
>
>
> I am in testing mode so I think I will just go and pull the power and
> see what happens :)
>
>
> Thanks
>
>
> Jon
>
>
> 
> From: Paul Angus <paul.an...@shapeblue.com>
> Sent: 12 March 2018 15:31
> To: users@cloudstack.apache.org
> Subject: RE: KVM HostHA
> Hi Jon,
>
> I think that what you guys are finding is that a controlled host
> shutdown, which will cause the agent to shut down cleanly, is not
> considered an HA event. I wouldn't expect CloudStack to take any
> action if you shut down a host, only if the host (agent) stops responding.
>
>
>
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com<http://www.shapeblue.com>

Re: KVM HostHA

2018-03-12 Thread Jon Marshall
Hi Paul


Thanks for the response.


I think I am not understanding how it was meant to work then. My understanding 
was that the manager used ipmitool to just keep querying the compute nodes as 
to their status, so I assumed it didn't matter how you shut the node down: once 
it was down the manager would get no response and mark it as down (which it 
does).


I am in testing mode so I think I will just go and pull the power and see what 
happens :)


Thanks


Jon



From: Paul Angus <paul.an...@shapeblue.com>
Sent: 12 March 2018 15:31
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA
 Hi Jon,

I think that what you guys are finding is that a controlled host shutdown, 
which will cause the agent to shut down cleanly, is not considered an HA event. 
I wouldn't expect CloudStack to take any action if you shut down a host, only 
if the host (agent) stops responding.
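
If you want to test this without pulling cables, a lab-only sketch of the 
difference - the sysrq write powers the box off instantly, so use it with care:

# controlled shutdown: the agent stops cleanly, so no HA event is expected
shutdown -h now

# abrupt power-off with no clean agent shutdown (closest to pulling the plug)
echo o > /proc/sysrq-trigger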




Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue




-Original Message-
From: Jon Marshall <jms@hotmail.co.uk>
Sent: 12 March 2018 15:15
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

I have the same issue here and am not entirely sure what the behaviour should 
be.


I have one manager node and 2 compute nodes running 4.11 with ipmi working 
correctly.


From the UI under HA -


HA Enabled   Yes
HA State Available
HA Provider  kvmhaprovider


although interestingly from the "Details" tab it shows -


HA enabled   No


which I assume is a cosmetic issue ?


On each compute node I have one HA enabled VM and one non HA enabled VM.


I power off a compute node and the UI updates the host status and the VMs on 
that node stop responding but they never fail over to the other node.


Couple of things I noticed -


1) as soon as I power off the compute node, the HA state on the other node shows 
"Ineligible"


2) In the UI the instances all still show as green even though two of them are 
not available


Any help much appreciated





From: victor <vic...@ihnetworks.com>
Sent: 07 March 2018 17:01
To: users@cloudstack.apache.org
Subject: KVM HostHA

Hello Guys,

I have installed CloudStack 4.11. I have enabled HA for each host I have 
added. I have also added IPMI successfully (using the ipmi driver).
The hosts are showing like the following.

===

HA Enabled   Yes
HA State Available
HA Provider  kvmhaprovider

==

Also the host is showing the following correctly

Resource state --> Enabled
State --> UP
Power state --> On

So I have shut down one of the hosts to see how the KVM host HA is working. I 
have waited for half an hour, but nothing has happened. What will happen to the 
VMs on that host if the host fails to come back up?
There isn't much in the logs.

Regards
Victor


Re: KVM HostHA

2018-03-12 Thread Jon Marshall
I have the same issue here and am not entirely sure what the behaviour should 
be.


I have one manager node and 2 compute nodes running 4.11 with ipmi working 
correctly.


From the UI under HA -


HA Enabled   Yes
HA State Available
HA Provider  kvmhaprovider


although interestingly from the "Details" tab it shows -


HA enabled   No


which I assume is a cosmetic issue ?


On each compute node I have one HA enabled VM and one non HA enabled VM.


I power off a compute node and the UI updates the host status and the VMs on 
that node stop responding but they never fail over to the other node.


Couple of things I noticed -


1) as soon as I power off the compute node, the HA state on the other node shows 
"Ineligible"


2) In the UI the instances all still show as green even though two of them are 
not available


Any help much appreciated





From: victor 
Sent: 07 March 2018 17:01
To: users@cloudstack.apache.org
Subject: KVM HostHA

Hello Guys,

I have installed CloudStack 4.11. I have enabled HA for each host I
have added. I have also added IPMI successfully (using the ipmi driver).
The hosts are showing like the following.

===

HA Enabled   Yes
HA State Available
HA Provider  kvmhaprovider

==

Also the host is showing the following correctly

Resource state --> Enabled
State --> UP
Power state --> On

So I have shut down one of the hosts to see how the KVM host HA is
working. I have waited for half an hour, but nothing has happened. What
will happen to the VMs on that host if the host fails to come back up?
There isn't much in the logs.

Regards
Victor


System VMs and bridge connections

2018-01-26 Thread Jon Marshall
Can someone tell me where I am going wrong or if this is possible (apologies 
for the long post)



I have configured the management server as per installation instructions with 
just an interface in the management network using subnet 172.16.7.0/27



I then configured a host with 3 separate NICs –



  1.  Management interface with IP from same subnet as management server IP
  2.  Second NIC using a subnet of 172.16.6.128/25. This is meant to be the 
subnet for the VMs.
  3.  Third NIC with an IP from the 172.16.232.0/28 subnet which is where the 
NFS server is.



I am using KVM so configured Linux bridges eg cloudbr0 for 1), cloudbr1 for 2) 
and cloudbr2 for 3).
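
For reference, the general shape of such a bridge setup, shown with plain 
iproute2 commands and placeholder addresses purely as an illustration - the 
persistent configuration would live in the distro's network scripts. The key 
point is that the host IP sits on the bridge, not on the enslaved NIC:

ip link add name cloudbr0 type bridge        # management bridge
ip link set eth0 master cloudbr0             # enslave the management NIC
ip addr add <host-mgmt-ip>/27 dev cloudbr0   # the host management IP lives on the bridge
ip link set eth0 up
ip link set cloudbr0 up
# repeat for cloudbr1 (guest NIC, 172.16.6.128/25) and cloudbr2 (storage NIC, 172.16.232.0/28)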



I then connected to the UI and did the basic setup.



It worked in that the host showed as up and the system VMs came up, but neither 
system VM was working properly, so I logged into both and saw the same problem.



The VMs had picked up an IP from both the management network and the VM subnet 
eg.



Eth1 – 172.16.7.10

Eth2 – 172.16.6.177



The default gateway was 172.16.6.129, i.e. from the VM subnet, but neither VM 
could ping that default gateway.



When I looked at the bridges on the host, the MAC address of eth2 was seen on 
cloudbr0, which is the bridge for the management subnet. When I then logged into 
the physical L3 switch I could see eth2’s MAC address in the management VLAN and 
not the VM subnet VLAN.
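
A few commands that help pin down which bridge each interface - physical NIC or 
VM vnet - is actually attached to; a diagnostic sketch, with the VM name as a 
placeholder:

brctl show                          # or: bridge link show
virsh domiflist <system-vm-name>    # e.g. the v-*-VM / s-*-VM instances; shows the bridge behind each vNIC
brctl showmacs cloudbr0             # or: bridge fdb show br cloudbr0 - where each MAC is being learned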



So it seems like the bridging between the VMs and the physical NICs is not 
working properly or more likely there is something basic I am not understanding.



Should I be looking to use advanced networking, or is the above setup possible 
with just basic networking?



I am using CloudStack v4.10 and feel a bit of an idiot as all the docs say 
setting up basic networking is really easy (I did do an install where it is 
all on the same server and that worked fine).



Any pointers much appreciated.



PS. I cannot console to the system VMs because of the above, and the SSVM does 
not have an interface in the NFS network even though there is a physical NIC on 
the host.