Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Nux!
Mike, 

Run iptables-save on the hypervisor that is running an actual VM; from the rules above 
it looks like you are not running any (except system VMs). If you are running a 
VM there, then something is horribly wrong with the security groups. 

Another way to check for firewall issues is to disable the firewall altogether. I'm not 
sure how Ubuntu handles that, but you can use this little script[1]. If your problems 
go away once you do that, then it's a firewall issue.

[1] - http://dl.nux.ro/utils/iptflush.sh
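
(For reference, a typical "flush and allow all" helper does roughly the following - this 
is a sketch of the usual iptables commands, not necessarily the exact contents of the 
script linked above:

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

i.e. set every default policy to ACCEPT and flush all rules and user-defined chains, so 
nothing is left filtering traffic.)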

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Tutkowski, Mike" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 20:31:23
> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

> Hi,
> 
> Here are the results of iptables-save (ebtables-save appears not to be
> installed):
> 
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *nat
> :PREROUTING ACCEPT [1914053:9571571583]
> :INPUT ACCEPT [206:3]
> :OUTPUT ACCEPT [4822:348457]
> :POSTROUTING ACCEPT [7039:610037]
> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *mangle
> :PREROUTING ACCEPT [5214518:18468052456]
> :INPUT ACCEPT [2635017:8841915309]
> :FORWARD ACCEPT [214137:32291562]
> :OUTPUT ACCEPT [4343524:27594835296]
> :POSTROUTING ACCEPT [4558131:27627145644]
> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *filter
> :INPUT ACCEPT [884752:56694574]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [886649:47348857]
> :BF-cloudbr0 - [0:0]
> :BF-cloudbr0-IN - [0:0]
> :BF-cloudbr0-OUT - [0:0]
> :r-318-VM - [0:0]
> :s-316-VM - [0:0]
> :v-315-VM - [0:0]
> -A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
> -A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
> -A FORWARD -i virbr0 -o virbr0 -j ACCEPT
> -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -o cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -i cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -o cloudbr0 -j DROP
> -A FORWARD -i cloudbr0 -j DROP
> -A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
> -A BF-cloudbr0 -m state --state RELATED,ESTABLISHED -j ACCEPT
> -A BF-cloudbr0 -m physdev --physdev-is-in --physdev-is-bridged -j 
> BF-cloudbr0-IN
> -A BF-cloudbr0 -m physdev --physdev-is-out --physdev-is-bridged -j
> BF-cloudbr0-OUT
> -A BF-cloudbr0 -m physdev --physdev-out eth0 --physdev-is-bridged -j ACCEPT
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet1 --physdev-is-bridged -j 
> v-315-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet2 --physdev-is-bridged -j 
> v-315-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet4 --physdev-is-bridged -j 
> s-316-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet5 --physdev-is-bridged -j 
> s-316-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet6 --physdev-is-bridged -j 
> r-318-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet1 --physdev-is-bridged -j
> v-315-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet2 --physdev-is-bridged -j
> v-315-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet4 --physdev-is-bridged -j
> s-316-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet5 --physdev-is-bridged -j
> s-316-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet6 --physdev-is-bridged -j
> r-318-VM
> -A r-318-VM -m physdev --physdev-in vnet6 --physdev-is-bridged -j RETURN
> -A r-318-VM -j ACCEPT
> -A s-316-VM -m physdev --physdev-in vnet4 --physdev-is-bridged -j RETURN
> -A s-316-VM -m physdev --physdev-in vnet5 --physdev-is-bridged -j RETURN
> -A s-316-VM -j ACCEPT
> -A v-315-VM -m physdev --physdev-in vnet1 --physdev-is-bridged -j RETURN
> -A v-315-VM -m physdev --physdev-in vnet2 --physdev-is-bridged -j RETURN
> -A v-315-VM -j ACCEPT
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> 
> Thanks!
> Mike
> 
> On 1/16/18, 1:32 AM, "Nux!"  wrote:
> 
>Hi Mike,
>
>First thing to check would be the firewall on the hypervisor.
>Can you paste the output of iptables-save and ebtables-save ?
>
>--
>Sent from the Delta

Re: HA issues

2018-01-17 Thread Rohit Yadav
Hi Lucian,


The "Host HA" feature is entirely different from VM HA; however, they may work 
in tandem, so please stop using the terms interchangeably, as it may lead the 
community to believe a regression has been introduced.


The "Host HA" feature currently ships with only one "Host HA" provider, for KVM, which 
is strictly tied to out-of-band management (IPMI for fencing, i.e. power off, and 
recovery, i.e. reboot) and NFS (as primary storage). (We also have a provider 
for the simulator, but that's for coverage/testing purposes.)


Therefore, "Host HA" for KVM (+NFS) currently works only when OOBM is enabled. 
The framework allows interested parties to write their own HA providers for a 
hypervisor, using a different strategy/mechanism for fencing/recovery of 
hosts (including writing a non-IPMI based OOBM plugin) and a host/disk activity 
checker that is not NFS-based.


The "Host HA" feature ships disabled by default and does not interfere with VM 
HA. However, when it is enabled and configured correctly, a known limitation is 
that when it is unable to successfully perform recovery or fencing tasks it may 
not trigger VM HA. We can discuss how to handle such cases (thoughts?). "Host HA" 
will try a couple of times to recover a host and, failing that, will eventually 
trigger a host fencing task. If it's unable to fence the host, it will keep 
attempting to fence it indefinitely (the host state will be stuck at the fencing 
state in the cloud.ha_config table, for example) and alerts will be sent to the 
admin, who can intervene manually in such situations (if you have email/SMTP 
enabled, you should see alert emails).
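
(For reference, you can watch that state directly in the database; a minimal check, 
assuming you run it on the management server and substitute your own MySQL 
credentials, is:

mysql -u root -p -e 'SELECT * FROM cloud.ha_config'

The table name comes from the paragraph above; it tracks the current HA state of 
each host handled by the "Host HA" framework.)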


We can discuss how to improve and have a workaround for the case you've hit, 
thanks for sharing.


- Rohit


From: Nux! 
Sent: Tuesday, January 16, 2018 10:42:35 PM
To: dev
Subject: Re: HA issues

Ok, reinstalled and re-tested.

What I've learned:

- HA now only works if OOB is configured; the old-style HA no longer applies - 
this can be good and bad, as not everyone has IPMI

- HA only works if IPMI is reachable (a quick reachability check is sketched 
below). I've pulled the cord on a HV and HA failed to do its thing, leaving me 
with a HV down along with all the VMs running there. That's bad.
I've opened this ticket for it:
https://issues.apache.org/jira/browse/CLOUDSTACK-10234
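
(For reference, a quick way to sanity-check IPMI reachability from the management/agent 
side is an ipmitool call like the one below; the BMC address and credentials are 
placeholders, not values from this setup:

ipmitool -I lanplus -H <bmc-ip> -U <ipmi-user> -P <ipmi-password> chassis power status

If that times out, the IPMI-based OOBM driver will most likely not be able to fence or 
recover the host either.)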

Let me know if you need any extra info or stuff to test.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

- Original Message -
> From: "Nux!" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 11:35:58
> Subject: Re: HA issues

> I'll reinstall my setup and try again, just to be sure I'm working on a clean
> slate.
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> - Original Message -
>> From: "Rohit Yadav" 
>> To: "dev" 
>> Sent: Tuesday, 16 January, 2018 11:29:51
>> Subject: Re: HA issues
>
>> Hi Lucian,
>>
>>
>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi), please 
>> refer
>> to following docs:
>>
>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>
>>
>> We'll need you to look at the logs and perhaps create a JIRA ticket with the logs and
>> details? If you saw ipmi based reboot, then host-ha indeed tried to recover
>> i.e. reboot the host, once hostha has done its work it would schedule HA for 
>> VM
>> as soon as the recovery operation succeeds (we've simulator and kvm based
>> marvin tests for such scenarios).
>>
>>
>> Can you see it making an attempt to schedule VM HA in the logs, or any failure?
>>
>>
>> - Rohit
>>
>> 
>>
>>
>>
>> 
>> From: Nux! 
>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>> To: dev
>> Subject: [4.11] HA issues
>>
>> Hi,
>>
>> I see there's a new HA engine for KVM and IPMI support which is really nice,
>> however it seems hit and miss.
>> I have created an instance with HA offering, kernel panicked one of the
>> hypervisors - after a while the server was rebooted via IPMI probably, but 
>> the
>> instance never moved to a running hypervisor and even after the original
>> hypervisor came back it was still left in Stopped state.
>> Is there any extra things I need to set up to have proper HA?
>>
>> Regards,
>> Lucian
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> rohit.ya...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue


Re: HA issues

2018-01-17 Thread Nux!
Right, sorry for using the terms interchangeably, I see what you mean.

I'll do further testing then as VM HA was also not working in my setup.

I'll be back.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Rohit Yadav" 
> To: "dev" 
> Sent: Wednesday, 17 January, 2018 09:09:19
> Subject: Re: HA issues

> Hi Lucian,
> 
> 
> The "Host HA" feature is entirely different from VM HA, however, they may work
> in tandem, so please stop using the terms interchangeably as it may cause the
> community to believe a regression has been caused.
> 
> 
> The "Host HA" feature currently ships with only "Host HA" provider for KVM 
> that
> is strictly tied to out-of-band management (IPMI for fencing, i.e. power off 
> and
> recovery, i.e. reboot) and NFS (as primary storage). (We also have a provider
> for simulator, but that's for coverage/testing purposes).
> 
> 
> Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is enabled.
> The framework allows interested parties to write their own HA providers for a
> hypervisor that can use a different strategy/mechanism for fencing/recovery of
> hosts (including write a non-IPMI based OOBM plugin) and host/disk activity
> checker that is non-NFS based.
> 
> 
> The "Host HA" feature ships disabled by default and does not cause any
> interference with VM HA. However, when enabled and configured correctly, it is
> a known limitation that when it is unable to successfully perform recovery or
> fencing tasks it may not trigger VM HA. We can discuss how to handle such 
> cases
> (thoughts?). "Host HA" would try a couple of times to recover and, failing to do
> so, it would eventually trigger a host fencing task. If it's unable to fence a
> host, it will indefinitely attempt to fence the host (the host state will be
> stuck at fencing state in cloud.ha_config table for example) and alerts will 
> be
> sent to admin who can do some manual intervention to handle such situations 
> (if
> you've email/smtp enabled, you should see alert emails).
> 
> 
> We can discuss how to improve and have a workaround for the case you've hit,
> thanks for sharing.
> 
> 
> - Rohit
> 
> 
> From: Nux! 
> Sent: Tuesday, January 16, 2018 10:42:35 PM
> To: dev
> Subject: Re: HA issues
> 
> Ok, reinstalled and re-tested.
> 
> What I've learned:
> 
> - HA only works now if OOB is configured, the old way HA no longer applies -
> this can be good and bad, not everyone has IPMIs
> 
> - HA only works if IPMI is reachable. I've pulled the cord on a HV and HA 
> failed
> to do its thing, leaving me with a HV down along with all the VMs running
> there. That's bad.
> I've opened this ticket for it:
> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
> 
> Let me know if you need any extra info or stuff to test.
> 
> Regards,
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>  
> 
> 
> - Original Message -
>> From: "Nux!" 
>> To: "dev" 
>> Sent: Tuesday, 16 January, 2018 11:35:58
>> Subject: Re: HA issues
> 
>> I'll reinstall my setup and try again, just to be sure I'm working on a clean
>> slate.
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> - Original Message -
>>> From: "Rohit Yadav" 
>>> To: "dev" 
>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>> Subject: Re: HA issues
>>
>>> Hi Lucian,
>>>
>>>
>>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi), please 
>>> refer
>>> to following docs:
>>>
>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>>
>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>
>>>
>>> We'll need you to look at the logs and perhaps create a JIRA ticket with the logs 
>>> and
>>> details? If you saw ipmi based reboot, then host-ha indeed tried to recover
>>> i.e. reboot the host, once hostha has done its work it would schedule HA 
>>> for VM
>>> as soon as the recovery operation succeeds (we've simulator and kvm based
>>> marvin tests for such scenarios).
>>>
>>>
>>> Can you see it making an attempt to schedule VM HA in the logs, or any failure?
>>>
>>>
>>> - Rohit
>>>
>>> 
>>>
>>>
>>>
>>> 
>>> From: Nux! 
>>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>>> To: dev
>>> Subject: [4.11] HA issues
>>>
>>> Hi,
>>>
>>> I see there's a new HA engine for KVM and IPMI support which is really nice,
>>> however it seems hit and miss.
>>> I have created an instance with HA offering, kernel panicked one of the
>>> hypervisors - after a while the server was rebooted via IPMI probably, but 
>>> the
>>> instance never moved to a running hypervisor and even after the original
>>> hypervisor came back it was still left in Stopped 

Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Rohit Yadav
Mike,


I tested a basic zone with KVM (with and without SG enabled, and with/without local 
storage enabled) and could not reproduce the issue. My guest VMs got their 
IP/hostname from dnsmasq on the shared guest network VR. I tested it using a 
CentOS7-based KVM box [1], but not with Ubuntu-based boxes. It's quite possible 
the issue you're seeing is environment/configuration related.


[1] monkeybox: https://github.com/rhtyd/monkeybox
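
(If it is environmental, one quick host-side check is to watch whether the VM's DHCP 
requests actually reach the VR over the guest bridge; the bridge name below is taken 
from the iptables output earlier in this thread, so adjust it to your setup:

tcpdump -i cloudbr0 -n -e port 67 or port 68

If you see DHCP requests leaving the VM but no replies coming back from the VR's 
dnsmasq, the problem is in the network path or security group rules rather than 
inside the guest.)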


- Rohit






From: Nux! 
Sent: Wednesday, January 17, 2018 2:34:24 PM
To: dev
Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

Mike,

Run iptables-save on the hypervisor running an actual VM, from the rules above 
it looks like you are not running any (except system VMs). If you are running a 
VM there, then something seems horribly wrong with the security groups.

Another way to check for firewall issues is to disable it altogether, not sure 
how Ubuntu handles that, but you can use this little script[1]. If once you do 
that your problems go away, then it's a firewall issue.

[1] - http://dl.nux.ro/utils/iptflush.sh

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

- Original Message -
> From: "Tutkowski, Mike" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 20:31:23
> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

> Hi,
>
> Here are the results of iptables-save (ebtables-save appears not to be
> installed):
>
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *nat
> :PREROUTING ACCEPT [1914053:9571571583]
> :INPUT ACCEPT [206:3]
> :OUTPUT ACCEPT [4822:348457]
> :POSTROUTING ACCEPT [7039:610037]
> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *mangle
> :PREROUTING ACCEPT [5214518:18468052456]
> :INPUT ACCEPT [2635017:8841915309]
> :FORWARD ACCEPT [214137:32291562]
> :OUTPUT ACCEPT [4343524:27594835296]
> :POSTROUTING ACCEPT [4558131:27627145644]
> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *filter
> :INPUT ACCEPT [884752:56694574]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [886649:47348857]
> :BF-cloudbr0 - [0:0]
> :BF-cloudbr0-IN - [0:0]
> :BF-cloudbr0-OUT - [0:0]
> :r-318-VM - [0:0]
> :s-316-VM - [0:0]
> :v-315-VM - [0:0]
> -A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
> -A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
> -A FORWARD -i virbr0 -o virbr0 -j ACCEPT
> -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -o cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -i cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -o cloudbr0 -j DROP
> -A FORWARD -i cloudbr0 -j DROP
> -A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
> -A BF-cloudbr0 -m state --state RELATED,ESTABLISHED -j ACCEPT
> -A BF-cloudbr0 -m physdev --physdev-is-in --physdev-is-bridged -j 
> BF-cloudbr0-IN
> -A BF-cloudbr0 -m physdev --physdev-is-out --physdev-is-bridged -j
> BF-cloudbr0-OUT
> -A BF-cloudbr0 -m physdev --physdev-out eth0 --physdev-is-bridged -j ACCEPT
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet1 --physdev-is-bridged -j 
> v-315-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet2 --physdev-is-bridged -j 
> v-315-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet4 --physdev-is-bridged -j 
> s-316-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet5 --physdev-is-bridged -j 
> s-316-VM
> -A BF-cloudbr0-IN -m physdev --physdev-in vnet6 --physdev-is-bridged -j 
> r-318-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet1 --physdev-is-bridged -j
> v-315-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet2 --physdev-is-bridged -j
> v-315-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet4 --physdev-is-bridged -j
> s-316-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet5 --physdev-is-bridged -j
> s-316-VM
> -A BF-cloudbr0-OUT -m physdev --physdev-out vnet6 --physdev-is-bridged -j
> r-318-VM

Re: HA issues

2018-01-17 Thread Rohit Yadav
I performed VM HA sanity checks and was not able to reproduce any regression 
against two KVM CentOS7 hosts in a cluster.


Without the "Host HA" feature, I deployed a few HA-enabled VMs on KVM host2 and 
killed it (powered it off). After a few minutes of CloudStack trying to work out why 
the host (KVM agent) had timed out, CloudStack kicked off the investigators, which 
eventually led the KVM fencers to do their work; the VM HA job then kicked in to 
start those VMs on host1, and KVM host2 was put into the "Down" state.
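
(A minimal way to watch the same transition in your own environment - assuming 
cloudmonkey is configured against the management server; name, state and 
resourcestate are standard listHosts response fields:

cloudmonkey list hosts type=Routing filter=name,state,resourcestate

Running that periodically during the test shows the host state changing while the 
HA workflow runs.)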


- Rohit







rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

From: Rohit Yadav
Sent: Wednesday, January 17, 2018 2:39:19 PM
To: dev
Subject: Re: HA issues


Hi Lucian,


The "Host HA" feature is entirely different from VM HA, however, they may work 
in tandem, so please stop using the terms interchangeably as it may cause the 
community to believe a regression has been caused.


The "Host HA" feature currently ships with only "Host HA" provider for KVM that 
is strictly tied to out-of-band management (IPMI for fencing, i.e. power off and 
recovery, i.e. reboot) and NFS (as primary storage). (We also have a provider 
for simulator, but that's for coverage/testing purposes).


Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is enabled. 
The framework allows interested parties to write their own HA providers for a 
hypervisor that can use a different strategy/mechanism for fencing/recovery of 
hosts (including write a non-IPMI based OOBM plugin) and host/disk activity 
checker that is non-NFS based.


The "Host HA" feature ships disabled by default and does not cause any 
interference with VM HA. However, when enabled and configured correctly, it is 
a known limitation that when it is unable to successfully perform recovery or 
fencing tasks it may not trigger VM HA. We can discuss how to handle such cases 
(thoughts?). "Host HA" would try a couple of times to recover and, failing to do 
so, it would eventually trigger a host fencing task. If it's unable to fence a 
host, it will indefinitely attempt to fence the host (the host state will be 
stuck at fencing state in cloud.ha_config table for example) and alerts will be 
sent to admin who can do some manual intervention to handle such situations (if 
you've email/smtp enabled, you should see alert emails).


We can discuss how to improve and have a workaround for the case you've hit, 
thanks for sharing.


- Rohit


From: Nux! 
Sent: Tuesday, January 16, 2018 10:42:35 PM
To: dev
Subject: Re: HA issues

Ok, reinstalled and re-tested.

What I've learned:

- HA only works now if OOB is configured, the old way HA no longer applies - 
this can be good and bad, not everyone has IPMIs

- HA only works if IPMI is reachable. I've pulled the cord on a HV and HA 
failed to do its thing, leaving me with a HV down along with all the VMs 
running there. That's bad.
I've opened this ticket for it:
https://issues.apache.org/jira/browse/CLOUDSTACK-10234

Let me know if you need any extra info or stuff to test.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Nux!" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 11:35:58
> Subject: Re: HA issues

> I'll reinstall my setup and try again, just to be sure I'm working on a clean
> slate.
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> - Original Message -
>> From: "Rohit Yadav" 
>> To: "dev" 
>> Sent: Tuesday, 16 January, 2018 11:29:51
>> Subject: Re: HA issues
>
>> Hi Lucian,
>>
>>
>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi), please 
>> refer
>> to following docs:
>>
>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>
>>
>> We'll need you to look at the logs and perhaps create a JIRA ticket with the logs and
>> details? If you saw ipmi based reboot, then host-ha indeed tried to recover
>> i.e. reboot the host, once hostha has done its work it would schedule HA for 
>> VM
>> as soon as the recovery operation succeeds (we've simulator and kvm based
>> marvin tests for such scenarios).
>>
>>
>> Can you see it making an attempt to schedule VM HA in the logs, or any failure?
>>
>>
>> - Rohit
>>
>> 
>>
>>
>>
>> 
>> From: Nux! 
>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>> To: dev
>> Subject: [4.11] HA issues
>>
>> Hi,
>>
>> I see there's a new HA engine for KVM and IPMI support which is really nice,
>> however it seems hit and miss.
>> I have created an instance with HA offering, kernel panicked one of the
>> hypervisors - after a while the server was rebooted via IPMI probably, but 
>> the
>> instance never moved to a running hypervisor and e

Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Wei ZHOU
Hi Mike,

Is dhclient installed in your VM?
If you have the original password, try to log into the VM and configure the IP
manually. That way, we can see if there is an issue with the networking.

I have faced the issue of some VMs failing to fetch their hostname and password
because of misconfigurations in the VMs.
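
(For example - the interface name and addresses below are only placeholders for 
whatever your guest network uses:

dhclient -v eth0
# or, to test basic connectivity with a static config:
ip addr add 10.1.1.50/24 dev eth0
ip route add default via 10.1.1.1
ping -c 3 10.1.1.1

If a manually configured IP can reach the gateway/VR, the basic network path is fine 
and the issue is DHCP-related - either dhclient in the guest or dnsmasq on the VR.)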

-Wei

2018-01-16 21:39 GMT+01:00 Tutkowski, Mike :

> Hi Wei,
>
> No, there is no VLAN in operation here.
>
> Per your suggestion, I migrated the VM to the host that’s running the VR
> and rebooted the VM after migrating it to this host, but it still didn’t
> get its hostname or IP address.
>
> Thanks!
> Mike
>
> On 1/16/18, 1:32 AM, "Wei ZHOU"  wrote:
>
> Hi Mike,
>
> Have you configured vlan ? What if you migrate VM to same host as VR
> and
> reboot the VM ?
>
> -Wei
>
> 2018-01-15 22:36 GMT+01:00 Tutkowski, Mike  >:
>
> > Hi,
> >
> > I noticed a problem related to hostnames/IP addressing on KVM with
> RC1 for
> > 4.11.
> >
> > I have a single Basic Zone with KVM (no other hypervisor type in
> use). My
> > two KVM hosts are running on Ubuntu 14.04.
> >
> > All system VMs come up and I create a new VM whose root disk resides
> on
> > NFS (alongside the root disks of the system VMs).
> >
> > During the boot process, I see the following error:
> >
> > https://imgur.com/LdTIcb2
> >
> > When the VM has completed booting, it does not have the proper
> hostname
> > and has no IP address:
> >
> > https://imgur.com/PY47Lr8
> >
> > Thoughts?
> >
> > Thanks,
> > Mike
> >
>
>
>


Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Boris Stoyanov
I think I’ve hit a blocker when upgrading to 4.11

Here’s the jira id: https://issues.apache.org/jira/browse/CLOUDSTACK-10236

I’ve upgraded from 4.5 to 4.11, then I’ve logged in with admin and got session 
expired immediately.

Regards,
Boris Stoyanov


boris.stoya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

On 17 Jan 2018, at 8:42, Tutkowski, Mike <mike.tutkow...@netapp.com> wrote:

Hi everyone,

For the past couple days, I have been running the KVM managed-storage 
regression-test suite against RC1.

With the exception of one issue (more on this below), all of these tests have 
passed.

Tomorrow I plan to start in on the VMware-related managed-storage tests.

Once I’ve completed running those, I expect to move on to the XenServer-related 
managed-storage tests.

I ran these XenServer and VMware tests just prior to RC1 being created, so I 
suspect all of those tests will come back successful.

Now, with regards to the one issue I found on KVM with managed storage:

It relates to a new feature whereby you can online migrate the storage of a VM 
from NFS or Ceph to managed storage.

During the code-review process, I made a change per a suggestion and it 
introduced an issue with this feature. The solution is just a couple lines of 
code and only impacts this one use case. If you are testing this release 
candidate and don’t really care about this particular feature, it should not at 
all impact your ability to test RC1.

Thanks!
Mike

On Jan 15, 2018, at 4:33 AM, Rohit Yadav <ro...@apache.org> wrote:

Hi All,

I've created a 4.11.0.0 release, with the following artifacts up for
testing and a vote:

Git Branch and Commit SH:
https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=shortlog;h=refs/heads/4.11.0.0-RC20180115T1603
Commit: 1b8a532ba52127f388847690df70e65c6b46f4d4

Source release (checksums and signatures are available at the same
location):
https://dist.apache.org/repos/dist/dev/cloudstack/4.11.0.0/

PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
https://dist.apache.org/repos/dist/release/cloudstack/KEYS

The vote will be open for 72 hours.

For sanity in tallying the vote, can PMC members please be sure to indicate
"(binding)" with their vote?

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Additional information:

For users' convenience, I've built packages from
1b8a532ba52127f388847690df70e65c6b46f4d4 and published RC1 repository here:
http://cloudstack.apt-get.eu/testing/4.11-rc1

The release notes are still work-in-progress, but the systemvmtemplate
upgrade section has been updated. You may refer to the following for
systemvmtemplate upgrade testing:
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/latest/index.html

4.11 systemvmtemplates are available from here:
https://download.cloudstack.org/systemvm/4.11/

Regards,
Rohit Yadav



Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Kris Sterckx
4.11.0 looks like an awesome release! Special thanks to Rohit!

I vote +0

-  I vote for including CLOUDSTACK-9749 [1] into 4.11.0 still

-  And if that is accepted, I vote for including CLOUDSTACK-10233 [2] also
(Nuage-internal fix)

thanks

Kris

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-9749
[2] https://issues.apache.org/jira/browse/CLOUDSTACK-10233


On 15 January 2018 at 12:32, Rohit Yadav  wrote:

> Hi All,
>
> I've created a 4.11.0.0 release, with the following artifacts up for
> testing and a vote:
>
> Git Branch and Commit SH:
> https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=
> shortlog;h=refs/heads/4.11.0.0-RC20180115T1603
> Commit: 1b8a532ba52127f388847690df70e65c6b46f4d4
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.11.0.0/
>
> PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open for 72 hours.
>
> For sanity in tallying the vote, can PMC members please be sure to indicate
> "(binding)" with their vote?
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Additional information:
>
> For users' convenience, I've built packages from
> 1b8a532ba52127f388847690df70e65c6b46f4d4 and published RC1 repository
> here:
> http://cloudstack.apt-get.eu/testing/4.11-rc1
>
> The release notes are still work-in-progress, but the systemvmtemplate
> upgrade section has been updated. You may refer to the following for
> systemvmtemplate upgrade testing:
> http://docs.cloudstack.apache.org/projects/cloudstack-
> release-notes/en/latest/index.html
>
> 4.11 systemvmtemplates are available from here:
> https://download.cloudstack.org/systemvm/4.11/
>
> Regards,
> Rohit Yadav
>


Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Daan Hoogland
People, People,

a lot of us are busy with Meltdown fixes, and a full component test takes about 
the 72 hours that we have for our voting, so I propose to extend the vote period 
until at least Monday.

Is that a good idea?

On 17/01/2018, 14:33, "Kris Sterckx"  wrote:

4.11.0 looks like an awesome reason !  Special thanks to Rohit !

I vote +0

-  I vote for including CLOUDSTACK-9749 [1] into 4.11.0 still

-  And if that is accepted, I vote for including CLOUDSTACK-10233 [2] also
(Nuage-internal fix)

thanks

Kris

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-9749
[2] https://issues.apache.org/jira/browse/CLOUDSTACK-10233



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

On 15 January 2018 at 12:32, Rohit Yadav  wrote:

> Hi All,
>
> I've created a 4.11.0.0 release, with the following artifacts up for
> testing and a vote:
>
> Git Branch and Commit SH:
> https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=
> shortlog;h=refs/heads/4.11.0.0-RC20180115T1603
> Commit: 1b8a532ba52127f388847690df70e65c6b46f4d4
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.11.0.0/
>
> PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open for 72 hours.
>
> For sanity in tallying the vote, can PMC members please be sure to 
indicate
> "(binding)" with their vote?
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Additional information:
>
> For users' convenience, I've built packages from
> 1b8a532ba52127f388847690df70e65c6b46f4d4 and published RC1 repository
> here:
> http://cloudstack.apt-get.eu/testing/4.11-rc1
>
> The release notes are still work-in-progress, but the systemvmtemplate
> upgrade section has been updated. You may refer to the following for
> systemvmtemplate upgrade testing:
> http://docs.cloudstack.apache.org/projects/cloudstack-
> release-notes/en/latest/index.html
>
> 4.11 systemvmtemplates are available from here:
> https://download.cloudstack.org/systemvm/4.11/
>
> Regards,
> Rohit Yadav
>




Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Tutkowski, Mike
Yes: I definitely won’t be able to complete my regression tests within the 
72-hour window. For 4.12, I plan to automate the remainder of my tests, but I’m 
not quite there with 4.11 (the vast majority of managed-storage tests are 
automated, but not yet all).

On 1/17/18, 7:34 AM, "Daan Hoogland"  wrote:

People, People,

a lot of us are busy with meltdown fixes and a full component test takes 
about the 72 hours that we have for our voting, I propose to extend the vote 
period until at least Monday.

Is that a good idea?

On 17/01/2018, 14:33, "Kris Sterckx"  wrote:

4.11.0 looks like an awesome reason !  Special thanks to Rohit !

I vote +0

-  I vote for including CLOUDSTACK-9749 [1] into 4.11.0 still

-  And if that is accepted, I vote for including CLOUDSTACK-10233 [2] 
also
(Nuage-internal fix)

thanks

Kris

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-9749
[2] https://issues.apache.org/jira/browse/CLOUDSTACK-10233



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

On 15 January 2018 at 12:32, Rohit Yadav  wrote:

> Hi All,
>
> I've created a 4.11.0.0 release, with the following artifacts up for
> testing and a vote:
>
> Git Branch and Commit SH:
> https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=
> shortlog;h=refs/heads/4.11.0.0-RC20180115T1603
> Commit: 1b8a532ba52127f388847690df70e65c6b46f4d4
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.11.0.0/
>
> PGP release keys (signed using 
5ED1E1122DC5E8A4A45112C2484248210EE3D884):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open for 72 hours.
>
> For sanity in tallying the vote, can PMC members please be sure to 
indicate
> "(binding)" with their vote?
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Additional information:
>
> For users' convenience, I've built packages from
> 1b8a532ba52127f388847690df70e65c6b46f4d4 and published RC1 repository
> here:
> http://cloudstack.apt-get.eu/testing/4.11-rc1
>
> The release notes are still work-in-progress, but the systemvmtemplate
> upgrade section has been updated. You may refer to the following for
> systemvmtemplate upgrade testing:
> http://docs.cloudstack.apache.org/projects/cloudstack-
> release-notes/en/latest/index.html
>
> 4.11 systemvmtemplates are available from here:
> https://download.cloudstack.org/systemvm/4.11/
>
> Regards,
> Rohit Yadav
>






Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Wido den Hollander



On 01/17/2018 03:34 PM, Daan Hoogland wrote:

People, People,

a lot of us are busy with meltdown fixes and a full component test takes about 
the 72 hours that we have for our voting, I propose to extend the vote period 
until at least Monday.

Is that a good idea?


Yes please :-) I won't be able to test before that window.

Wido


On 17/01/2018, 14:33, "Kris Sterckx"  wrote:

 4.11.0 looks like an awesome reason !  Special thanks to Rohit !
 
 I vote +0
 
 -  I vote for including CLOUDSTACK-9749 [1] into 4.11.0 still
 
 -  And if that is accepted, I vote for including CLOUDSTACK-10233 [2] also

 (Nuage-internal fix)
 
 thanks
 
 Kris
 
 [1] https://issues.apache.org/jira/browse/CLOUDSTACK-9749

 [2] https://issues.apache.org/jira/browse/CLOUDSTACK-10233
 
 
 
daan.hoogl...@shapeblue.com

www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
   
  


On 15 January 2018 at 12:32, Rohit Yadav  wrote:
 
 > Hi All,

 >
 > I've created a 4.11.0.0 release, with the following artifacts up for
 > testing and a vote:
 >
 > Git Branch and Commit SH:
 > https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=
 > shortlog;h=refs/heads/4.11.0.0-RC20180115T1603
 > Commit: 1b8a532ba52127f388847690df70e65c6b46f4d4
 >
 > Source release (checksums and signatures are available at the same
 > location):
 > https://dist.apache.org/repos/dist/dev/cloudstack/4.11.0.0/
 >
 > PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
 > https://dist.apache.org/repos/dist/release/cloudstack/KEYS
 >
 > The vote will be open for 72 hours.
 >
 > For sanity in tallying the vote, can PMC members please be sure to 
indicate
 > "(binding)" with their vote?
 >
 > [ ] +1  approve
 > [ ] +0  no opinion
 > [ ] -1  disapprove (and reason why)
 >
 > Additional information:
 >
 > For users' convenience, I've built packages from
 > 1b8a532ba52127f388847690df70e65c6b46f4d4 and published RC1 repository
 > here:
 > http://cloudstack.apt-get.eu/testing/4.11-rc1
 >
 > The release notes are still work-in-progress, but the systemvmtemplate
 > upgrade section has been updated. You may refer to the following for
 > systemvmtemplate upgrade testing:
 > http://docs.cloudstack.apache.org/projects/cloudstack-
 > release-notes/en/latest/index.html
 >
 > 4.11 systemvmtemplates are available from here:
 > https://download.cloudstack.org/systemvm/4.11/
 >
 > Regards,
 > Rohit Yadav
 >
 



Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Rene Moser
On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
> 
> a lot of us are busy with meltdown fixes and a full component test takes 
> about the 72 hours that we have for our voting, I propose to extend the vote 
> period until at least Monday.

+1

I wonder where this 72-hour window comes from... Is it just me, or, based on
the amount of changes and "things" to test, should the window be more in the
range of 7-14 days...?

René


Re: [PROPOSE] EOL for supported OSes & Hypervisors

2018-01-17 Thread Ron Wheeler

It might also be helpful to know what version of ACS is in use as well.
Some indication of the plan/desire to upgrade ACS, the hypervisor, or the 
management server operating system might be helpful too.
There is a big difference between the situation where someone is running 
ACS 4.9.x on CentOS 6 and wants to upgrade to ACS 4.12 while keeping 
CentOS 6, and another environment where the planned upgrade to ACS 4.12 
will be done at the same time as an upgrade to CentOS 7.x.


Is it fair to say that any proposed changes in this area will occur in 
4.12 at the earliest and will not likely occur before summer 2018?



Ron


On 17/01/2018 4:23 AM, Paul Angus wrote:

Thanks Eric,

As you'll see from the intro email to this thread, the purpose here is to 
ensure that we don't strand a 'non-trivial' number of users by dropping support 
for any given hypervisor, or management server operating system.

Hence the request to users to let the community know what they are using, so 
that a fact-based community consensus can be reached.


Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
   
  



-Original Message-
From: Eric Lee Green [mailto:eric.lee.gr...@gmail.com]
Sent: 16 January 2018 23:36
To: us...@cloudstack.apache.org
Subject: Re: [PROPOSE] EOL for supported OSes & Hypervisors


This is the type of discussion that I wanted to open - the argument
that I see for earlier dropping of v6 is that - Between May 2018 and
q2 2020 RHEL/CentOS 6.x will only receive security and mission
critical updates, meanwhile packages on which we depend or may want to
utilise in the future are being deprecated or not developed for v6.x

But this has always been the case for Centos 6.x. It is running antique 
versions of everything, and has been doing so for quite some time. It is, for 
example, running versions of Gnome and init that have been obsolete for years. 
Same deal with the version of MySQL that it comes with.

The reality is that Centos 6.x guest support, at the very least, needs to be 
tested with each new version of Cloudstack until final EOL of Centos 6 in Q2 
2020. New versions of Cloudstack with new features not supported by Centos 6 
(such as LVM support for KVM, which requires the LIO storage stack) can require 
Centos 7 or later, but the last Cloudstack version that supports Centos 6.x as 
its server host should continue to receive bug fixes until Centos 6.x is EOL.

Making someone's IT investment obsolete is a way to irrelevancy.
Cloudstack is already an also-ran in the cloud marketplace. Making someone's IT 
investment obsolete before the official EOL time for their IT investment is a 
good way to have a mass migration away from your technology.

This doesn't particularly affect me since my Centos 6 virtualization hosts are 
not running Cloudstack and are going to be re-imaged to Centos
7 before being added to the Cloudstack cluster, but ignoring the IT environment that 
people actually live in, as versus the one we wish existed, is annoying regardless. A 
friend of mine once said of the state of ERP software, "enterprise software is dog 
food if dog food was being designed by cats." I.e., the people writing the software 
rarely have any understanding of how it is actually used by real life enterprises in real 
life environments. Don't be those people.


On 01/16/2018 09:58 AM, Paul Angus wrote:

Hi Eric,

This is the type of discussion that I wanted to open - the argument
that I see for earlier dropping of v6 is that - Between May 2018 and q2 2020 
RHEL/CentOS 6.x will only receive security and mission critical updates, 
meanwhile packages on which we depend or may want to utilise in the future are 
being deprecated or not developed for v6.x. Also, the testing and development 
burden on the CloudStack community increases as we try to maintain backward 
compatibility while including new versions.

Needing installation documentation for centos 7 is a great point, and something 
that we need to address regardless.


Does anyone else have a view, I'd really like to here from a wide range of 
people.

Kind regards,

Paul Angus

paul.an...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue

   



-Original Message-
From: Eric Green [mailto:eric.lee.gr...@gmail.com]
Sent: 12 January 2018 17:24
To: us...@cloudstack.apache.org
Cc: dev@cloudstack.apache.org
Subject: Re: [PROPOSE] EOL for supported OSes & Hypervisors

Official EOL for Centos 6 / RHEL 6 as declared by Red Hat Software is 
11/30/2020. Jumping the gun a bit there, padme.

People on Centos 6 should certainly be working on a migration strategy right 
now, but the end is not here *yet*. Furthermore, the install documentation is 
still written for Centos 6 rather than Centos 7. That needs to be fixed before 
discontinuing support for Centos 6, eh?


On Jan 12, 2018, at 04:35, Rohit Yadav  wrote:

+1 I've updated the page with upcoming Ubuntu 1

Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Daan Hoogland
The 72 hours is to make sure all stakeholders have had a chance to take a look. Testing 
is supposed to have happened before. We have a culture of testing only after 
the RC is cut, which is part of the problem. The long duration a single test run 
takes is another part. And finally, in this case there is the new mind-blower 
called Meltdown. I think in general we should try to keep the 72 hours, but for 
this release it is not realistic.

On 17/01/2018, 15:48, "Rene Moser"  wrote:

On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
> 
> a lot of us are busy with meltdown fixes and a full component test takes 
about the 72 hours that we have for our voting, I propose to extend the vote 
period until at least Monday.

+1

I wonder where this 72 hours windows come from... Is it just be or,
based on the amount of changes and "things" to test, I would like to
expect a window in the size of 7-14 days ...?

René



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Tutkowski, Mike
Hi Lucian,

Thanks for the e-mail. I haven’t yet gotten around to trying suggestions from 
others, but I did run that script you pointed me to and then rebooted the user 
VM that was running on that host. Unfortunately, I see the same results: no 
specified hostname and no IP address for that VM.

In case you’re interested, here is the output from that script:

Stopping firewall and allowing all traffic ...

# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*raw
:PREROUTING ACCEPT [103:4120]
:OUTPUT ACCEPT [103:4120]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*nat
:PREROUTING ACCEPT [2:133]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*mangle
:PREROUTING ACCEPT [259:10360]
:INPUT ACCEPT [259:10360]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [259:10360]
:POSTROUTING ACCEPT [259:10360]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*filter
:INPUT ACCEPT [494:19760]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [494:19760]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018

Done!

Thanks,
Mike

On 1/17/18, 2:04 AM, "Nux!"  wrote:

Mike, 

Run iptables-save on the hypervisor running an actual VM, from the rules 
above it looks like you are not running any (except system VMs). If you are 
running a VM there, then something seems horribly wrong with the security 
groups. 

Another way to check for firewall issues is to disable it altogether, not 
sure how Ubuntu handles that, but you can use this little script[1]. If once 
you do that your problems go away, then it's a firewall issue.

[1] - http://dl.nux.ro/utils/iptflush.sh

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Tutkowski, Mike" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 20:31:23
> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

> Hi,
> 
> Here are the results of iptables-save (ebtables-save appears not to be
> installed):
> 
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *nat
> :PREROUTING ACCEPT [1914053:9571571583]
> :INPUT ACCEPT [206:3]
> :OUTPUT ACCEPT [4822:348457]
> :POSTROUTING ACCEPT [7039:610037]
> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j 
MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j 
MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *mangle
> :PREROUTING ACCEPT [5214518:18468052456]
> :INPUT ACCEPT [2635017:8841915309]
> :FORWARD ACCEPT [214137:32291562]
> :OUTPUT ACCEPT [4343524:27594835296]
> :POSTROUTING ACCEPT [4558131:27627145644]
> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM 
--checksum-fill
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *filter
> :INPUT ACCEPT [884752:56694574]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [886649:47348857]
> :BF-cloudbr0 - [0:0]
> :BF-cloudbr0-IN - [0:0]
> :BF-cloudbr0-OUT - [0:0]
> :r-318-VM - [0:0]
> :s-316-VM - [0:0]
> :v-315-VM - [0:0]
> -A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
> -A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
> -A FORWARD -i virbr0 -o virbr0 -j ACCEPT
> -A FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -i virbr0 -j REJECT --reject-with icmp-port-unreachable
> -A FORWARD -o cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -i cloudbr0 -m physdev --physdev-is-bridged -j BF-cloudbr0
> -A FORWARD -o cloudbr0 -j DROP
> -A FORWARD -i cloudbr0 -j DROP
> -A OUTPUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
> -A BF-cloudbr0 -m state --state RELATED,ESTABLISHED -j ACCEPT
> -A BF-cloudbr0 -m physdev --physdev-is-in --physdev-is-bridged -j 
BF-cloudbr0-IN
> -A BF-cloudbr0 -m physdev --physdev-is-out --physdev-is-bridged -j
> BF-cloudbr0-OUT
> -A BF-cloudbr0 -m physdev --physdev-out eth0 --physdev-is-bri

Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Tutkowski, Mike
The good part for 4.11 is that, per Rohit’s testing and comments, it seems like 
it’s just an environment misconfiguration that is leading to these results. 
That being the case, it’s not an issue we really need to be concerned with for 
the 4.11 release candidate.

On 1/17/18, 7:56 AM, "Tutkowski, Mike"  wrote:

Hi Lucian,

Thanks for the e-mail. I haven’t yet gotten around to trying suggestions 
from others, but I did run that script you pointed me to and then rebooted the 
user VM that was running on that host. Unfortunately, I see the same results: 
no specified hostname and no IP address for that VM.

In case you’re interested, here is the output from that script:

Stopping firewall and allowing all traffic ...

# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*raw
:PREROUTING ACCEPT [103:4120]
:OUTPUT ACCEPT [103:4120]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*nat
:PREROUTING ACCEPT [2:133]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*mangle
:PREROUTING ACCEPT [259:10360]
:INPUT ACCEPT [259:10360]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [259:10360]
:POSTROUTING ACCEPT [259:10360]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018
# Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
*filter
:INPUT ACCEPT [494:19760]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [494:19760]
COMMIT
# Completed on Wed Jan 17 07:36:15 2018

Done!

Thanks,
Mike

On 1/17/18, 2:04 AM, "Nux!"  wrote:

Mike, 

Run iptables-save on the hypervisor running an actual VM, from the 
rules above it looks like you are not running any (except system VMs). If you 
are running a VM there, then something seems horribly wrong with the security 
groups. 

Another way to check for firewall issues is to disable it altogether, 
not sure how Ubuntu handles that, but you can use this little script[1]. If 
once you do that your problems go away, then it's a firewall issue.

[1] - http://dl.nux.ro/utils/iptflush.sh

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Tutkowski, Mike" 
> To: "dev" 
> Sent: Tuesday, 16 January, 2018 20:31:23
> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

> Hi,
> 
> Here are the results of iptables-save (ebtables-save appears not to be
> installed):
> 
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *nat
> :PREROUTING ACCEPT [1914053:9571571583]
> :INPUT ACCEPT [206:3]
> :OUTPUT ACCEPT [4822:348457]
> :POSTROUTING ACCEPT [7039:610037]
> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j 
MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j 
MASQUERADE
> --to-ports 1024-65535
> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *mangle
> :PREROUTING ACCEPT [5214518:18468052456]
> :INPUT ACCEPT [2635017:8841915309]
> :FORWARD ACCEPT [214137:32291562]
> :OUTPUT ACCEPT [4343524:27594835296]
> :POSTROUTING ACCEPT [4558131:27627145644]
> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM 
--checksum-fill
> COMMIT
> # Completed on Tue Jan 16 13:23:25 2018
> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
> *filter
> :INPUT ACCEPT [884752:56694574]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [886649:47348857]
> :BF-cloudbr0 - [0:0]
> :BF-cloudbr0-IN - [0:0]
> :BF-cloudbr0-OUT - [0:0]
> :r-318-VM - [0:0]
> :s-316-VM - [0:0]
> :v-315-VM - [0:0]
> -A INPUT -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
> -A INPUT -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
> -A INPUT -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
> -A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate
> RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
> -A FORWARD -i vir

Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Tutkowski, Mike
If all of our testing were completely in an automated fashion, then I would 
agree that the 72-hour window is sufficient. However, we don’t have that kind 
of automated coverage and people aren’t always able to immediately begin 
testing things out like migrating from their version of CloudStack to the new 
one. That being the case, 72 hours does seem (at least for where we are now as 
a project in terms of automated testing coverage) a bit short.

On 1/17/18, 7:52 AM, "Daan Hoogland"  wrote:

The 72 hours is to make sure all stakeholders had a chance to glance. 
Testing is supposed to have happened before. We have a culture of testing only 
after RC-cut which is part of the problem. The long duration of a single test 
run takes, is another part. And finally, in this case there is the new mindblow 
called meltdown. I think in general we should try to keep the 72 hours but for 
this release it is not realistic.

On 17/01/2018, 15:48, "Rene Moser"  wrote:

On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
> 
> a lot of us are busy with meltdown fixes and a full component test 
takes about the 72 hours that we have for our voting, I propose to extend the 
vote period until at least Monday.

+1

I wonder where this 72 hours windows come from... Is it just be or,
based on the amount of changes and "things" to test, I would like to
expect a window in the size of 7-14 days ...?

René



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 





Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Tutkowski, Mike
Or perhaps just the first RC should have a longer window?

On 1/17/18, 8:12 AM, "Tutkowski, Mike"  wrote:

If all of our testing were completely in an automated fashion, then I would 
agree that the 72-hour window is sufficient. However, we don’t have that kind 
of automated coverage and people aren’t always able to immediately begin 
testing things out like migrating from their version of CloudStack to the new 
one. That being the case, 72 hours does seem (at least for where we are now as 
a project in terms of automated testing coverage) a bit short.

On 1/17/18, 7:52 AM, "Daan Hoogland"  wrote:

The 72 hours is to make sure all stakeholders have had a chance to take a look. 
Testing is supposed to have happened before. We have a culture of testing only 
after the RC is cut, which is part of the problem. The long duration of a single 
test run is another part. And finally, in this case there is the new mind-blower 
called Meltdown. I think in general we should try to keep the 72 hours, but for 
this release it is not realistic.

On 17/01/2018, 15:48, "Rene Moser"  wrote:

On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
> 
> a lot of us are busy with Meltdown fixes, and a full component 
test takes about the 72 hours that we have for our voting. I propose extending 
the vote period until at least Monday.

+1

I wonder where this 72-hour window comes from... Is it just me, or, based on
the amount of changes and "things" to test, should we expect a window more in
the range of 7-14 days?

René



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 







Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Rohit Yadav
The 72-hour window is more of a guideline than a rule. Without lazy consensus I 
don't think we have any choice here, so Monday it is.

Kris - thanks. If we need an RC2 and your proposed issues are blocker/critical, 
we can consider them; meanwhile, please engage with the community to get them 
reviewed.

Bobby - can you attempt to log in using incognito mode or a different browser 
after upgrading from 4.5 to 4.11, to rule out a caching issue?

Regards.

Get Outlook for Android


From: Tutkowski, Mike 
Sent: Wednesday, January 17, 2018 8:48:28 PM
To: dev@cloudstack.apache.org
Subject: Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

Or perhaps just the first RC should have a longer window?

On 1/17/18, 8:12 AM, "Tutkowski, Mike"  wrote:

If all of our testing were completely automated, then I would agree that the 
72-hour window is sufficient. However, we don’t have that kind of automated 
coverage, and people aren’t always able to immediately begin testing things 
like migrating from their current version of CloudStack to the new one. That 
being the case, 72 hours does seem a bit short (at least given where we are now 
as a project in terms of automated test coverage).

On 1/17/18, 7:52 AM, "Daan Hoogland"  wrote:

The 72 hours is to make sure all stakeholders have had a chance to take a look. 
Testing is supposed to have happened before. We have a culture of testing only 
after the RC is cut, which is part of the problem. The long duration of a single 
test run is another part. And finally, in this case there is the new mind-blower 
called Meltdown. I think in general we should try to keep the 72 hours, but for 
this release it is not realistic.

On 17/01/2018, 15:48, "Rene Moser"  wrote:

On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
>
> a lot of us are busy with Meltdown fixes, and a full component 
test takes about the 72 hours that we have for our voting. I propose extending 
the vote period until at least Monday.

+1

I wonder where this 72-hour window comes from... Is it just me, or, based on
the amount of changes and "things" to test, should we expect a window more in
the range of 7-14 days?

René



daan.hoogl...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue








rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



Re: HA issues

2018-01-17 Thread Nux!
Hi Rohit,

I've reinstalled and tested. Still no go with VM HA.

What I did was kernel-panic that particular HV ("echo c > /proc/sysrq-trigger" 
<- this is a proper way to simulate a crash).
What happened next is that the HV got marked as "Alert", while the VM on it 
stayed marked as "Running" the whole time and was never migrated to another HV.
Once the panicked HV booted back up, the VM rebooted and became available.

I'm running CentOS 7 for the management server and the HVs, with NFS primary and 
secondary storage. The VM has an HA-enabled service offering.
The Host HA and OOBM configuration was not touched.

Full log http://tmp.nux.ro/W3s-management-server.log
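
For anyone who wants to repeat the same test, a minimal sketch of the procedure
(the sysrq commands are standard Linux; the cloudmonkey calls are only an
assumption about how you query the API and should be adapted to your client):

  # On the hypervisor to be "crashed" (run as root; the machine panics immediately):
  echo 1 > /proc/sys/kernel/sysrq    # make sure the magic SysRq interface is enabled
  echo c > /proc/sysrq-trigger       # trigger an immediate kernel panic

  # From a management workstation, watch how CloudStack reacts to the dead host:
  cloudmonkey list hosts type=Routing
  cloudmonkey list virtualmachines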

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Rohit Yadav" 
> To: "dev" 
> Sent: Wednesday, 17 January, 2018 12:13:33
> Subject: Re: HA issues

> I performed VM HA sanity checks and was not able to reproduce any regression
> on a cluster of two KVM CentOS 7 hosts.
> 
> 
> Without the "Host HA" feature, I deployed a few HA-enabled VMs on KVM host2 and
> killed it (powered it off). After a few minutes of CloudStack trying to work out
> why the host (KVM agent) timed out, CloudStack kicked off its investigators,
> which eventually led the KVM fencers to act; the VM HA job then started those
> VMs on host1, and KVM host2 was put into the "Down" state.
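
For readers who want to reproduce this sanity check, a rough sketch follows; the
offering name, sizes and UUIDs below are placeholders, and the cloudmonkey
spellings should be verified against your own installation:

  # Create a compute offering with HA enabled, then deploy a test VM with it on host2.
  cloudmonkey create serviceoffering name=HA-Tiny displaytext="HA tiny" cpunumber=1 cpuspeed=500 memory=512 offerha=true
  cloudmonkey deploy virtualmachine serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid> hostid=<host2-uuid>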
> 
> 
> - Rohit
> 
> 
> 
> 
> 
> 
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>  
> 
> 
> From: Rohit Yadav
> Sent: Wednesday, January 17, 2018 2:39:19 PM
> To: dev
> Subject: Re: HA issues
> 
> 
> Hi Lucian,
> 
> 
> The "Host HA" feature is entirely different from VM HA; however, they may work
> in tandem, so please stop using the terms interchangeably, as it may lead the
> community to believe a regression has been introduced.
> 
> 
> The "Host HA" feature currently ships with a "Host HA" provider only for KVM,
> which is strictly tied to out-of-band management (IPMI for fencing, i.e. power
> off, and recovery, i.e. reboot) and NFS (as primary storage). (We also have a
> provider for the simulator, but that's for coverage/testing purposes.)
> 
> 
> Therefore, "Host HA" for KVM (+NFS) currently works only when OOBM is enabled.
> The framework allows interested parties to write their own HA provider for a
> hypervisor, which can use a different strategy/mechanism for fencing/recovery
> of hosts (including writing a non-IPMI-based OOBM plugin) and a host/disk
> activity checker that is not NFS-based.
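
Since the KVM provider leans on IPMI for both fencing (power off) and recovery
(reboot), it is worth confirming that the management server can actually reach
each host's BMC before enabling the feature. A minimal check with the stock
ipmitool CLI, where the BMC address and credentials are placeholders:

  # Verify the BMC answers and that power control works (only cycle a test host!):
  ipmitool -I lanplus -H 10.0.0.5 -U ADMIN -P secret chassis power status
  ipmitool -I lanplus -H 10.0.0.5 -U ADMIN -P secret chassis power cycle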
> 
> 
> The "Host HA" feature ships disabled by default and does not interfere with
> VM HA. However, when it is enabled and configured correctly, there is a known
> limitation: when it is unable to successfully perform recovery or fencing
> tasks, it may not trigger VM HA. We can discuss how to handle such cases
> (thoughts?). "Host HA" will try a couple of times to recover the host and,
> failing that, will eventually trigger a host fencing task. If it is unable to
> fence the host, it will keep attempting to fence it indefinitely (the host
> state will be stuck in the fencing state in the cloud.ha_config table, for
> example) and alerts will be sent to the admin, who can intervene manually to
> handle such situations (if you have email/SMTP alerts enabled, you should see
> alert emails).
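
If a host does appear stuck like that, the table mentioned above can be
inspected directly on the management server. The column layout is not spelled
out here, so this sketch simply dumps every row; the database name and
credentials assume the usual "cloud" defaults and may differ in your setup:

  mysql -u cloud -p cloud -e "SELECT * FROM ha_config\G"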
> 
> 
> We can discuss how to improve this and find a workaround for the case you've
> hit. Thanks for sharing.
> 
> 
> - Rohit
> 
> 
> From: Nux! 
> Sent: Tuesday, January 16, 2018 10:42:35 PM
> To: dev
> Subject: Re: HA issues
> 
> Ok, reinstalled and re-tested.
> 
> What I've learned:
> 
> - HA now only works if OOBM is configured; the old way of doing HA no longer
> applies - this can be good and bad, as not everyone has IPMI
> 
> - HA only works if IPMI is reachable. I pulled the cord on a HV and HA failed
> to do its thing, leaving me with the HV down along with all the VMs running
> on it. That's bad.
> I've opened this ticket for it:
> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
> 
> Let me know if you need any extra info or stuff to test.
> 
> Regards,
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> - Original Message -
>> From: "Nux!" 
>> To: "dev" 
>> Sent: Tuesday, 16 January, 2018 11:35:58
>> Subject: Re: HA issues
> 
>> I'll reinstall my setup and try again, just to be sure I'm working on a clean
>> slate.
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> - Original Message -
>>> From: "Rohit Yadav" 
>>> To: "dev" 
>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>> Subject: Re: HA issues
>>
>>> Hi Lucian,
>>>
>>>
>>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi), please 
>>> refer
>>> to following docs:
>>>
>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-managem

Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Nux!
Mike,

OK, at least we can rule out the hypervisor firewall side; the problem in your 
particular case may then be with the VR. But if you feel further testing is not 
warranted, that's fine.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Tutkowski, Mike" 
> To: "dev" 
> Sent: Wednesday, 17 January, 2018 15:08:21
> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

> The good part for 4.11 is that, per Rohit’s testing and comments, it seems 
> like
> it’s just an environment misconfiguration that is leading to these results.
> That being the case, it’s not an issue we really need to be concerned with for
> the 4.11 release candidate.
> 
> On 1/17/18, 7:56 AM, "Tutkowski, Mike"  wrote:
> 
>Hi Lucian,
>
>Thanks for the e-mail. I haven’t yet gotten around to trying suggestions 
> from
>others, but I did run that script you pointed me to and then rebooted the 
> user
>VM that was running on that host. Unfortunately, I see the same results: no
>specified hostname and no IP address for that VM.
>
>In case you’re interested, here is the output from that script:
>
>Stopping firewall and allowing all traffic ...
>
># Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>*raw
>:PREROUTING ACCEPT [103:4120]
>:OUTPUT ACCEPT [103:4120]
>COMMIT
># Completed on Wed Jan 17 07:36:15 2018
># Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>*nat
>:PREROUTING ACCEPT [2:133]
>:INPUT ACCEPT [0:0]
>:OUTPUT ACCEPT [0:0]
>:POSTROUTING ACCEPT [0:0]
>COMMIT
># Completed on Wed Jan 17 07:36:15 2018
># Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>*mangle
>:PREROUTING ACCEPT [259:10360]
>:INPUT ACCEPT [259:10360]
>:FORWARD ACCEPT [0:0]
>:OUTPUT ACCEPT [259:10360]
>:POSTROUTING ACCEPT [259:10360]
>COMMIT
># Completed on Wed Jan 17 07:36:15 2018
># Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>*filter
>:INPUT ACCEPT [494:19760]
>:FORWARD ACCEPT [0:0]
>:OUTPUT ACCEPT [494:19760]
>COMMIT
># Completed on Wed Jan 17 07:36:15 2018
>
>Done!
>
>Thanks,
>Mike
>
>On 1/17/18, 2:04 AM, "Nux!"  wrote:
>
>Mike,
>
>Run iptables-save on the hypervisor running an actual VM, from the 
> rules above
>it looks like you are not running any (except system VMs). If you are 
> running a
>VM there, then something seems horribly wrong with the security groups.
>
>Another way to check for firewall issues is to disable it altogether, 
> not sure
>how Ubuntu handles that, but you can use this little script[1]. If 
> once you do
>that your problems go away, then it's a firewall issue.
>
>[1] - http://dl.nux.ro/utils/iptflush.sh
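
The contents of iptflush.sh are not reproduced here; for anyone who prefers to
see what "disable it altogether" amounts to before running a downloaded script,
a rough equivalent looks like this (it opens all policies and removes every rule):

  #!/bin/sh
  # Flush and delete all rules and user-defined chains in every table,
  # then set the default policies to ACCEPT and print the resulting ruleset.
  for table in filter nat mangle raw; do
      iptables -t "$table" -F
      iptables -t "$table" -X
  done
  iptables -P INPUT ACCEPT
  iptables -P FORWARD ACCEPT
  iptables -P OUTPUT ACCEPT
  iptables-save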
>
>--
>Sent from the Delta quadrant using Borg technology!
>
>Nux!
>www.nux.ro
>
>- Original Message -
>> From: "Tutkowski, Mike" 
>> To: "dev" 
>> Sent: Tuesday, 16 January, 2018 20:31:23
>> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address
>
>> Hi,
>> 
>> Here is the results of iptables-save (ebtables-save appears not to be
>> installed):
>> 
>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>> *nat
>> :PREROUTING ACCEPT [1914053:9571571583]
>> :INPUT ACCEPT [206:3]
>> :OUTPUT ACCEPT [4822:348457]
>> :POSTROUTING ACCEPT [7039:610037]
>> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
>> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j 
> MASQUERADE
>> --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j 
> MASQUERADE
>> --to-ports 1024-65535
>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j 
> MASQUERADE
>> COMMIT
>> # Completed on Tue Jan 16 13:23:25 2018
>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>> *mangle
>> :PREROUTING ACCEPT [5214518:18468052456]
>> :INPUT ACCEPT [2635017:8841915309]
>> :FORWARD ACCEPT [214137:32291562]
>> :OUTPUT ACCEPT [4343524:27594835296]
>> :POSTROUTING ACCEPT [4558131:27627145644]
>> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM 
> --checksum-fill
>> COMMIT
>> # Completed on Tue Jan 16 13:23:25 2018
>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>> *filter
>> :INPUT ACCEPT [884752:56694574]
>> :FORWARD ACCEPT [0:0]
>> :OUTPUT ACCEPT [886649:47348857]
> 

Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Boris Stoyanov
Yes Rohit, I tried another browser and I’m not able to log in.

I’m +1 on the extension but unfortunately -1 because of this blocker.

Bobby.


boris.stoya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

On 17 Jan 2018, at 18:24, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:

The 72-hour window is more of a guideline than a rule. Without lazy consensus I 
don't think we have any choice here, so Monday it is.

Kris - thanks. If we need an RC2 and your proposed issues are blocker/critical, 
we can consider them; meanwhile, please engage with the community to get them 
reviewed.

Bobby - can you attempt to log in using incognito mode or a different browser 
after upgrading from 4.5 to 4.11, to rule out a caching issue?

Regards.

Get Outlook for Android


From: Tutkowski, Mike <mike.tutkow...@netapp.com>
Sent: Wednesday, January 17, 2018 8:48:28 PM
To: dev@cloudstack.apache.org
Subject: Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

Or perhaps just the first RC should have a longer window?

On 1/17/18, 8:12 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:

    If all of our testing were completely automated, then I would agree that the 
72-hour window is sufficient. However, we don’t have that kind of automated 
coverage, and people aren’t always able to immediately begin testing things like 
migrating from their current version of CloudStack to the new one. That being the 
case, 72 hours does seem a bit short (at least given where we are now as a 
project in terms of automated test coverage).

    On 1/17/18, 7:52 AM, "Daan Hoogland" <daan.hoogl...@shapeblue.com> wrote:

    The 72 hours is to make sure all stakeholders have had a chance to take a 
look. Testing is supposed to have happened before. We have a culture of testing 
only after the RC is cut, which is part of the problem. The long duration of a 
single test run is another part. And finally, in this case there is the new 
mind-blower called Meltdown. I think in general we should try to keep the 72 
hours, but for this release it is not realistic.

    On 17/01/2018, 15:48, "Rene Moser" <m...@renemoser.net> wrote:

   On 01/17/2018 03:34 PM, Daan Hoogland wrote:
People, People,

a lot of us are busy with Meltdown fixes, and a full component test takes about 
the 72 hours that we have for our voting. I propose extending the vote period 
until at least Monday.

   +1

    I wonder where this 72-hour window comes from... Is it just me, or, based on
    the amount of changes and "things" to test, should we expect a window more in
    the range of 7-14 days?

   René



   daan.hoogl...@shapeblue.com
   
www.shapeblue.com>
   53 Chandos Place, Covent Garden, London  WC2N 4HSUK
   @shapeblue








rohit.ya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue



Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address

2018-01-17 Thread Tutkowski, Mike
Once I run through the rest of my testing for the release candidate, I will 
turn my attention back to this issue. Thanks!

> On Jan 17, 2018, at 10:53 AM, Nux!  wrote:
> 
> Mike,
> 
> OK, at least we can rule out the hypervisor firewall side; the problem in your 
> particular case may then be with the VR. But if you feel further testing is 
> not warranted, that's fine.
> 
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> - Original Message -
>> From: "Tutkowski, Mike" 
>> To: "dev" 
>> Sent: Wednesday, 17 January, 2018 15:08:21
>> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address
> 
>> The good part for 4.11 is that, per Rohit’s testing and comments, it seems 
>> like
>> it’s just an environment misconfiguration that is leading to these results.
>> That being the case, it’s not an issue we really need to be concerned with 
>> for
>> the 4.11 release candidate.
>> 
>> On 1/17/18, 7:56 AM, "Tutkowski, Mike"  wrote:
>> 
>>   Hi Lucian,
>> 
>>   Thanks for the e-mail. I haven’t yet gotten around to trying suggestions 
>> from
>>   others, but I did run that script you pointed me to and then rebooted the 
>> user
>>   VM that was running on that host. Unfortunately, I see the same results: no
>>   specified hostname and no IP address for that VM.
>> 
>>   In case you’re interested, here is the output from that script:
>> 
>>   Stopping firewall and allowing all traffic ...
>> 
>>   # Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>>   *raw
>>   :PREROUTING ACCEPT [103:4120]
>>   :OUTPUT ACCEPT [103:4120]
>>   COMMIT
>>   # Completed on Wed Jan 17 07:36:15 2018
>>   # Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>>   *nat
>>   :PREROUTING ACCEPT [2:133]
>>   :INPUT ACCEPT [0:0]
>>   :OUTPUT ACCEPT [0:0]
>>   :POSTROUTING ACCEPT [0:0]
>>   COMMIT
>>   # Completed on Wed Jan 17 07:36:15 2018
>>   # Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>>   *mangle
>>   :PREROUTING ACCEPT [259:10360]
>>   :INPUT ACCEPT [259:10360]
>>   :FORWARD ACCEPT [0:0]
>>   :OUTPUT ACCEPT [259:10360]
>>   :POSTROUTING ACCEPT [259:10360]
>>   COMMIT
>>   # Completed on Wed Jan 17 07:36:15 2018
>>   # Generated by iptables-save v1.4.21 on Wed Jan 17 07:36:15 2018
>>   *filter
>>   :INPUT ACCEPT [494:19760]
>>   :FORWARD ACCEPT [0:0]
>>   :OUTPUT ACCEPT [494:19760]
>>   COMMIT
>>   # Completed on Wed Jan 17 07:36:15 2018
>> 
>>   Done!
>> 
>>   Thanks,
>>   Mike
>> 
>>   On 1/17/18, 2:04 AM, "Nux!"  wrote:
>> 
>>   Mike,
>> 
>>   Run iptables-save on the hypervisor running an actual VM, from the 
>> rules above
>>   it looks like you are not running any (except system VMs). If you are 
>> running a
>>   VM there, then something seems horribly wrong with the security groups.
>> 
>>   Another way to check for firewall issues is to disable it altogether, 
>> not sure
>>   how Ubuntu handles that, but you can use this little script[1]. If 
>> once you do
>>   that your problems go away, then it's a firewall issue.
>> 
>>   [1] - http://dl.nux.ro/utils/iptflush.sh
>> 
>>   --
>>   Sent from the Delta quadrant using Borg technology!
>> 
>>   Nux!
>>   www.nux.ro
>> 
>>   - Original Message -
>>> From: "Tutkowski, Mike" 
>>> To: "dev" 
>>> Sent: Tuesday, 16 January, 2018 20:31:23
>>> Subject: Re: 4.11 RC1 KVM Issue: Incorrect hostname/no IP address
>> 
>>> Hi,
>>> 
>>> Here is the results of iptables-save (ebtables-save appears not to be
>>> installed):
>>> 
>>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>>> *nat
>>> :PREROUTING ACCEPT [1914053:9571571583]
>>> :INPUT ACCEPT [206:3]
>>> :OUTPUT ACCEPT [4822:348457]
>>> :POSTROUTING ACCEPT [7039:610037]
>>> -A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
>>> -A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
>>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j 
>>> MASQUERADE
>>> --to-ports 1024-65535
>>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j 
>>> MASQUERADE
>>> --to-ports 1024-65535
>>> -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
>>> COMMIT
>>> # Completed on Tue Jan 16 13:23:25 2018
>>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>>> *mangle
>>> :PREROUTING ACCEPT [5214518:18468052456]
>>> :INPUT ACCEPT [2635017:8841915309]
>>> :FORWARD ACCEPT [214137:32291562]
>>> :OUTPUT ACCEPT [4343524:27594835296]
>>> :POSTROUTING ACCEPT [4558131:27627145644]
>>> -A POSTROUTING -o virbr0 -p udp -m udp --dport 68 -j CHECKSUM 
>>> --checksum-fill
>>> COMMIT
>>> # Completed on Tue Jan 16 13:23:25 2018
>>> # Generated by iptables-save v1.4.21 on Tue Jan 16 13:23:25 2018
>>> *filter
>>> :INPUT ACCEPT [884752:56694574]
>>> :FORWARD ACCEPT [0:0]
>>> :OUTPUT ACCEPT [886649:47348857]
>>> :BF-cloudbr0 - [0:0]
>>> :BF-cloudbr0-IN - [0:0]
>>> :BF-cloudbr0-OUT - [0:0]
>>> :r-318-VM - [0:0]

Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

2018-01-17 Thread Nux!
The extension is welcome!

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Boris Stoyanov" 
> To: "dev" 
> Sent: Wednesday, 17 January, 2018 18:24:20
> Subject: Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)

> Yes Rohit, I tried another browser and I’m not able to log in.
> 
> I’m +1 on the extension but unfortunately -1 because of this blocker.
> 
> Bobby.
> 
> 
> boris.stoya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>  
> 
> 
> On 17 Jan 2018, at 18:24, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:
> 
> The 72-hour window is more of a guideline than a rule. Without lazy consensus I
> don't think we have any choice here, so Monday it is.
> 
> Kris - thanks. If we need an RC2 and your proposed issues are blocker/critical,
> we can consider them; meanwhile, please engage with the community to get them
> reviewed.
> 
> Bobby - can you attempt to log in using incognito mode or a different browser
> after upgrading from 4.5 to 4.11, to rule out a caching issue?
> 
> Regards.
> 
> Get Outlook for Android
> 
> 
> From: Tutkowski, Mike <mike.tutkow...@netapp.com>
> Sent: Wednesday, January 17, 2018 8:48:28 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [VOTE] Apache Cloudstack 4.11.0.0 (LTS)
> 
> Or perhaps just the first RC should have a longer window?
> 
> On 1/17/18, 8:12 AM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
> 
>   If all of our testing were completely automated, then I would agree that the
>   72-hour window is sufficient. However, we don’t have that kind of automated
>   coverage, and people aren’t always able to immediately begin testing things
>   like migrating from their current version of CloudStack to the new one. That
>   being the case, 72 hours does seem a bit short (at least given where we are
>   now as a project in terms of automated test coverage).
> 
>   On 1/17/18, 7:52 AM, "Daan Hoogland" <daan.hoogl...@shapeblue.com> wrote:
> 
>   The 72 hours is to make sure all stakeholders have had a chance to take a
>   look. Testing is supposed to have happened before. We have a culture of
>   testing only after the RC is cut, which is part of the problem. The long
>   duration of a single test run is another part. And finally, in this case
>   there is the new mind-blower called Meltdown. I think in general we should
>   try to keep the 72 hours, but for this release it is not realistic.
> 
>   On 17/01/2018, 15:48, "Rene Moser" <m...@renemoser.net> wrote:
> 
>   On 01/17/2018 03:34 PM, Daan Hoogland wrote:
> People, People,
> 
> a lot of us are busy with Meltdown fixes, and a full component test takes about
> the 72 hours that we have for our voting. I propose extending the vote period
> until at least Monday.
> 
>   +1
> 
>   I wonder where this 72-hour window comes from... Is it just me, or, based on
>   the amount of changes and "things" to test, should we expect a window more in
>   the range of 7-14 days?
> 
>   René
> 
> 
> 
>   daan.hoogl...@shapeblue.com
>   
> www.shapeblue.com>
>   53 Chandos Place, Covent Garden, London  WC2N 4HSUK
>   @shapeblue
> 
> 
> 
> 
> 
> 
> 
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue