Re: Ansible deploy cloudstack and add zone

2019-07-24 Thread Jean-Francois Nadeau
Hi Li,

We are using Ansible to build and manage capacity in our CloudStack
environments end to end: setting up zones, projects, networks,
compute offerings, pods, clusters and hosts.

Make sure to use a current Ansible, 2.8 or newer. See
https://docs.ansible.com/ansible/latest/modules/list_of_cloud_modules.html
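For anyone scripting this outside Ansible: those modules ultimately make signed HTTP calls against the CloudStack API. A minimal sketch of the request signing they rely on — HMAC-SHA1 over the sorted, lower-cased query string — with hypothetical key names and values:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict, secret_key: str) -> str:
    """Compute a CloudStack-style API signature for a set of request
    parameters: build the query string from parameters sorted by key,
    lower-case it, sign with HMAC-SHA1, base64-encode, then URL-encode."""
    query = "&".join(
        f"{key.lower()}={quote(str(value), safe='*').lower()}"
        for key, value in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), query.encode(), hashlib.sha1).digest()
    return quote(base64.b64encode(digest))


# Hypothetical credentials and call -- substitute your own keypair.
params = {"command": "createZone", "name": "zone1", "apikey": "APIKEY"}
signature = sign_request(params, "SECRETKEY")
```

The signature is appended as the `signature` query parameter on the actual HTTP request; the parameter names above are illustrative, not a complete call.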

regards,

Jfn

On Wed, Jul 24, 2019 at 12:23 PM li jerry  wrote:

> Hello All
>
> I saw in the community documentation that you can deploy CloudStack via
> Ansible.
> Is there any way to quickly initialize CloudStack (create a zone, add hosts,
> etc.) through Ansible or other tools?
>


Re: Using S3/Minio as the only secondary storage

2019-07-17 Thread Jean-Francois Nadeau
Thanks Will,

I remember having the discussion with Pierre-Luc on his use of Swift for
templates.  I was curious about the differences between S3 and Swift for
secondary storage since, looking at the CS UI when setting up an S3 image
store, the NFS staging is optional.  And this makes sense to me: if your
object storage is fast and accessible locally, why the need for
staging/caching?  The documentation could mention whether it is possible to
use S3 secondary storage and nothing else, starting with whether the SSVM
templates can be uploaded to a bucket.  I will certainly ask Syed later
today :)

best

Jfn

On Wed, Jul 17, 2019 at 6:59 AM Will Stevens  wrote:

> Hey JF,
> We use the Swift object store as the storage backend for secondary
> storage.  I have not tried the S3 integration, but the last time I looked
> at the code for this (admittedly, a long time ago) the Swift and S3 logic
> was more intertwined than I liked. The CloudOps/cloud.ca team had to do a
> lot of work to get the Swift integration to a reasonable working state. I
> believe all of our changes have been upstreamed quite some time ago. I
> don't know if anyone is doing this for the S3 implementation.
>
> I can't speak to the S3 implementation because I have not looked at it in a
> very long time, but the Swift implementation requires a "temporary NFS
> staging area" that essentially acts kind of like a buffer between the
> object store and primary storage when templates and such are used by the
> hosts.
>
> I think Pierre-Luc and Syed have a clearer picture of all the moving
> pieces, but that is a quick summary of what I know without digging in.
>
> Hope that helps.
>
> Cheers,
>
> Will
>
> On Tue, Jul 16, 2019, 10:24 PM Jean-Francois Nadeau <
> the.jfnad...@gmail.com>
> wrote:
>
> > Hello Everyone,
> >
> > I was wondering if it is common or even recommended to use an
> > S3-compatible storage system as the only secondary storage provider?
> >
> > The environment is 4.11.3.0 with KVM (CentOS 7.6), and our tier 1 storage
> > solution also provides an S3-compatible object store (apparently Minio
> > under the hood).
> >
> > I have always used NFS to install the SSVM templates, and the install
> > script (cloud-install-sys-tmplt) only takes a mount point.  How, if
> > possible, would I proceed with S3-only storage?
> >
> > best,
> >
> > Jean-Francois
> >
>


Using S3/Minio as the only secondary storage

2019-07-16 Thread Jean-Francois Nadeau
Hello Everyone,

I was wondering if it is common or even recommended to use an S3-compatible
storage system as the only secondary storage provider?

The environment is 4.11.3.0 with KVM (CentOS 7.6), and our tier 1 storage
solution also provides an S3-compatible object store (apparently Minio
under the hood).

I have always used NFS to install the SSVM templates, and the install script
(cloud-install-sys-tmplt) only takes a mount point.  How, if possible,
would I proceed with S3-only storage?
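On the S3-only question, one piece that can be sketched is the object layout: the key scheme below is an assumption carried over from the directory layout NFS secondary storage uses, not something the docs guarantee for S3 image stores.

```python
def template_key(account_id: int, template_id: int, filename: str) -> str:
    """Build the object key for a template, following the
    'template/tmpl/<account>/<template>/<file>' layout seen on NFS
    secondary storage (assumed here to carry over to S3 stores)."""
    return f"template/tmpl/{account_id}/{template_id}/{filename}"


# e.g. where a system VM template might land in the bucket
key = template_key(1, 3, "systemvm.qcow2.bz2")
```

With a MinIO endpoint you could then upload a template with boto3's `upload_file(local_path, bucket, key)` — though whether the SSVM bootstrap accepts that in place of cloud-install-sys-tmplt is exactly the open question above.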

best,

Jean-Francois


Help with configDrive label issue/discrepancy

2019-05-02 Thread Jean-Francois Nadeau
Good morning all,

I have this strange configDrive problem that shows up on 4.11.2, and only on
installs upgraded from 4.9.

This problem shows up with the DefaultL2NetworkOfferingConfigDriveVlan
network offering.

In the lab, on a fresh 4.11.2.0 install, the configDrive-enabled network
presents an ISO to guest VMs with the expected "config-2" label and
cloud-init works just fine:

i.e.:
# blkid /dev/sr1
/dev/sr1: UUID="2019-04-26-20-57-31-00" LABEL="config-2" TYPE="iso9660"

On our production 4.11.2, upgraded months ago from 4.9.3 (but running the
same 4.11.2.0 systemvm templates as the lab), here's the configDrive
ISO label with the same network offering:

# blkid /dev/sr1
/dev/sr1: UUID="2019-05-01-21-30-20-00" LABEL="config" TYPE="iso9660"

In this case cloud-init doesn't identify this label as a configDrive
datasource and fails to apply user-data.
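For context on why the label matters: cloud-init's ConfigDrive datasource only claims a filesystem when its label matches "config-2" (case-insensitively in recent cloud-init versions), so a plain "config" label is silently skipped. A minimal sketch of that check:

```python
def is_configdrive_label(label: str) -> bool:
    """Mimic cloud-init's ConfigDrive detection: it probes block devices
    for a filesystem labelled 'config-2' (newer cloud-init also accepts
    'CONFIG-2'). Anything else -- such as plain 'config' -- is ignored
    and the datasource is never used."""
    return label.lower() == "config-2"


print(is_configdrive_label("config-2"))  # → True  (the fresh-install label)
print(is_configdrive_label("config"))    # → False (the post-upgrade label)
```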

I'm trying to find where this happens and where it can be changed.  Any
clues, anyone?

best,

Jfn


SSVM, templates and managed storage (iscsi/KVM)... how does it work ?

2019-03-03 Thread Jean-Francois Nadeau
Hi all,

I'm kicking the tires on managed storage under 4.11.2 with KVM and
Datera as primary storage.

My first attempt at creating a VM from a template stored on NFS secondary
storage failed silently. Looking at the SSVM cloud logs I saw no exception.
The VM root disk gets properly created on the backend and attached on the
KVM host, but the block device is blank.  Somehow the template did not get
copied over.

Starting troubleshooting from this point, I realize I don't understand
how this works versus what I'm used to with NFS as both primary and
secondary storage.

I presume the SSVM has to copy the qcow2 template from the NFS secondary to
the primary storage, but the primary is iSCSI now... and I did not set up
initiator access for the SSVM or find instructions saying I need to do that.
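For comparison, with NFS as both tiers the KVM host essentially copies the template onto primary storage with a qemu-img convert. The sketch below assembles such a command line; the paths and flags are illustrative, not the agent's verbatim invocation — how managed storage wires in initiator access is exactly the open question here.

```python
def build_template_copy_cmd(src_qcow2: str, dest_path: str,
                            raw: bool = False) -> list:
    """Assemble a qemu-img invocation that copies a qcow2 template onto
    primary storage. Converting to raw is what writing directly to an
    iSCSI block device would need; qcow2-to-qcow2 mirrors the NFS flow."""
    out_fmt = "raw" if raw else "qcow2"
    return ["qemu-img", "convert", "-O", out_fmt, src_qcow2, dest_path]


# e.g. template from NFS secondary onto an attached iSCSI LUN
cmd = build_template_copy_cmd("/mnt/sec/template.qcow2", "/dev/sdb", raw=True)
```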

Can someone fill in the blanks on how this works?

thanks all,

Jean-Francois


Re: L2+ConfigDrive

2019-02-12 Thread Jean-Francois Nadeau
Paul, is it true to say that parameter can only be enabled if you have NFS
primary storage in the zone?  Or is there any chance this works
with managed storage?

best,

Jfn

On Tue, Feb 12, 2019 at 4:29 AM Paul Angus  wrote:

> Great to hear that, thanks for letting us know!
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Piotr Pisz 
> Sent: 12 February 2019 06:54
> To: users@cloudstack.apache.org
> Subject: RE: L2+ConfigDrive
>
> Hi Angus,
>
> After changing the parameter vm.configdrive.primarypool.enabled to false
> everything works fine for the RBD pool.
> Thank you for your help :-)
>
> Regards,
> Piotr
>
>
> -Original Message-
> From: Paul Angus 
> Sent: Monday, February 11, 2019 7:31 PM
> To: users@cloudstack.apache.org; pi...@piszki.pl
> Subject: RE: L2+ConfigDrive
>
> In the global settings it is possible to specify whether you wish the
> config drive to be located on primary or secondary storage. Perhaps
> you could check that the ISO is being served from secondary storage; that
> way Ceph wouldn't be involved.
> If it still doesn't work, please could you create a github issue for the
> problem.
>
> Kind regards
>
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK @shapeblue
>
>
>
>
> -Original Message-
> From: Piotr Pisz 
> Sent: 11 February 2019 09:03
> To: users@cloudstack.apache.org
> Subject: RE: L2+ConfigDrive
>
> Hey Angus,
>
> My ACS is 4.11.2 with an advanced network on KVM; config drive is enabled
> as a network provider.
> I have done more tests, I have two ACS installations, one has Primary
> Storage as Ceph RBD and the other as SharedMountPoint.
> The error is for the RBD pool (see log), everything works fine for SMP.
>
> Piotr
>
>
> -Original Message-
> From: Paul Angus 
> Sent: Monday, February 11, 2019 9:25 AM
> To: users@cloudstack.apache.org; pi...@piszki.pl
> Subject: RE: L2+ConfigDrive
>
> Hi Piotr,
>
> Which version of CloudStack are you looking at?
> And, have you checked that configdrive is enabled as a network provider?
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK @shapeblue
>
>
>
>
> -Original Message-
> From: Piotr Pisz 
> Sent: 10 February 2019 16:04
> To: users@cloudstack.apache.org
> Subject: L2+ConfigDrive
>
> Hello!
>
> I am interested in using Config Drive with the L2 network; unfortunately
> there is no word in the documentation on this topic.
> Has anyone used this and can guide me?
> A service offering of L2 + ConfigDrive does not work; it appears the ISO
> file is missing.
> I get the error: Unable to start VM instance.
>
> Regards,
> Piotr
>
>
>


Re: Hyper-V with ACS

2018-11-20 Thread Jean-Francois Nadeau
We did a quick test with Hyper-V 2016 under 4.9.3, and some APIs changed in
Hyper-V that we believe prevented us from deploying a zone correctly.  We
did not investigate further.

On Tue, Nov 20, 2018 at 6:18 AM Andrija Panic 
wrote:

> Hi all,
>
> anyone has experience with running Hyper-V with CloudStack, what is feature
> set supported (or more importantly not supported), what versions actually
> work (HyperV 2016 or not), etc.
>
> Any info, would be appreciated.
>
> --
>
> Andrija Panić
>


Re: Runing cloudstack failed in docker

2018-11-05 Thread Jean-Francois Nadeau
Just going by the look of the "Protocol family unavailable" error: can you
disable IPv6 in the JVM startup script?
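That SocketException is what the JVM typically surfaces when it tries to bind an IPv6 socket in an environment — such as a container — where the IPv6 family is unavailable. A self-contained way to probe what the host can bind, sketched in Python:

```python
import socket


def can_bind(family: int, address: str) -> bool:
    """Return True if a TCP socket of the given family can bind an
    ephemeral port on the given loopback address. Socket creation or
    bind fails with EAFNOSUPPORT ('Protocol family unavailable') when
    the address family is disabled on the host."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, 0))
        return True
    except OSError:
        return False


print(can_bind(socket.AF_INET, "127.0.0.1"))  # True on virtually any host
print(can_bind(socket.AF_INET6, "::1"))       # False where IPv6 is disabled
```

If IPv6 turns out to be unavailable in the container, adding `-Djava.net.preferIPv4Stack=true` to the JVM options is the usual workaround.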

On Mon, Nov 5, 2018 at 4:37 AM li li  wrote:

> Hi ALL
>
>
> I'm trying to encapsulate CloudStack 4.11 into Docker. After the build
> succeeds, cloudstack-management cannot function properly.
>
> Can someone help me? Thank you very much.
>
>
> Dockerfile:
>
>
> https://github.com/apache/cloudstack/blob/4.11/tools/docker/Dockerfile.centos6
>
>
> from cloudstack-management.err Error:
>
> 05/11/2018 09:07:39 128 jsvc.exec error: Cannot start daemon
> 05/11/2018 09:07:39 126 jsvc.exec error: Service exit with a return value
> of 5
> java.lang.reflect.InvocationTargetException
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:241)
> Caused by: java.net.SocketException: Protocol family unavailable
>at sun.nio.ch.Net.bind0(Native Method)
>at sun.nio.ch.Net.bind(Net.java:433)
>at sun.nio.ch.Net.bind(Net.java:425)
>at sun.nio.ch
> .ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>at
> org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:334)
>at
> org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:302)
>at
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
>at
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:238)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at org.eclipse.jetty.server.Server.doStart(Server.java:397)
>at
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>at org.apache.cloudstack.ServerDaemon.start(ServerDaemon.java:200)
>... 5 more
> OpenJDK 64-Bit Server VM warning: ignoring option PermSize=512M; support
> was removed in 8.0
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=800m;
> support was removed in 8.0
>
>
> From management-server.log:
>
> 2018-11-05 09:24:14,967 INFO  [o.a.c.s.m.m.i.DefaultModuleDefinitionSet]
> (main:null) (logid:) Starting module [outofbandmanagement]
> 2018-11-05 09:24:14,967 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Starting CloudStack Components
> 2018-11-05 09:24:14,967 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Done Starting CloudStack Components
> 2018-11-05 09:24:14,967 INFO  [o.a.c.s.m.m.i.DefaultModuleDefinitionSet]
> (main:null) (logid:) Starting module [ipmitool]
> 2018-11-05 09:24:14,967 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Starting CloudStack Components
> 2018-11-05 09:24:14,977 DEBUG
> [o.a.c.o.d.i.IpmitoolOutOfBandManagementDriver] (main:null) (logid:)
> OutOfBandManagementDriver ipmitool initialized: ipmitool version 1.8.15
>
> 2018-11-05 09:24:14,977 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Done Starting CloudStack Components
> 2018-11-05 09:24:14,977 INFO  [o.a.c.s.m.m.i.DefaultModuleDefinitionSet]
> (main:null) (logid:) Starting module [nested-cloudstack]
> 2018-11-05 09:24:14,977 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Starting CloudStack Components
> 2018-11-05 09:24:14,977 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Done Starting CloudStack Components
> 2018-11-05 09:24:14,978 INFO  [o.e.j.s.h.C.client] (main:null) (logid:)
> Initializing Spring root WebApplicationContext
> 2018-11-05 09:24:15,043 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Configuring CloudStack Components
> 2018-11-05 09:24:15,043 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle]
> (main:null) (logid:) Done Configuring CloudStack Components
> 2018-11-05 09:24:15,056 INFO  [c.c.u.LogUtils] (main:null) (logid:) log4j
> configuration found at /etc/cloudstack/management/log4j-cloud.xml
> 2018-11-05 09:24:15,072 INFO  [o.e.j.s.h.ContextHandler] (main:null)
> (logid:) Started o.e.j.w.WebAppContext@6b1274d2
> {/client,file:///usr/share/cloudstack-management/webapp/,AVAILABLE}{/usr/share/cloudstack-management/webapp}
> 2018-11-05 09:24:15,073 INFO  [o.e.j.s.h.ContextHandler] (main:null)
> (logid:) Started o.e.j.s.h.MovedContextHandler@5cdd8682{/,null,AVAILABLE}
>
>


Re: [VOTE] Apache CloudStack 4.11.2.0 RC4

2018-11-05 Thread Jean-Francois Nadeau
-1

Only because we believe this issue is a regression when upgrading from 4.9.3:
existing network offerings created under 4.9.3 should continue to work when
creating new networks under 4.11.2. Please see
https://github.com/apache/cloudstack/issues/2989

best,

Jfn

On Mon, Nov 5, 2018 at 5:04 AM Boris Stoyanov 
wrote:

> +1
>
> I’ve done upgrade testing from 4.9 on a VMWare based environment and it
> went well, I think all the issues delivered with this RC are being verified
> and confirmed. I’ve also executed some general lifecycle tests around main
> components like (VMs, Networks, Volumes, Storages etc)
>
> Thanks,
> Bobby.
>
>
> boris.stoya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
> On 4 Nov 2018, at 3:52, Rohit Yadav  wrote:
>
> +1 (binding) based on automated smoketests on xenserver/vmware/kvm and
> manual tests on kvm+centos7 based Adv zone env. I did not check gpg
> signatures on the artifact this time.
>
> Regards.
>
> Get Outlook for Android
>
> 
> From: Andrija Panic 
> Sent: Saturday, November 3, 2018 10:24:45 PM
> To: users
> Cc: dev; Paul Angus
> Subject: Re: [VOTE] Apache CloudStack 4.11.2.0 RC4
>
> Assuming I may vote:
>
> +1 from my side
>
> Tested:
> - building DEB packages for Ubuntu
> - advanced and basic zone deployment (KVM, clean install 4.11.2)
> - upgrade from 4.8.0.1 to 4.11.2
> - a bunch of integration tests done from in-house suite of tests (system
> and user tests) - all PASS, with exception that RAW templates are broken -
> there is already a GitHub issue from 4.11:
> https://github.com/apache/cloudstack/issues/2820
> - online and offline storage migration from NFS/CEPH to SolidFire
>
>
> Some issues I experienced (perhaps something local to me, but managed to
> reproduce it many times):
> Management/Agent on Ubuntu 14.04:
> When upgrading an existing 4.8 installation to 4.11.2, init.d scripts were
> not created/overwritten, nor was I asked whether I wanted to replace or
> keep existing versions (as is done with e.g. agent.properties,
> db.properties, etc.), so this seems like a packaging issue.
> A clean install (or, in my problematic case, a complete uninstall and
> reinstall) works fine in regards to the init.d scripts.
>
> Cheers
> Andrija
>
>
>
>
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
> On Fri, 2 Nov 2018 at 16:36, Wido den Hollander  wrote:
>
> +1 (binding)
>
> I've tested:
>
> - Building DEB packages for Ubuntu
> - Install DEB packages
> - Upgrade from 4.11.1 to 4.11.2
>
> Wido
>
> On 10/30/18 5:10 PM, Paul Angus wrote:
> Hi All,
>
> By popular demand, I've created a 4.11.2.0 release (RC4), with the
> following artefacts up for testing and a vote:
>
> Git Branch and Commit SH:
>
>
> https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=shortlog;h=refs/heads/4.11.2.0-RC20181030T1040
> Commit: 840ad40017612e169665fa799a6d31a23ecad347
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.11.2.0/
>
> PGP release keys (signed using 8B309F7251EE0BC8):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open until Sunday 4th November.
>
> For sanity in tallying the vote, can PMC members please be sure to
> indicate "(binding)" with their vote?
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Additional information:
>
> For users' convenience, I've built packages from
> 840ad40017612e169665fa799a6d31a23ecad347 and published RC4 repository here:
> http://packages.shapeblue.com/testing/41120rc4/
>
> The release notes are still work-in-progress, but the systemvm template
> upgrade section has been updated. You may refer the following for systemvm
> template upgrade testing:
>
>
> http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/latest/index.html
>
> 4.11.2 systemvm templates are as before and available from here:
> http://packages.shapeblue.com/testing/systemvm/4112rc3
>
>
>
>
> Kind regards,
>
> Paul Angus
>
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>
>
>
> --
>
> Andrija Panić
>
>


Re: Problem creating networks after 4.11 upgrade

2018-11-05 Thread Jean-Francois Nadeau
Hi Paul,

Thank you for the clarification about voting; I wasn't sure this was the
right place to raise blockers for targeted use cases.  In regards to the
issue, I do get the same error creating a shared network that is not scoped
to a project.
On Mon, Nov 5, 2018 at 8:09 AM Paul Angus  wrote:

> Hi Eric & Jean-Francois,
> Thanks for your work in testing.
> There is an open vote, could you now (and in future) respond to the
> thread, the official vote will/would pass as it stands. (I only caught this
> through doing a final sweep of the mailing lists).
>
> @Jean-Francois Nadeau is your error specifically related to creating
> shared networks in *projects* ?
>
> @eric please could you document the specific issues that you found, so
> that we can try to replicate and fix them
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Eric Lee Green 
> Sent: 04 November 2018 21:29
> To: users@cloudstack.apache.org
> Subject: Re: Problem creating networks after 4.11 upgrade
>
> Yeah, I had all sorts of problems with custom network offerings after
> upgrading to 4.11.1, along with problems launching virtual machines
> (every attempt to launch resulted in a "not enough resources" error);
> I couldn't get virtual routers to come up for custom networks, etc. I
> didn't have time in my service window to do any detailed examination of
> why they were failing, so I just downgraded back to 4.9.2 before my
> service window ended. When 4.11 is stable, maybe I'll try upgrading again.
> (OS: CentOS 7. Old version: 4.9.2. New version: 4.11.1. Hardware: three
> compute servers with dual hexacore processors and 96 GB+ of memory w/KVM.
> End result after two hours of trying to make it work: downgrade back to
> 4.9.2.)
>
> I was thinking about migrating most of my other compute servers into the
> CloudStack cloud, because it's easier for my users to take care of their
> own resources, but I was hoping to do it after migrating to 4.11.
> I guess not.
>
> On 11/4/18 13:14, Jean-Francois Nadeau wrote:
>
> > Hi all,
> >
> > I was wondering if anyone else had this problem after upgrading from 4.9.
> > All our networks are using a custom network offering with no services
> > defined, since the physical network provides DHCP and DNS. The environment
> > is CentOS 7, KVM with the openvswitch driver.
> >
> > Now after the upgrade to 4.11,  creating a network using that same
> > network offering fails with an  "Unable to convert network offering
> > with specified id to network profile" error.
> >
> > The issue is documented here:
> > https://github.com/apache/cloudstack/issues/2989
> >
> > I hope someone can have a look at it.  This is the last issue that
> > blocks us from upgrading.
> >
> > best,
> >
> > Jean-Francois
> >
>
>


Re: Problem creating networks after 4.11 upgrade

2018-11-05 Thread Jean-Francois Nadeau
Test was on 4.11.2 RC3.  Will send to the dev list.


Problem creating networks after 4.11 upgrade

2018-11-04 Thread Jean-Francois Nadeau
Hi all,

I was wondering if anyone else has had this problem after upgrading from 4.9.
All our networks use a custom network offering with no services defined,
since the physical network provides DHCP and DNS.  The environment is
CentOS 7, KVM with the openvswitch driver.

Now, after the upgrade to 4.11, creating a network using that same network
offering fails with an "Unable to convert network offering with specified
id to network profile" error.

The issue is documented here:
https://github.com/apache/cloudstack/issues/2989

I hope someone can have a look at it.  This is the last issue that blocks
us from upgrading.

best,

Jean-Francois


Re: Host HA vs transient NFS problems on KVM

2018-10-23 Thread Jean-Francois Nadeau
We opened this one for the cloudstack-agent; it closely relates to the
problem and the change in behavior since 4.9:

https://github.com/apache/cloudstack/issues/2890

On Tue, Oct 23, 2018 at 2:26 PM Simon Weller 
wrote:

> JF,
>
>
> I suggest you open a github issue instead. It will get a lot more
> attention than Jira.
>
>
> - Si
>
>
> ____
> From: Jean-Francois Nadeau 
> Sent: Tuesday, October 23, 2018 11:32 AM
> To: users@cloudstack.apache.org
> Subject: Re: Host HA vs transient NFS problems on KVM
>
> I will file a Jira for the issue.
>
> On Tue, Oct 23, 2018 at 11:36 AM ilya musayev <
> ilya.mailing.li...@gmail.com>
> wrote:
>
> > Would you please file the JIRA bugs describing in exact details
> >
> > 1) your setup
> > 2) what was done or happened
> > 3) expected result
> >
> > I imagine this will be fixed in the next point release if issues are
> indeed
> > correct. We’ve yet to try this framework and if it does not work as
> > anticipated we will have lots of issues.
> >
> >
> >
> > On Tue, Oct 23, 2018 at 8:30 AM Andrei Mikhailovsky
> >  wrote:
> >
> > > Hi Jean,
> > >
> > > I have previously done some HA testing and have pretty much came to
> > > similar conclusions as you have. My testing showed that using HA is
> very
> > > unreliable at best and data loosing at worst cases. I have had the
> > > following outcome from various testing scenarios:
> > >
> > > 1. Works as expected (very rarely)
> > > 2. Starts 2 vms on different hosts (data loss / corruption)
> > > 3. Reboots ALL KVM hosts (even those hosts that do not have a single vm
> > > with nfs volumes)
> > >
> > > Now, I can not justify having HA with even a slim chances of having 2
> or
> > 3
> > > above. Honestly, I do not know a single business that is happy to
> accept
> > > those scenarios. Frankly speaking, for me the cloudstack HA options
> > create
> > > more problems than solve and thus I've not enabled them. I have decided
> > > that ACS with KVM is not HA friendly, full stop. Having said this, I've
> > not
> > > tested the latest couple of releases, so I will give it a benefit of
> the
> > > doubt and wait for user's reports to prove my conclusion otherwise.
> I've
> > > wasted enough of my own time on KVM HA.
> > >
> > > My HA approach to ACS is more of a manual nature, which is far more
> > > reliable and is less prone to issues in my experience. I have a
> > monitoring
> > > system sending me alerts when VMs, host servers and storage become
> > > unreachable. It is not as convenient as a fully working automatic HA, I
> > > agree, but it is far better to be woken up at 3am to deal with
> > restarting a
> > > handful of vms and perhaps a KVM host force reboot than dealing with
> mass
> > > KVM hosts reboots and/or trying to find duplicate vms lurking somewhere
> > on
> > > the host servers. Been there, done that - NO THANKS!
> > >
> > > Cheers
> > >
> > > Andrei
> > >
> > > - Original Message -
> > > > From: "Jean-Francois Nadeau" 
> > > > To: "users" 
> > > > Sent: Monday, 22 October, 2018 22:13:35
> > > > Subject: Host HA vs transient NFS problems on KVM
> > >
> > > > Dear community,
> > > >
> > > > I want to share my concern upgrading from 4.9 to 4.11 in regards to
> how
> > > the
> > > > host HA framework works and the handling of various failure
> conditions.
> > > >
> > > > Since we have been running CS on 4.9.3 with NFS on KVM,  VM HA have
> > been
> > > > working as expected when hypervisor crashed and I agree we might
> > have
> > > > been lucky knowing the limitations of the KVM investigator and the
> > > > possibility to fire the same VM on 2 KVM hosts is real when you know
> > the
> > > > recipe for it.
> > > >
> > > > Still, on 4.9.3 we were tolerant to transient primary NFS storage
> > access
> > > > issues, typical of a network problem (and we've seen it lately for a
> 22
> > > > minutes disconnection).  Although these events are quite rare,  when
> > they
> > > > do happen their blast radius can be a huge impact on the business.
> > > >
> > > So when we initially tested CS on 4.9.3 we purposely blocked access to

Re: Host HA vs transient NFS problems on KVM

2018-10-23 Thread Jean-Francois Nadeau
I will file a Jira for the issue.

On Tue, Oct 23, 2018 at 11:36 AM ilya musayev 
wrote:

> Would you please file the JIRA bugs describing in exact details
>
> 1) your setup
> 2) what was done or happened
> 3) expected result
>
> I imagine this will be fixed in the next point release if issues are indeed
> correct. We’ve yet to try this framework and if it does not work as
> anticipated we will have lots of issues.
>
>
>
> On Tue, Oct 23, 2018 at 8:30 AM Andrei Mikhailovsky
>  wrote:
>
> > Hi Jean,
> >
> > I have previously done some HA testing and have pretty much come to
> > similar conclusions as you have. My testing showed that using HA is very
> > unreliable at best and causes data loss in the worst cases. I have had the
> > following outcome from various testing scenarios:
> >
> > 1. Works as expected (very rarely)
> > 2. Starts 2 vms on different hosts (data loss / corruption)
> > 3. Reboots ALL KVM hosts (even those hosts that do not have a single vm
> > with nfs volumes)
> >
> > Now, I can not justify having HA with even a slim chances of having 2 or
> 3
> > above. Honestly, I do not know a single business that is happy to accept
> > those scenarios. Frankly speaking, for me the cloudstack HA options
> create
> > more problems than solve and thus I've not enabled them. I have decided
> > that ACS with KVM is not HA friendly, full stop. Having said this, I've
> not
> > tested the latest couple of releases, so I will give it a benefit of the
> > doubt and wait for user's reports to prove my conclusion otherwise. I've
> > wasted enough of my own time on KVM HA.
> >
> > My HA approach to ACS is more of a manual nature, which is far more
> > reliable and is less prone to issues in my experience. I have a
> monitoring
> > system sending me alerts when VMs, host servers and storage become
> > unreachable. It is not as convenient as a fully working automatic HA, I
> > agree, but it is far better to be woken up at 3am to deal with
> restarting a
> > handful of vms and perhaps a KVM host force reboot than dealing with mass
> > KVM hosts reboots and/or trying to find duplicate vms lurking somewhere
> on
> > the host servers. Been there, done that - NO THANKS!
> >
> > Cheers
> >
> > Andrei
> >
> > - Original Message -
> > > From: "Jean-Francois Nadeau" 
> > > To: "users" 
> > > Sent: Monday, 22 October, 2018 22:13:35
> > > Subject: Host HA vs transient NFS problems on KVM
> >
> > > Dear community,
> > >
> > > I want to share my concern upgrading from 4.9 to 4.11 in regards to how
> > the
> > > host HA framework works and the handling of various failure conditions.
> > >
> > > Since we have been running CS on 4.9.3 with NFS on KVM,  VM HA have
> been
> > > working as expected when hypervisor crashed and I agree we might
> have
> > > been lucky knowing the limitations of the KVM investigator and the
> > > possibility to fire the same VM on 2 KVM hosts is real when you know
> the
> > > recipe for it.
> > >
> > > Still, on 4.9.3 we were tolerant to transient primary NFS storage
> access
> > > issues, typical of a network problem (and we've seen it lately for a 22
> > > minutes disconnection).  Although these events are quite rare,  when
> they
> > > do happen their blast radius can be a huge impact on the business.
> > >
> > > So when we initially tested CS on 4.9.3 we purposely blocked access to
> > NFS
> > > and we observed the results. Changing the kvmheartbeat.sh script so it
> > > doesn't reboot the node after 5 minutes has been essential to defuse
> the
> > > potential of a massive KVM hosts reboot.In the end,  it's far less
> > > damage to let NFS recover than having all those VMs rebooted.   On
> 4.9.3
> > > the cloudstack-agent will remain "Up" and not fire any VM twice if the
> > NFS
> > > storage becomes available again within 30 minutes.
> > >
> > > Now, testing the upgrade from 4.9 to 4.11 in our lab and the same
> > failure
> > > conditions we rapidly saw a different behavior although not perfectly
> > > consistent.  On 4.11.2 without host HA enabled,  we will see the agent
> > > "try" to disconnect after 5 minutes tho sometimes the KVM host goes
> into
> > > Disconnect state and sometimes it goes straight to Down state.  In that
> > > case we'll see a duplicate VM created in no time and once the NFS issue
> > is
> >

Host HA vs transient NFS problems on KVM

2018-10-22 Thread Jean-Francois Nadeau
Dear community,

I want to share my concern, upgrading from 4.9 to 4.11, about how the
host HA framework works and the handling of various failure conditions.

Since we have been running CS on 4.9.3 with NFS on KVM, VM HA has been
working as expected when a hypervisor crashed, and I agree we might have
been lucky: knowing the limitations of the KVM investigator, the
possibility of firing the same VM on 2 KVM hosts is real when you know the
recipe for it.

Still, on 4.9.3 we were tolerant of transient primary NFS storage access
issues, typical of a network problem (and we've seen one lately: a
22-minute disconnection).  Although these events are quite rare, when they
do happen their blast radius can have a huge impact on the business.

So when we initially tested CS on 4.9.3 we purposely blocked access to NFS
and observed the results.  Changing the kvmheartbeat.sh script so it
doesn't reboot the node after 5 minutes has been essential to defuse the
potential for a massive KVM host reboot.  In the end, it's far less
damaging to let NFS recover than to have all those VMs rebooted.  On 4.9.3
the cloudstack-agent will remain "Up" and not fire any VM twice if the NFS
storage becomes available again within 30 minutes.
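The decision the stock heartbeat script makes can be modeled as a simple staleness check. The sketch below is illustrative only (the real script's timings and actions differ), with `reboot_on_stale=False` standing in for the local change described above:

```python
def heartbeat_action(last_write: float, now: float,
                     timeout: float = 300.0,
                     reboot_on_stale: bool = True) -> str:
    """Decide what to do given the time the NFS heartbeat file was last
    written. The stock behavior reboots the host once the heartbeat is
    older than ~5 minutes; reboot_on_stale=False models logging and
    waiting for NFS to recover instead. Values are illustrative, not
    the script verbatim."""
    if now - last_write <= timeout:
        return "healthy"
    return "reboot" if reboot_on_stale else "log-and-wait"


print(heartbeat_action(0.0, 100.0))                         # healthy
print(heartbeat_action(0.0, 600.0))                         # reboot
print(heartbeat_action(0.0, 600.0, reboot_on_stale=False))  # log-and-wait
```

The trade-off discussed in this thread is precisely which branch to take on staleness: rebooting bounds the duplicate-VM risk, while waiting bounds the blast radius of an NFS hiccup.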

Now, testing the upgrade from 4.9 to 4.11 in our lab under the same failure
conditions, we rapidly saw a different behavior, although not a perfectly
consistent one.  On 4.11.2 without host HA enabled, we will see the agent
"try" to disconnect after 5 minutes, though sometimes the KVM host goes into
Disconnected state and sometimes it goes straight to Down state.  In that
case we'll see a duplicate VM created in no time, and once the NFS issue is
resolved we have 2 copies of that VM, with CloudStack only knowing about
the last copy.  This is obviously a disaster, forcing us to look at how
host HA can help.

Now with host HA enabled and simulating the same NFS hiccup,  we won't get
duplicate VMs but we will get a KVM host reset.  The problem here is that,
yes the host HA does ensure we don't have dup VMs but at scale this would
also provoke a lot of KVM host resets (if not all of them).   If we are at
risk with host HA to have massive KVM host resets,  then I might prefer to
disable host/VM HA entirely and just handle KVM host failures manually.
This is super annoying for the ops team,  but far less risky for the
business.

I'm trying to find out if there's a middle ground here between the 4.9 behavior
with NFS hiccups and the reliability of the new host HA framework.

best,

Jean-Francois


Re: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3

2018-10-19 Thread Jean-Francois Nadeau
My bad: I had upgraded the wrong KVM host, so systemvm.iso was from
4.9.3.  The console is working now.   Thanks for the hint!
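In case it helps others hitting the same symptom: a quick sanity check is to compare the systemvm.iso checksum on each KVM host against a known-good upgraded host. The RPM path below is an assumption from a stock CentOS install; adjust as needed:

```shell
# check_iso: compare a host's systemvm.iso md5 against a reference value
check_iso() {
  local iso=$1 expected=$2
  if [ "$(md5sum "$iso" | awk '{print $1}')" = "$expected" ]; then
    echo "systemvm.iso matches"
  else
    echo "systemvm.iso MISMATCH: agent will patch system VMs with old scripts"
  fi
}
# On a real host (stock RPM path):
#   check_iso /usr/share/cloudstack-common/vms/systemvm.iso <md5-from-good-host>
```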

On Thu, Oct 18, 2018 at 5:57 PM Rohit Yadav 
wrote:

> Hi Jean-Francois,
>
>
> Did you upgrade your KVM agent and restart it as well? The
> systemvmtemplate seems fine, but it may be possible that an older
> systemvm.iso has patched the systemvm which is why you're seeing the error.
>
>
> - Rohit
>
> <https://cloudstack.apache.org>
>
>
>
> ________
> From: Jean-Francois Nadeau 
> Sent: Friday, October 19, 2018 2:13:16 AM
> To: users@cloudstack.apache.org
> Subject: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3
>
> hi all,
>
> After upgrading from 4.9.3 to 4.11.2rc3 on centos7/KVM,  the old SSMV were
> running and working fine until I destroyed them to get them on the current
> version (I uploaded the 4.11.2rc3 template before the upgrade)
>
> Now whatever I do there's nothing running on the new Console proxy VM
>
> Oct 18 20:30:53 v-15-VM systemd[1]: Starting /etc/rc.local Compatibility...
> Oct 18 20:30:53 v-15-VM systemd[788]: rc-local.service: Failed at step EXEC
> spawning /etc/rc.local: Exec format error
> Oct 18 20:30:53 v-15-VM systemd[783]: cloud.service: Failed at step CHROOT
> spawning /usr/local/cloud/systemvm/_run.sh: No such file or directory
>
> root@v-15-VM:~# systemctl status cloud --no-pager -l
> ● cloud.service - CloudStack Agent service
>Loaded: loaded (/etc/systemd/system/cloud.service; enabled; vendor
> preset: enabled)
>Active: activating (auto-restart) (Result: exit-code) since Thu
> 2018-10-18 20:36:45 UTC; 881ms ago
>   Process: 1244 ExecStart=/usr/local/cloud/systemvm/_run.sh (code=exited,
> status=210/CHROOT)
>  Main PID: 1244 (code=exited, status=210/CHROOT)
>
> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Unit entered failed
> state.
> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Failed with result
> 'exit-code'.
>
>
> Is that systemvm build valid?
>
> root@v-15-VM:~# cat /etc/cloudstack-release
> Cloudstack Release 4.11.2 Wed Oct 17 19:09:25 UTC 2018
>
> thanks !
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>


Re: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3

2018-10-19 Thread Jean-Francois Nadeau
Same on the systemvmtemplate-4.11.0-kvm.qcow2 image.  I guess I don't
understand how the template gets customized and why it doesn't work for us.

On Fri, Oct 19, 2018 at 11:09 AM Jean-Francois Nadeau <
the.jfnad...@gmail.com> wrote:

> So at first I did not upgrade the agent thinking I would first make sure
> our controllers upgrade went as planned then do the upgrade of all the
> agents on several hundreds of KVM hosts.   So if something goes wrong I
> would roll back the controller only.
>
> Since this upgrade is a test in a lab environment,  I did try to upgrade
> the agent and deleted the console proxy agin but it will still not come up
> and from the console I see the /usr/local/cloud directory is empty...
> expected ?   The systemvmtemplate-4.11.2-kvm.qcow2 image has nothing under
> /usr/local/cloud
>
>
>
>
> On Thu, Oct 18, 2018 at 5:57 PM Rohit Yadav 
> wrote:
>
>> Hi Jean-Francois,
>>
>>
>> Did you upgrade your KVM agent and restart it as well? The
>> systemvmtemplate seems fine, but it may be possible that an older
>> systemvm.iso has patched the systemvm which is why you're seeing the error.
>>
>>
>> - Rohit
>>
>> <https://cloudstack.apache.org>
>>
>>
>>
>> 
>> From: Jean-Francois Nadeau 
>> Sent: Friday, October 19, 2018 2:13:16 AM
>> To: users@cloudstack.apache.org
>> Subject: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3
>>
>> hi all,
>>
>> After upgrading from 4.9.3 to 4.11.2rc3 on centos7/KVM,  the old SSMV were
>> running and working fine until I destroyed them to get them on the current
>> version (I uploaded the 4.11.2rc3 template before the upgrade)
>>
>> Now whatever I do there's nothing running on the new Console proxy VM
>>
>> Oct 18 20:30:53 v-15-VM systemd[1]: Starting /etc/rc.local
>> Compatibility...
>> Oct 18 20:30:53 v-15-VM systemd[788]: rc-local.service: Failed at step
>> EXEC
>> spawning /etc/rc.local: Exec format error
>> Oct 18 20:30:53 v-15-VM systemd[783]: cloud.service: Failed at step CHROOT
>> spawning /usr/local/cloud/systemvm/_run.sh: No such file or directory
>>
>> root@v-15-VM:~# systemctl status cloud --no-pager -l
>> ● cloud.service - CloudStack Agent service
>>Loaded: loaded (/etc/systemd/system/cloud.service; enabled; vendor
>> preset: enabled)
>>Active: activating (auto-restart) (Result: exit-code) since Thu
>> 2018-10-18 20:36:45 UTC; 881ms ago
>>   Process: 1244 ExecStart=/usr/local/cloud/systemvm/_run.sh (code=exited,
>> status=210/CHROOT)
>>  Main PID: 1244 (code=exited, status=210/CHROOT)
>>
>> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Unit entered failed
>> state.
>> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Failed with result
>> 'exit-code'.
>>
>>
>> Is that systemvm build valid?
>>
>> root@v-15-VM:~# cat /etc/cloudstack-release
>> Cloudstack Release 4.11.2 Wed Oct 17 19:09:25 UTC 2018
>>
>> thanks !
>>
>> rohit.ya...@shapeblue.com
>> www.shapeblue.com
>> Amadeus House, Floral Street, London  WC2E 9DPUK
>> @shapeblue
>>
>>
>>
>>


Re: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3

2018-10-19 Thread Jean-Francois Nadeau
So at first I did not upgrade the agent, thinking I would first make sure
our controllers' upgrade went as planned, then do the upgrade of all the
agents on several hundred KVM hosts.   So if something went wrong I
would roll back the controller only.

Since this upgrade is a test in a lab environment,  I did try to upgrade
the agent and deleted the console proxy again, but it still won't come up
and from the console I see the /usr/local/cloud directory is empty...
expected ?   The systemvmtemplate-4.11.2-kvm.qcow2 image has nothing under
/usr/local/cloud




On Thu, Oct 18, 2018 at 5:57 PM Rohit Yadav 
wrote:

> Hi Jean-Francois,
>
>
> Did you upgrade your KVM agent and restart it as well? The
> systemvmtemplate seems fine, but it may be possible that an older
> systemvm.iso has patched the systemvm which is why you're seeing the error.
>
>
> - Rohit
>
> <https://cloudstack.apache.org>
>
>
>
> ________
> From: Jean-Francois Nadeau 
> Sent: Friday, October 19, 2018 2:13:16 AM
> To: users@cloudstack.apache.org
> Subject: New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3
>
> hi all,
>
> After upgrading from 4.9.3 to 4.11.2rc3 on centos7/KVM,  the old SSMV were
> running and working fine until I destroyed them to get them on the current
> version (I uploaded the 4.11.2rc3 template before the upgrade)
>
> Now whatever I do there's nothing running on the new Console proxy VM
>
> Oct 18 20:30:53 v-15-VM systemd[1]: Starting /etc/rc.local Compatibility...
> Oct 18 20:30:53 v-15-VM systemd[788]: rc-local.service: Failed at step EXEC
> spawning /etc/rc.local: Exec format error
> Oct 18 20:30:53 v-15-VM systemd[783]: cloud.service: Failed at step CHROOT
> spawning /usr/local/cloud/systemvm/_run.sh: No such file or directory
>
> root@v-15-VM:~# systemctl status cloud --no-pager -l
> ● cloud.service - CloudStack Agent service
>Loaded: loaded (/etc/systemd/system/cloud.service; enabled; vendor
> preset: enabled)
>Active: activating (auto-restart) (Result: exit-code) since Thu
> 2018-10-18 20:36:45 UTC; 881ms ago
>   Process: 1244 ExecStart=/usr/local/cloud/systemvm/_run.sh (code=exited,
> status=210/CHROOT)
>  Main PID: 1244 (code=exited, status=210/CHROOT)
>
> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Unit entered failed
> state.
> Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Failed with result
> 'exit-code'.
>
>
> Is that systemvm build valid?
>
> root@v-15-VM:~# cat /etc/cloudstack-release
> Cloudstack Release 4.11.2 Wed Oct 17 19:09:25 UTC 2018
>
> thanks !
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>


New SSVM wont start after upgrade from 4.9.3 to 4.11.2rc3

2018-10-18 Thread Jean-Francois Nadeau
hi all,

After upgrading from 4.9.3 to 4.11.2rc3 on CentOS 7/KVM,  the old SSVMs were
running and working fine until I destroyed them to get them on the current
version (I uploaded the 4.11.2rc3 template before the upgrade)

Now whatever I do there's nothing running on the new Console proxy VM

Oct 18 20:30:53 v-15-VM systemd[1]: Starting /etc/rc.local Compatibility...
Oct 18 20:30:53 v-15-VM systemd[788]: rc-local.service: Failed at step EXEC
spawning /etc/rc.local: Exec format error
Oct 18 20:30:53 v-15-VM systemd[783]: cloud.service: Failed at step CHROOT
spawning /usr/local/cloud/systemvm/_run.sh: No such file or directory

root@v-15-VM:~# systemctl status cloud --no-pager -l
● cloud.service - CloudStack Agent service
   Loaded: loaded (/etc/systemd/system/cloud.service; enabled; vendor
preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu
2018-10-18 20:36:45 UTC; 881ms ago
  Process: 1244 ExecStart=/usr/local/cloud/systemvm/_run.sh (code=exited,
status=210/CHROOT)
 Main PID: 1244 (code=exited, status=210/CHROOT)

Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Unit entered failed
state.
Oct 18 20:36:45 v-15-VM systemd[1]: cloud.service: Failed with result
'exit-code'.


Is that systemvm build valid?

root@v-15-VM:~# cat /etc/cloudstack-release
Cloudstack Release 4.11.2 Wed Oct 17 19:09:25 UTC 2018

thanks !


Re: How exactly does CloudStack stop a VM?

2018-06-06 Thread Jean-Francois Nadeau
If the xentools are installed and running in the guest OS it should detect
the shutdown sent via XAPI.

On Wed, Jun 6, 2018 at 6:58 PM, Yiping Zhang  wrote:

> We are using XenServers with our CloudStack instances.
>
> On 6/6/18, 3:11 PM, "Jean-Francois Nadeau" 
> wrote:
>
> On KVM,  AFAIK the shutdown is the equivalent of pressing the power
> button.  To get the Linux OS to catch this and initiate a clean
> shutdown,
> you need the ACPID service running in the guest OS.
>
> On Wed, Jun 6, 2018 at 6:01 PM, Yiping Zhang 
> wrote:
>
> > Hi, all:
> >
> > We have a few VM instances which will hang when issue a Stop command
> from
> > CloudStack web UI or thru API calls, due to the app’s own
> startup/stop
> > script in guest OS was not properly invoked.  The app’s startup/stop
> script
> > works properly if we issue shutdown/reboot command in guest OS
> directly.
> >
> > Hence here is my question:  when CloudStack tries to stop a running
> VM
> > instance, what is the exact command it sends to VM to stop it, with
> or
> > without forced flag?  What are the interactions between the
> CloudStack, the
> > hypervisor and the guest VM?
> >
> > Yiping
> >
>
>
>


Re: How exactly does CloudStack stop a VM?

2018-06-06 Thread Jean-Francois Nadeau
On KVM,  AFAIK the shutdown is the equivalent of pressing the power
button.  To get the Linux OS to catch this and initiate a clean shutdown,
you need the ACPID service running in the guest OS.
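A quick way to check this from inside the guest (assuming a systemd-based distro; package and service names may differ):

```shell
# Report whether acpid is active, i.e. whether the guest will honor the
# ACPI power-button event that a CloudStack Stop (virsh shutdown) sends.
if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet acpid; then
  STATUS="acpid active: guest should shut down cleanly on Stop"
else
  STATUS="acpid not active: install and enable it (e.g. yum install acpid && systemctl enable --now acpid)"
fi
echo "$STATUS"
```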

On Wed, Jun 6, 2018 at 6:01 PM, Yiping Zhang  wrote:

> Hi, all:
>
> We have a few VM instances which will hang when issued a Stop command from
> the CloudStack web UI or thru API calls, because the app’s own startup/stop
> script in the guest OS was not properly invoked.  The app’s startup/stop script
> works properly if we issue shutdown/reboot command in guest OS directly.
>
> Hence here is my question:  when CloudStack tries to stop a running VM
> instance, what is the exact command it sends to VM to stop it, with or
> without forced flag?  What are the interactions between the CloudStack, the
> hypervisor and the guest VM?
>
> Yiping
>


Re: New L2 network types are not shared

2018-01-25 Thread Jean-Francois Nadeau
Perfect. Thanks Boris.

Can the fix for shared L2 networks also be merged in 4.11 RC2 ?

On Thu, Jan 25, 2018 at 4:02 AM, Boris Stoyanov <
boris.stoya...@shapeblue.com> wrote:

> Thanks Jean-Francois,
> We’ve just confirmed CLOUDSTACK-10239 is a regression and will be included
> in 4.11 RC2. Thanks for bringing this up.
>
> Boris
>
>
> boris.stoya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
> > On 24 Jan 2018, at 9:57, Boris Stoyanov <boris.stoya...@shapeblue.com>
> wrote:
> >
> > Hi Jean-Francois, I don’t know if you’re following dev@ list, there’s
> an open thread for voting on release candidates. If you reply to that
> thread with your findings it’ll be more visible to devs and release
> management.
> >
> > Bobby.
> >
> >
> > boris.stoya...@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue
> >
> >
> >
> >> On 23 Jan 2018, at 16:23, Jean-Francois Nadeau <the.jfnad...@gmail.com>
> wrote:
> >>
> >> Thank you both Boris and Nux!
> >>
> >> I'll keep an eye on 4.11.1.   If I may get your attention on another issue
> >> that blocks me from further testing on 4.11, I filed CLOUDSTACK-10239
> :-)
> >>
> >> best regards,
> >>
> >> Jean-Francois
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Jan 23, 2018 at 8:55 AM, Boris Stoyanov <
> >> boris.stoya...@shapeblue.com> wrote:
> >>
> >>> Thanks for bringing this up Jean-Francois,
> >>> One of our devs has addressed this and with the following PR it’s now
> >>> fixed. It turned out a constraint was violated when deploying a VM in a
> >>> project. This will likely get merged in 4.11.1.0
> >>> https://github.com/apache/cloudstack/pull/2420
> >>>
> >>> Please keep us posted if you find any further issues.
> >>>
> >>> Regards,
> >>> Boris.
> >>>
> >>> On 19 Jan 2018, at 17:57, Nux! <n...@li.nux.ro<mailto:n...@li.nux.ro>>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Might be worth sending this to dev@ as well, especially now that we
> are
> >>> doing testing for 4.11.
> >>>
> >>> HTH
> >>>
> >>> --
> >>> Sent from the Delta quadrant using Borg technology!
> >>>
> >>> Nux!
> >>> www.nux.ro<http://www.nux.ro>
> >>>
> >>>
> >>> boris.stoya...@shapeblue.com
> >>> www.shapeblue.com
> >>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> >>> @shapeblue
> >>>
> >>>
> >>>
> >>> - Original Message -
> >>> From: "Jean-Francois Nadeau" <the.jfnad...@gmail.com>
> >>> To: "users" <users@cloudstack.apache.org>
> >>> Sent: Wednesday, 17 January, 2018 21:09:19
> >>> Subject: New L2 network types are not shared
> >>>
> >>> Hi all,
> >>>
> >>> I'm testing 4.11-rc1 and the new L2 network type feature as shown at
> >>> http://www.shapeblue.com/layer-2-networks-in-cloudstack/
> >>>
> >>> I want to use this as a replacement to a shared network offering with
> no
> >>> DHCP which works to support an external DHCP server but still required
> to
> >>> fill some CIDR information.
> >>>
> >>> I thought the intent was that L2 network type was to replace the
> previous
> >>> approach when required to integrate an existing network.
> >>>
> >>> If I attempt to provision a VM in a project as the root admin using
> the L2
> >>> network I get denied... apparently because they are not shared and I
> can't
> >>> make them public.
> >>>
> >>> ('HTTP 531 response from CloudStack', , {u'errorcode':
> 531,
> >>> u'uuidList': [], u'cserrorcode': 4365, u'errortext': u'Unable to use
> >>> network with id= 7712102b-bbdf-4c54-bdbf-9fddfa16de46, permission
> >>> denied'})
> >>>
> >>> Provisioning in the root project works just fine  but really I
> want to
> >>> use L2 networks in user projects even if only the admin can do so.
> >>>
> >>> Thoughts ?
> >>>
> >>>
> >
>
>


Re: New L2 network types are not shared

2018-01-23 Thread Jean-Francois Nadeau
Thank you both Boris and Nux!

I'll keep an eye on 4.11.1.   If I may get your attention on another issue
that blocks me from further testing on 4.11, I filed CLOUDSTACK-10239 :-)

best regards,

Jean-Francois






On Tue, Jan 23, 2018 at 8:55 AM, Boris Stoyanov <
boris.stoya...@shapeblue.com> wrote:

> Thanks for bringing this up Jean-Francois,
> One of our devs has addressed this and with the following PR it’s now
> fixed. It turned out a constraint was violated when deploying a VM in a
> project. This will likely get merged in 4.11.1.0
> https://github.com/apache/cloudstack/pull/2420
>
> Please keep us posted if you find any further issues.
>
> Regards,
> Boris.
>
> On 19 Jan 2018, at 17:57, Nux! <n...@li.nux.ro<mailto:n...@li.nux.ro>>
> wrote:
>
> Hi,
>
> Might be worth sending this to dev@ as well, especially now that we are
> doing testing for 4.11.
>
> HTH
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro<http://www.nux.ro>
>
>
> boris.stoya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
> - Original Message -
> From: "Jean-Francois Nadeau" <the.jfnad...@gmail.com>
> To: "users" <users@cloudstack.apache.org>
> Sent: Wednesday, 17 January, 2018 21:09:19
> Subject: New L2 network types are not shared
>
> Hi all,
>
> I'm testing 4.11-rc1 and the new L2 network type feature as shown at
> http://www.shapeblue.com/layer-2-networks-in-cloudstack/
>
> I want to use this as a replacement to a shared network offering with no
> DHCP which works to support an external DHCP server but still required to
> fill some CIDR information.
>
> I thought the intent was that L2 network type was to replace the previous
> approach when required to integrate an existing network.
>
> If I attempt to provision a VM in a project as the root admin using the L2
> network I get denied... apparently because they are not shared and I can't
> make them public.
>
> ('HTTP 531 response from CloudStack', , {u'errorcode': 531,
> u'uuidList': [], u'cserrorcode': 4365, u'errortext': u'Unable to use
> network with id= 7712102b-bbdf-4c54-bdbf-9fddfa16de46, permission
> denied'})
>
> Provisioning in the root project works just fine  but really I want to
> use L2 networks in user projects even if only the admin can do so.
>
> Thoughts ?
>
>


Re: [PROPOSE] EOL for supported OSes & Hypervisors

2018-01-18 Thread Jean-Francois Nadeau
+1 on the versions of ACS in the matrix.   I.e., it sounds like today most
production setups run 4.9 or earlier, and until 4.11 is GA and stabilizes,
4.9 is the only good option for a go-live today.  Knowing how
long 4.9 will be supported is key.

On Wed, Jan 17, 2018 at 9:50 AM, Ron Wheeler  wrote:

> It might also be helpful to know what version of ACS as well.
> Some indication of your plan/desire to upgrade ACS, hypervisor, or
> management server operating system might be helpful.
> There is a big difference between the situation where someone is running
> ACS 4.9x on CentOS 6 and wants to upgrade to ACS 4.12 while keeping CentOS
> 6 and another environment where the planned upgrade to ACS4.12 will be done
> at the same time as an upgrade to CentOS 7.x.
>
> Is it fair to say that any proposed changes in this area will occur in
> 4.12 at the earliest and will not likely occur before summer 2018?
>
>
> Ron
>
>
>
> On 17/01/2018 4:23 AM, Paul Angus wrote:
>
>> Thanks Eric,
>>
>> As you'll see from the intro email to this thread, the purpose here is to
>> ensure that we don't strand a 'non-trivial' number of users by dropping
>> support for any given hypervisor, or management server operating system.
>>
>> Hence the request to users to let the community know what they are using,
>> so that a fact-based community consensus can be reached.
>>
>>
>> Kind regards,
>>
>> Paul Angus
>>
>> paul.an...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
>> @shapeblue
>>
>>
>> -Original Message-
>> From: Eric Lee Green [mailto:eric.lee.gr...@gmail.com]
>> Sent: 16 January 2018 23:36
>> To: users@cloudstack.apache.org
>> Subject: Re: [PROPOSE] EOL for supported OSes & Hypervisors
>>
>> This is the type of discussion that I wanted to open - the argument
>>> that I see for earlier dropping of v6 is that - Between May 2018 and
>>> q2 2020 RHEL/CentOS 6.x will only receive security and mission
>>> critical updates, meanwhile packages on which we depend or may want to
>>> utilise in the future are being deprecated or not developed for v6.x
>>>
>> But this has always been the case for Centos 6.x. It is running antique
>> versions of everything, and has been doing so for quite some time. It is,
>> for example, running versions of Gnome and init that have been obsolete for
>> years. Same deal with the version of MySQL that it comes with.
>>
>> The reality is that Centos 6.x guest support, at the very least, needs to
>> be tested with each new version of Cloudstack until final EOL of Centos 6
>> in Q2 2020. New versions of Cloudstack with new features not supported by
>> Centos 6 (such as LVM support for KVM, which requires the LIO storage
>> stack) can require Centos 7 or later, but the last Cloudstack version that
>> supports Centos 6.x as its server host should continue to receive bug fixes
>> until Centos 6.x is EOL.
>>
>> Making someone's IT investment obsolete is a way to irrelevancy.
>> Cloudstack is already an also-ran in the cloud marketplace. Making
>> someone's IT investment obsolete before the official EOL time for their IT
>> investment is a good way to have a mass migration away from your technology.
>>
>> This doesn't particularly affect me since my Centos 6 virtualization
>> hosts are not running Cloudstack and are going to be re-imaged to Centos
>> 7 before being added to the Cloudstack cluster, but ignoring the IT
>> environment that people actually live in, as versus the one we wish
>> existed, is annoying regardless. A friend of mine once said of the state of
>> ERP software, "enterprise software is dog food if dog food was being
>> designed by cats." I.e., the people writing the software rarely have any
>> understanding of how it is actually used by real life enterprises in real
>> life environments. Don't be those people.
>>
>>
>> On 01/16/2018 09:58 AM, Paul Angus wrote:
>>
>>> Hi Eric,
>>>
>>> This is the type of discussion that I wanted to open - the argument
>>> that I see for earlier dropping of v6 is that - Between May 2018 and q2
>>> 2020 RHEL/CentOS 6.x will only receive security and mission critical
>>> updates, meanwhile packages on which we depend or may want to utilise in
>>> the future are been deprecated or not developed for v6.x Also the testing
>>> and development burden on the CloudStack community increases as we try to
>>> maintain backward compatibility while including new versions.
>>>
>>> Needing installation documentation for centos 7 is a great point, and
>>> something that we need to address regardless.
>>>
>>>
>>> Does anyone else have a view?  I'd really like to hear from a wide range
>>> of people.
>>>
>>> Kind regards,
>>>
>>> Paul Angus
>>>
>>> paul.an...@shapeblue.com
>>> www.shapeblue.com
>>> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue
>>>
>>>
>>> -Original Message-
>>> From: Eric Green [mailto:eric.lee.gr...@gmail.com]
>>> Sent: 12 January 2018 17:24
>>> To: 

New L2 network types are not shared

2018-01-17 Thread Jean-Francois Nadeau
Hi all,

I'm testing 4.11-rc1 and the new L2 network type feature as shown at
http://www.shapeblue.com/layer-2-networks-in-cloudstack/

I want to use this as a replacement for a shared network offering with no
DHCP, which works to support an external DHCP server but still requires
filling in some CIDR information.

I thought the intent was that L2 network type was to replace the previous
approach when required to integrate an existing network.

If I attempt to provision a VM in a project as the root admin using the L2
network I get denied... apparently because they are not shared and I can't
make them public.

('HTTP 531 response from CloudStack', , {u'errorcode': 531,
u'uuidList': [], u'cserrorcode': 4365, u'errortext': u'Unable to use
network with id= 7712102b-bbdf-4c54-bdbf-9fddfa16de46, permission denied'})

Provisioning in the root project works just fine  but really I want to
use L2 networks in user projects even if only the admin can do so.

Thoughts ?
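For anyone scripting this outside the UI, here is a rough sketch of a signed createNetwork call against an L2 offering; every ID, key, and offering name below is a placeholder, and the signing steps follow the documented CloudStack API scheme (params sorted by name, query string lowercased, HMAC-SHA1, base64):

```shell
API_KEY="EXAMPLE_API_KEY"
SECRET="EXAMPLE_SECRET"
# Params sorted by name; the UUIDs are placeholders for your zone/offering/project.
QUERY="apikey=${API_KEY}&command=createNetwork&displaytext=l2-ext&name=l2-ext&networkofferingid=l2-offering-uuid&projectid=project-uuid&zoneid=zone-uuid"
# Lowercase the sorted query string, HMAC-SHA1 it with the secret key, then
# base64 the raw digest (URL-encode SIG before appending it as &signature=).
SIG=$(printf '%s' "$QUERY" | tr 'A-Z' 'a-z' \
  | openssl dgst -sha1 -hmac "$SECRET" -binary | openssl base64)
echo "signature: $SIG"
# curl "http://mgmt:8080/client/api?${QUERY}&signature=<url-encoded SIG>"
```

Of course, until the project permission issue above is fixed, passing projectid with an L2 network still returns the 531 error.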


Re: Recover VM after KVM host down (and HA not working) ?

2017-12-27 Thread Jean-Francois Nadeau
Hmm, could this be the culprit?

WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-694feb6c)
(logid:160220c5) Agent investigation was requested on host
Host[-4-Routing], but host does not support investigation because it has no
NFS storage. Skipping investigation.

The primary storage is NFS.

On Sat, Dec 23, 2017 at 10:14 AM, Jean-Francois Nadeau <
the.jfnad...@gmail.com> wrote:

> Clearly the management server doesn't realize the instance on the failed
> host is not running...  but the host is in Alert state and powered down,
> and missing NFS heartbeats.
>
> 2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-694feb6c)
> (logid:160220c5) Transition:[Resource state = Enabled, Agent event =
> AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]
> 2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
> 2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
> host 4
>
> Next step ?
>
> On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
> the.jfnad...@gmail.com> wrote:
>
>> I'd really like to get at the bottom of this.It does sound like the
>> behavior mentioned in https://issues.apache.org/j
>> ira/browse/CLOUDSTACK-5582 but should be long fixed.
>>
>> One suspect log entry (it may be unrelated) I noticed is this recurring
>> exception in the manager logs:
>>
>> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
>> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>>
>> Which I guess is caused by the use of an external DHCP so manager fails
>> to determine a running VM IP.Which brings me to my next question
>> how is a VM marked for HA actually monitored ?
>>
>>
>> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com>
>> wrote:
>>
>>> If all else fails, change its state to the correct  state in the MySQL
>>> database and restart the management  service. Sadly that is the only way
>>> I
>>> could do it when my Cloudstack got confused and stuck an instance in an
>>> intermediate state where I couldn't do anything with it.
>>>
>>> On Dec 22, 2017 at 9:09 AM, >> the.jfnad...@gmail.com>>
>>> wrote:
>>>
>>> Good morning,
>>>
>>> New to ACS and doing a POC with 4.10 on Centos 7 and KVM.
>>>
>>> Im trying to recover VMs after an host failure (powered off from OOB).
>>>
>>> Primary storage is NFS and IPMI is configured for the KVM hosts.  Zone is
>>> advanced mode with vlan separation and created a shared network with no
>>> services since I wish to use an external DHCP.
>>>
>>> First,  say I don't have a compute offering with HA enabled and a KVM
>>> host
>>> goes down...  I can't put it in maintenance mode while down and disabling
>>> it have no effect on the state of the lost VMs.  VM stays in running
>>> state
>>> according to manager.   What should I do to force restart on remaining
>>> healthy hosts ?
>>>
>>> Then I enabled  IPMI on all KVM hosts and attempted the same experience
>>> with a compute offering with HA enabled.   Same result.  Manager do see
>>> the
>>> host as disconnected and powered off but take no action.   I certainly
>>> miss
>>> something here.  Please help !
>>>
>>> Regards,
>>>
>>> Jean-Francois
>>>
>>
>>
>


Re: Recover VM after KVM host down (and HA not working) ?

2017-12-23 Thread Jean-Francois Nadeau
Clearly the management server doesn't realize the instance on the failed
host is not running...  but the host is in Alert state and powered down,
and missing NFS heartbeats.

2017-12-23 14:57:52,427 DEBUG [c.c.h.Status]
(AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Transition:[Resource state
= Enabled, Agent event = AgentDisconnected, Host id = 4, name =
r62-i122-36-01.domain.com]
2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
host 4

Next step ?

On Sat, Dec 23, 2017 at 9:49 AM, Jean-Francois Nadeau <
the.jfnad...@gmail.com> wrote:

> I'd really like to get at the bottom of this.It does sound like the
> behavior mentioned in https://issues.apache.org/
> jira/browse/CLOUDSTACK-5582 but should be long fixed.
>
> One suspect log entry (it may be unrelated) I noticed is this recurring exception
> in the manager logs:
>
> ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
> (logid:16dd70ad) Caught the Exception in VmIpFetchTask
>
> Which I guess is caused by the use of an external DHCP so manager fails to
> determine a running VM IP.Which brings me to my next question how
> is a VM marked for HA actually monitored ?
>
>
> On Sat, Dec 23, 2017 at 3:38 AM, Eric Green <eric.lee.gr...@gmail.com>
> wrote:
>
>> If all else fails, change its state to the correct  state in the MySQL
>> database and restart the management  service. Sadly that is the only way I
>> could do it when my Cloudstack got confused and stuck an instance in an
>> intermediate state where I couldn't do anything with it.
>>
>> On Dec 22, 2017 at 9:09 AM, > >>
>> wrote:
>>
>> Good morning,
>>
>> New to ACS and doing a POC with 4.10 on Centos 7 and KVM.
>>
>> Im trying to recover VMs after an host failure (powered off from OOB).
>>
>> Primary storage is NFS and IPMI is configured for the KVM hosts.  Zone is
>> advanced mode with vlan separation and created a shared network with no
>> services since I wish to use an external DHCP.
>>
>> First,  say I don't have a compute offering with HA enabled and a KVM host
>> goes down...  I can't put it in maintenance mode while down and disabling
>> it have no effect on the state of the lost VMs.  VM stays in running state
>> according to manager.   What should I do to force restart on remaining
>> healthy hosts ?
>>
>> Then I enabled  IPMI on all KVM hosts and attempted the same experience
>> with a compute offering with HA enabled.   Same result.  Manager do see
>> the
>> host as disconnected and powered off but take no action.   I certainly
>> miss
>> something here.  Please help !
>>
>> Regards,
>>
>> Jean-Francois
>>
>
>


Re: Recover VM after KVM host down (and HA not working) ?

2017-12-23 Thread Jean-Francois Nadeau
I'd really like to get to the bottom of this.    It does sound like the
behavior mentioned in https://issues.apache.org/jira/browse/CLOUDSTACK-5582,
but that should be long fixed.

One suspect log entry (it may be unrelated) I noticed is this recurring exception
in the manager logs:

ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
(logid:16dd70ad) Caught the Exception in VmIpFetchTask

Which I guess is caused by the use of an external DHCP server, so the
manager fails to determine a running VM's IP. Which brings me to my next
question: how is a VM marked for HA actually monitored?
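[Editorial aside: one way to act on a stuck VM from outside the UI is the API, e.g. a forced stopVirtualMachine call. The sketch below implements CloudStack's documented request-signing scheme (HMAC-SHA1 over the lowercased, alphabetically sorted query string, base64-encoded). The management-server URL, API/secret keys, and VM UUID are placeholders, not values from this thread.]

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign(params, secret_key):
    """CloudStack API signature: HMAC-SHA1 of the lowercased,
    alphabetically sorted query string, base64-encoded."""
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='*')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Placeholder credentials and VM id -- substitute your own.
params = {
    "command": "stopVirtualMachine",
    "id": "11111111-2222-3333-4444-555555555555",
    "forced": "true",
    "response": "json",
    "apikey": "PLACEHOLDER-API-KEY",
}
signature = sign(params, "PLACEHOLDER-SECRET")

# Assemble the full request URL (placeholder management-server host).
url = ("http://mgmt-server:8080/client/api?"
       + "&".join(f"{k}={urllib.parse.quote(str(v), safe='*')}"
                  for k, v in sorted(params.items()))
       + "&signature=" + urllib.parse.quote(signature, safe=""))
```

The resulting URL can then be fetched with curl or requests; the same signing function works for any API command, including listHosts to inspect what the manager thinks of each host.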


On Sat, Dec 23, 2017 at 3:38 AM, Eric Green 
wrote:

> If all else fails, change its state to the correct state in the MySQL
> database and restart the management service. Sadly, that is the only way I
> could do it when my CloudStack got confused and stuck an instance in an
> intermediate state where I couldn't do anything with it.
>
> On Dec 22, 2017 at 9:09 AM, Jean-Francois Nadeau wrote:
>
> Good morning,
>
> New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.
>
> I'm trying to recover VMs after a host failure (powered off from OOB).
>
> Primary storage is NFS and IPMI is configured for the KVM hosts. The zone
> is in advanced mode with VLAN separation, and I created a shared network
> with no services since I wish to use an external DHCP server.
>
> First, say I don't have a compute offering with HA enabled and a KVM host
> goes down... I can't put it in maintenance mode while it is down, and
> disabling it has no effect on the state of the lost VMs. The VMs stay in
> the Running state according to the manager. What should I do to force a
> restart on the remaining healthy hosts?
>
> Then I enabled IPMI on all KVM hosts and attempted the same experiment
> with a compute offering with HA enabled. Same result. The manager does see
> the host as disconnected and powered off but takes no action. I'm certainly
> missing something here. Please help!
>
> Regards,
>
> Jean-Francois
>


Recover VM after KVM host down (and HA not working) ?

2017-12-22 Thread Jean-Francois Nadeau
Good morning,

New to ACS and doing a POC with 4.10 on CentOS 7 and KVM.

I'm trying to recover VMs after a host failure (powered off from OOB).

Primary storage is NFS and IPMI is configured for the KVM hosts. The zone
is in advanced mode with VLAN separation, and I created a shared network
with no services since I wish to use an external DHCP server.

First, say I don't have a compute offering with HA enabled and a KVM host
goes down... I can't put it in maintenance mode while it is down, and
disabling it has no effect on the state of the lost VMs. The VMs stay in
the Running state according to the manager. What should I do to force a
restart on the remaining healthy hosts?

Then I enabled IPMI on all KVM hosts and attempted the same experiment
with a compute offering with HA enabled. Same result. The manager does see
the host as disconnected and powered off but takes no action. I'm certainly
missing something here. Please help!
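[Editorial aside: when reproducing this, it helps to confirm what the manager believes about host state before blaming HA. A minimal sketch of filtering a listHosts response for hosts that are not Up; the sample payload below is fabricated, but the `listhostsresponse` envelope and the `state` field follow the standard CloudStack JSON response format.]

```python
import json

# Fabricated listHosts JSON response, trimmed to the fields used below.
sample = '''{
  "listhostsresponse": {
    "count": 2,
    "host": [
      {"name": "kvm-01", "state": "Up", "resourcestate": "Enabled"},
      {"name": "kvm-02", "state": "Disconnected", "resourcestate": "Enabled"}
    ]
  }
}'''

def unhealthy_hosts(payload):
    # Return names of hosts the manager does not consider Up.
    hosts = json.loads(payload)["listhostsresponse"].get("host", [])
    return [h["name"] for h in hosts if h["state"] != "Up"]

print(unhealthy_hosts(sample))  # ['kvm-02']
```

A host stuck in Disconnected (rather than Down) is one reason HA investigators may take no action, so this is worth checking before any manual recovery.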

Regards,

Jean-Francois
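[Editorial note: the last-resort MySQL workaround Eric Green suggests upthread, sketched as SQL. The column names are from memory of the 4.x `cloud` schema; verify the `vm_instance` table in your own database, take a backup, and stop the management service before running anything like this, then start it again afterwards.]

```sql
-- Force a stuck instance back to Stopped (example instance name;
-- substitute your own). Run against the "cloud" database.
UPDATE vm_instance
   SET state = 'Stopped'
 WHERE instance_name = 'i-2-34-VM'
   AND removed IS NULL;
```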