Re: Known trillian test failures

2017-12-20 Thread Ron Wheeler
While cleaning up the tests, is there any value in splitting out tests 
that are redundant:
- tests that exercise low-level functions whose failures will be picked up 
in other tests of higher-level functions

- tests that are run on modules that "never" change.

The lower-level tests may still be useful for testing a change to a 
low-level function or for tracking down a failure in a higher-level function 
that uses a low-level routine, but they may not add much value to a test 
suite that is run frequently.


Would this reduce the amount of time taken to do a full test at the 
expense of some increased risk that an edge case might be missed?
Would setting aside the clutter allow the team to focus on the tests 
that really matter?


Ron


--
Ron Wheeler
President
Artifact Software Inc
email: rwhee...@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102



RE: Known trillian test failures

2017-12-20 Thread Paul Angus
Hi Marc-Aurèle, (and everyone else)

The title is probably slightly incorrect.  It should really say known Marvin 
test failures.  Trillian is the automation that creates the environments to run 
the tests in; the tests themselves are purely those in the Marvin codebase, so 
anyone can repeat them.  In fact, we would like to see other people running the 
tests in their environments and comparing the results.

With regard to the failing tests, I agree that it would be dangerous to hide 
failures.
I would, however, like to see a matrix of known good and known bad tests; any 
PR that then fails a known good test has a problem.
With a visible list of known bad tests we can 'not fail' a PR for failing a 
bad test, and the community would also have a list of bad tests to attack and 
whittle down until all tests *should* pass.

That way we can make clear (automated) decisions on pass/fail, rather than get 
a list of passes/fails that we then have to interpret.
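
The known good / known bad matrix described above could be sketched roughly as follows in Python (the language the Marvin tests are written in). This is only an illustration under stated assumptions: `judge_pr`, `Verdict` and the test names in the usage note are made up for the example and are not part of Marvin or Trillian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    passed: bool          # True when no known good test failed
    regressions: list     # failed tests that are NOT on the known bad list
    known_failures: list  # failed tests that are on the known bad list
    fixed: list           # known bad tests that passed this time


def judge_pr(results, known_bad):
    """Classify one test run against a list of known bad tests.

    results   -- mapping of test name -> True (passed) / False (failed)
    known_bad -- set of test names that are currently expected to fail
    """
    regressions = sorted(t for t, ok in results.items()
                         if not ok and t not in known_bad)
    known_failures = sorted(t for t, ok in results.items()
                            if not ok and t in known_bad)
    fixed = sorted(t for t, ok in results.items()
                   if ok and t in known_bad)
    # The PR fails only if a test outside the known bad list failed.
    return Verdict(passed=not regressions,
                   regressions=regressions,
                   known_failures=known_failures,
                   fixed=fixed)
```

For example, `judge_pr({"test_a": True, "test_b": False}, known_bad={"test_b"})` would yield a passing verdict with `test_b` listed under `known_failures`. The `fixed` list is there so that tests can be taken off the known bad list once they start passing.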



Kind regards,

Paul Angus

paul.an...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS UK
@shapeblue
  
 


-----Original Message-----
From: Marc-Aurèle Brothier [mailto:ma...@exoscale.ch] 
Sent: 20 December 2017 12:56
To: dev@cloudstack.apache.org
Subject: Known trillian test failures

@rhtyd

Could something be done to avoid confusing people who push PRs and get Trillian 
test failures that are apparently known to fail all the time or often? I know 
it's hard to keep the tests in good shape and make them run smoothly, but I find 
it very disturbing, and therefore I have to admit I'm not paying attention to 
those outputs, sadly.

Skipping them carries a high risk that they never get fixed... I would hope that 
someone with full access to the management & agent logs could fix them, 
since AFAIK they aren't available.

Cheers


Re: Call for participation: Issue triaging and PR review/testing

2017-12-20 Thread Ron Wheeler
If a "freeze" means no new functionality can be added but testing and 
bug fixes continue until a decision is made that 4.11.0 is ready for 
release, that makes an earlier freeze more desirable than a later 
one, if the goal is to get an LTS replacement for 4.10 as soon as 
possible.


Any bugs found in 4.10 should be fixed in 4.11, if they apply, during 
the time between the freeze and the release.


The freeze allows people to test a product that is more stable than a 
product that is getting regular inputs of new functionality.
It also gets the updating of the documentation and testing of the 
installation procedures started.


The real questions are:
Are there any important functional improvements that are going to be 
deferred to 4.12 by a Jan 8 freeze?
Is there any danger that forcing an early freeze will cause some new 
functionality to be squeezed in that has not had the proper review and 
testing to the point that the release will be delayed or shipped with 
problematic code?


Ron


Re: Known trillian test failures

2017-12-20 Thread Rohit Yadav
Hi Marc,


You've raised a very valid concern. When we have a known list of smoke test 
failures, it's understandable that most people may not know how to 
interpret them and so ignore them. Access to the Trillian environment is another 
issue. I don't have all the answers or a solution to these problems at the 
moment, but let me discuss that internally and get back to you.


Can you and others help review PR 2211, where I've tried to address that? I ask 
because this PR not only tries to migrate us to a newer Debian 
systemvmtemplate but also focuses on stabilizing master by getting an almost 
100% smoke test pass rate on VMware/KVM/XenServer; to get there I had to fix 
some tests as well. Once we have such a pass rate on master, it will be easier 
to verify test results on other PRs against the baseline.


I'll see if we can improve Trillian test runs to include management server 
(and agent) logs in the Marvin log zip that is posted as part of the result on 
the GitHub PR.


- Rohit


From: Marc-Aurèle Brothier 
Sent: Wednesday, December 20, 2017 6:26:05 PM
To: dev@cloudstack.apache.org
Subject: Known trillian test failures

@rhtyd

Could something be done to avoid confusing people who push PRs and get
Trillian test failures that are apparently known to fail all the time or
often? I know it's hard to keep the tests in good shape and make them run
smoothly, but I find it very disturbing, and therefore I have to admit I'm
not paying attention to those outputs, sadly.

Skipping them carries a high risk that they never get fixed... I would hope
that someone with full access to the management & agent logs could fix
them, since AFAIK they aren't available.

Cheers

rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS UK
@shapeblue
  
 



Re: PrivateGateaway ACL rules blocker

2017-12-20 Thread Voloshanenko Igor
Tests passed.
So I will wait until you guys have time to review this one-liner and
merge it :)

2017-12-20 14:16 GMT+02:00 Rohit Yadav :

> Sure, I've kicked some tests. Will merge when tests pass and we've some
> review/feedback from others.
>
>
> -Rohit
>
> 
> From: Voloshanenko Igor 
> Sent: Wednesday, December 20, 2017 5:33:10 PM
> To: dev@cloudstack.apache.org
> Subject: Re: PrivateGateaway ACL rules blocker
>
> Thanks a lot, Rohit!
>
> As we only handle a special case, I think all tests will pass without any
> issues. I can't imagine that anybody wrote a test that checks whether the
> buggy condition passes :D
>
> 2017-12-20 13:02 GMT+02:00 Rohit Yadav :
>
> > Hi Voloshanenko,
> >
> >
> > Thanks for reporting and sharing, I'll kick some tests.
> >
> >
> > - Rohit
> >
> > 
> > From: Voloshanenko Igor 
> > Sent: Wednesday, December 20, 2017 6:29:46 AM
> > To: dev@cloudstack.apache.org
> > Subject: PrivateGateaway ACL rules blocker
> >
> > Hi all!
> >
> > Guys, can I kindly ask you to review this issue and PR -
> > https://issues.apache.org/jira/browse/CLOUDSTACK-10200
> >
> > We have a lot of clients with PG and want to include this in 4.11 LTS, as
> > without this small one-liner patch the PrivateGateway component is useless.
> >
> > Thanks in advance!
> >
>


Known trillian test failures

2017-12-20 Thread Marc-Aurèle Brothier
@rhtyd

Could something be done to avoid confusing people who push PRs and get
Trillian test failures that are apparently known to fail all the time or
often? I know it's hard to keep the tests in good shape and make them run
smoothly, but I find it very disturbing, and therefore I have to admit I'm
not paying attention to those outputs, sadly.

Skipping them carries a high risk that they never get fixed... I would hope
that someone with full access to the management & agent logs could fix
them, since AFAIK they aren't available.

Cheers


Re: Adding Spellchecker to code style validator

2017-12-20 Thread Daan Hoogland
I like the idea, Ivan. I hope it won't be enforced, though, and will just be a
help. Coders are notorious for using spelling to distinguish between instances.

On Wed, Dec 20, 2017 at 1:36 PM, Rafael Weingärtner <
rafaelweingart...@gmail.com> wrote:

> +1
>
> On Wed, Dec 20, 2017 at 10:29 AM, Rohit Yadav 
> wrote:
>
> > Hi Ivan,
> >
> >
> > Thanks for the PR, I think that would be a good idea. We can introduce
> > such a checker/task in Travis's first job that currently does some sanity
> > checks (rat+build+unit tests etc).
> >
> >
> > - Rohit
> >
> > 
> > From: Ivan Kudryavtsev 
> > Sent: Tuesday, December 19, 2017 12:48:04 PM
> > To: dev
> > Subject: Adding Spellchecker to code style validator
> >
> > Hello, devs.
> >
> > How about adding spell checking to the code style guide? ACS uses a lot
> > of Java introspection, including JSON generation, so typos migrate to
> > the protocol level. Working on CLOUDSTACK-10168, I found ipv4_adress
> > inside Python code / DHCP-related JSON; trying to improve "the camp", I
> > moved to the Java code and found the private variable ipv4Adress, which
> > is used in the gson serializer, resulting in a protocol with bad
> > keywords. Maybe a spellchecker would be able to fix it.
> >
> > The same goes for logging messages: I usually search for "address", not
> > "adress", so it's really impossible to find the relevant message if
> > typos exist.
> >
> > I'm not enough of a Java guru to add a spellchecker myself, and it's a
> > project policy matter, so might it be something worth adopting?
> >
> >
> > --
> > With best regards, Ivan Kudryavtsev
> > Bitworks Software, Ltd.
> > Cell: +7-923-414-1515
> > WWW: http://bitworks.software/ 
> >
>
>
> --
> Rafael Weingärtner
>



-- 
Daan


Re: Call for participation: Issue triaging and PR review/testing

2017-12-20 Thread Rohit Yadav
Hi Ivan,


I took some time to reflect and get back to you:


I agree the freeze date may be a bit aggressive; based on past experience, 
we've all seen that the final release may take some time (weeks even). After 
the freeze, we can all work towards stability and fix bugs for a stable and 
proper release.


A non-LTS branch such as 4.10 may not receive much usage and attention, and 
therefore sees less reporting of bugs and fewer fixes. For production users 
who are keener on stability than on (new) features, an LTS release is 
recommended, the current one being the 4.9.x releases. Starting with 4.11, 
it's something we'll fix on the website, i.e. recommend that most users stay 
with LTS releases, give information about support dates for such releases, 
etc. Like any software release, there will be bugs, and that's not an issue as 
long as they get reported and fixed.


I'm glad to see your participation and PRs, and will work with you and others 
for a stable release. Keep sending PRs, review others' PRs and help with 
testing and reporting of bugs!


- Rohit


From: Ivan Kudryavtsev 
Sent: Wednesday, December 13, 2017 9:34:43 AM
To: dev@cloudstack.apache.org
Cc: Rohit Yadav; us...@cloudstack.apache.org
Subject: Re: Call for participation: Issue triaging and PR review/testing

Hello, devs, users, Rohit. Have a good day.

Rohit, you intend to freeze 4.11 on 8 January and, frankly speaking, I see
risks here. A major risk is that 4.10 is too buggy, and it seems nobody
actually uses it in production right now because it's unusable, unfortunately,
so we are planning to freeze 4.11, which stands on an untested 4.10 with a lot
of defects still undiscovered and unreported. I believe it's very dangerous
to put out one more release of bad quality. In fact, Marvin and the unit
tests don't cover the regressions I encounter in 4.10. OK, let's take a look
at a new one our engineers found today in 4.10:

It happens rarely for Domain Admin; other roles are probably affected too. We
hit it about once a week. It concerns resource accounting. The Domain Admin
resource quota is 200 GB of primary storage. Continuous creation and removal
of VMs leads to "Maximum number of resources of type 'primary_storage' for
domain id=2 has been exceeded", although only 26 GB is actually used.

I mean, smoke tests run well and unit tests run well, but *nobody reported a
very obvious bug* which should have been hit a number of times if the
community used 4.10. I suppose there are a lot of undiscovered and unreported
regressions in 4.10, and this is a huge risk to the 4.11 release quality; the
further we move that way, the more quality decreases. Unfortunately, I'm not
able to propose a silver bullet, but I suppose feature development should be
slowed down until master is tested very thoroughly, and 8 January may be too
early for the 4.11 freeze.



2017-12-09 2:00 GMT+07:00 Wido den Hollander :

>
>
> On 12/08/2017 03:24 PM, Rohit Yadav wrote:
>
>> All,
>>
>> Given we've about a month left until 4.11/LTS freeze date of 8th Jan 2018
>> [1], I would like to call our community for active participation.
>>
>> Given a huge pile of open issues, please share on this thread a 'brief'
>> list of top bugs/issues that you would want to see fixed and are
>> applicable to master branch, especially critical/blocker issues and
>> regressions. I would especially like to engage with the users of the
>> community towards this effort. I'll start weekly reporting of open issues
>> by end of next week.
>>
>> Developers - feel free to comment tagging Daan (@DaanHoogland), Paul
>> (@PaulAngus) and me (@rhtyd) in your pull requests if you're seeking
>> review and testing (and merging) of your PR. I hope to work with you all
>> with my committer/contributor hat on.
>>
>> Finally, I look forward to all of your help and support towards the next
>> (LTS) release.
>>
>>
> Let's make it work!
>
> Thoughts, comments?
>>
>>
> With the holiday season coming up we might see things slow down a bit. But
> I think we already have enough PRs open for 4.11
>
> We should be able to get out a proper release again.
>
> Wido
>
>
> [1] http://markmail.org/message/mszlluye35acvn2j
>>
>> Regards,
>> Rohit Yadav
>> http://rohityadav.cloud | @rhtyd
>>
>>__?.o/   Apache CloudStack
>>   ()# The best of CloudStack is yet to come!
>> (___(_)   https://cloudstack.apache.org
>>
>>


--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ 




Re: Adding Spellchecker to code style validator

2017-12-20 Thread Rafael Weingärtner
+1

On Wed, Dec 20, 2017 at 10:29 AM, Rohit Yadav 
wrote:

> Hi Ivan,
>
>
> Thanks for the PR, I think that would be a good idea. We can introduce
> such a checker/task in Travis's first job that currently does some sanity
> checks (rat+build+unit tests etc).
>
>
> - Rohit
>
> 
> From: Ivan Kudryavtsev 
> Sent: Tuesday, December 19, 2017 12:48:04 PM
> To: dev
> Subject: Adding Spellchecker to code style validator
>
> Hello, devs.
>
> How about adding spell checking to the code style guide? ACS uses a lot of
> Java introspection, including JSON generation, so typos migrate to the
> protocol level. Working on CLOUDSTACK-10168, I found ipv4_adress inside
> Python code / DHCP-related JSON; trying to improve "the camp", I moved to
> the Java code and found the private variable ipv4Adress, which is used in
> the gson serializer, resulting in a protocol with bad keywords. Maybe a
> spellchecker would be able to fix it.
>
> The same goes for logging messages: I usually search for "address", not
> "adress", so it's really impossible to find the relevant message if typos
> exist.
>
> I'm not enough of a Java guru to add a spellchecker myself, and it's a
> project policy matter, so might it be something worth adopting?
>
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/ 
>


-- 
Rafael Weingärtner


Re: 4.11 : Physical networking migration ; Config Drive support

2017-12-20 Thread Kris Sterckx
+1, we are rebasing our outstanding PRs on a daily basis.

Totally agree with reviewing other PRs as well; we are doing this already
and will increase those efforts in 2018.

Thanks

kris


On 20 December 2017 at 13:18, Rohit Yadav  wrote:

> Kris,
>
>
> With on-going PR review/merging, it is requested to engage in those PRs
> and fix merge conflicts that may happen again as other PRs merge.
>
>
> I would also like to encourage you and other Nuage folks to review other
> PRs and engage on outstanding review comments and questions.
>
>
> - Rohit
>
> 
> From: Kris Sterckx 
> Sent: Wednesday, December 20, 2017 2:45:14 AM
> To: dev@cloudstack.apache.org
> Subject: Re: 4.11 : Physical networking migration ; Config Drive support
>
> Also, both PRs have been (re)rebased today :
>
> Network migration : https://github.com/apache/cloudstack/pull/2259
> Config Drive : https://github.com/apache/cloudstack/pull/2097
>
> Hope to make these happen.
>
> Thanks,
>
> Kris
>
>
>
> On 19 December 2017 at 15:44, Kris Sterckx wrote:
>
> > Hi Wido.
> >
> > Sorry for the late reply.  Thanks, that sounds great.
> >
> > Yes, we can move from SS to PS if we accept that once 4.12 is deployed
> > with updated code, in order to migrate a particular VM's iso, the VM
> > needs to be stopped & restarted. Then the iso will be recreated.
> >
> > My suggestion would be to merge in Config drive support and declare it
> > as experimental in 4.11; then people (ops and devs) can use it
> > experimentally and give feedback, and we can further harden it in 4.12
> > and declare it as GA.
> > I think any other way won't work.
> >
> > Let me know your view.
> >
> > Thanks, Kris
> >
> >
>
> > On 14 December 2017 at 09:59, Wido den Hollander  wrote:
> >
> >>
> >>
> >> On 12/07/2017 03:17 PM, Kris Sterckx wrote:
> >>
> >>> Hi all,
> >>>
> >>>
> >>> I would like to raise attention to 2 features which are pending for
> >>> 4.11, which received great appreciation from some of you and in which,
> >>> from a Nuage perspective, we also see great interest from large
> >>> operators. They are:
> >>>
> >>> - Physical networking migration, or: Migrate networks to a new
> >>> physical network
> >>> Jira : https://issues.apache.org/jira/browse/CLOUDSTACK-10024
> >>> Design doc :
> >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Physical+Networking+Migration
> >>> PR : https://github.com/apache/cloudstack/pull/2259
> >>> - Config Drive support :
> >>>
> >>> Jira : https://issues.apache.org/jira/browse/CLOUDSTACK-9813
> >>> Design doc : https://issues.apache.org/jira/browse/CLOUDSTACK-9813
> >>> PR : https://github.com/apache/cloudstack/pull/2097
> >>>
> >>>
> >>> I would like to emphasize first that both features are generic
> >>> solutions, not tied to any SDN solution whatsoever.
> >>>
> >>>
> >>> Regarding network migration support: to anyone interested in this
> >>> capability, please review the design doc and/or the PR. The more
> >>> buy-in we get for the large amount of work we spent on this, the more
> >>> recognition you give to the guys who did the work. We hope to get this
> >>> PR merged before the end of the year.
> >>>
> >>>
> >>> Regarding config drive support: as already discussed at the Miami
> >>> Collaboration Conference in May, this PR gives you fully standard
> >>> cloud-init support using the OpenStack configdrive datasource
> >>> provider, and it works like a charm. We received very positive
> >>> feedback on this. There has been discussion about the storage
> >>> strategy, though, and people voted for storing the ISOs on primary
> >>> storage rather than on secondary storage. There has been further work
> >>> on this by Frank Maximus and some review of it, but I am not sure we
> >>> still want to include that change in 4.11. Frankly, my proposal would
> >>> be to take an incremental approach: start larger validation cycles for
> >>> the outstanding PR and approach its further evolution in the next
> >>> release cycle. Open to your feedback.  In any case, I would find it a
> >>> missed opportunity if config drive eventually did not find its way
> >>> into 4.11 at all.  Again, we see great interest in this.  Potentially
> >>> we could declare it as experimental support, to be enabled via an
> >>> advanced setting - just an idea. That way more people can get a sense
> >>> of how it works and provide feedback. Another item to note is that, as
> >>> the Nuage team does not have XenServer deployments, we have no
> >>> facility to verify the PR with XenServer. Help with that would also be
> >>> appreciated.
> >>>
> >>>
> >> We could do it in two steps indeed. Move to PS in a later stage (if
> >> that's doable?).
> >>
> >> I would love to take a 

Re: Bug in ViewResponseHelper.java of 4627fb2

2017-12-20 Thread Rohit Yadav
Hi Mike,


Yes, please send a PR!


- Rohit


From: Tutkowski, Mike 
Sent: Tuesday, December 19, 2017 2:08:42 AM
To: dev@cloudstack.apache.org
Subject: Bug in ViewResponseHelper.java of 4627fb2

Hi,

I noticed an issue today with a fairly recent commit: 4627fb2.

In ViewResponseHelper.java, a NullPointerException can be thrown when 
interacting with a data disk on VMware because the disk chain value in 
cloud.volumes can have a value of NULL.

I can put in a check for NULL and avoid the NullPointerException, but perhaps 
someone knows the history of why this particular field is used in this case and 
can fill me in.

Thanks!
Mike




Re: Adding Spellchecker to code style validator

2017-12-20 Thread Rohit Yadav
Hi Ivan,


Thanks for the PR, I think that would be a good idea. We can introduce such a 
checker/task in Travis's first job that currently does some sanity checks 
(rat+build+unit tests etc).


- Rohit


From: Ivan Kudryavtsev 
Sent: Tuesday, December 19, 2017 12:48:04 PM
To: dev
Subject: Adding Spellchecker to code style validator

Hello, devs.

How about adding spell checking to the code style guide? ACS uses a lot of
Java introspection, including JSON generation, so typos migrate to the
protocol level. Working on CLOUDSTACK-10168, I found ipv4_adress inside
Python code / DHCP-related JSON; trying to improve "the camp", I moved to the
Java code and found the private variable ipv4Adress, which is used in the
gson serializer, resulting in a protocol with bad keywords. Maybe a
spellchecker would be able to fix it.

The same goes for logging messages: I usually search for "address", not
"adress", so it's really impossible to find the relevant message if typos
exist.

I'm not enough of a Java guru to add a spellchecker myself, and it's a
project policy matter, so might it be something worth adopting?
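
To make the idea concrete, here is a rough sketch of what such a check might look like, written in Python purely for illustration. It is an assumption-laden toy, not a proposal for the actual tool: `KNOWN_TYPOS`, `split_identifier` and `find_typos` are hypothetical names, and a real check would more likely rely on an existing spell checker with a proper dictionary. It splits snake_case and camelCase identifiers into words and reports those found in a small list of known misspellings.

```python
import re

# Illustrative list only; a real checker would use a full dictionary.
KNOWN_TYPOS = {"adress": "address", "recieve": "receive"}

# Matches identifier-like tokens in a line of source code.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits an identifier into words: acronyms, capitalised/lowercase words, digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split snake_case or camelCase into lowercase words."""
    return [w.lower() for w in _WORD.findall(name)]


def find_typos(source):
    """Return (line number, identifier, suggested word) for each known typo."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ident in _IDENT.findall(line):
            for word in split_identifier(ident):
                if word in KNOWN_TYPOS:
                    hits.append((lineno, ident, KNOWN_TYPOS[word]))
    return hits
```

Run in report-only mode, a check like this would flag both `ipv4Adress` and `ipv4_adress` without failing the build.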


--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ 




Re: [DISCUSS] Management server (pre-)shutdown to avoid killing jobs

2017-12-20 Thread Rohit Yadav
Hi Marc,


I like the idea. I guess a locking service was needed in CloudStack to not only 
solve the issue of locking and get rid of the DB-based lock (which, I suppose, 
if we can get rid of it, may help people migrate to MySQL clusters with an 
active-active setup, which cannot be used today due to LOCK usage), but also 
fix the issue of claim and ownership (i.e. which management server owns which 
resource such as hosts, VMs, volumes etc.).


To retain CloudStack as a turnkey/standalone solution, embedded ZooKeeper may 
be used for this purpose, and the new CA framework, if applicable, could be 
used to secure a cluster of mgmt servers running the ZK plugin/services. This 
will also require refactoring of the job manager/service layer to be 
locking-service aware. I guess a general and pluggable locking service manager 
could be implemented for this purpose that supports plugins, with a default 
plugin that is (embedded) ZK based.


With the agent-management server model, CloudStack agents such as the KVM, 
SSVM and CPVM agents currently only have a single mgmt server 'host' IP that 
they connect to. With the introduction of the CA framework I had tried to 
change this to a list of hosts/IPs that an agent tries to connect to on 
disconnection (say, a mgmt server shutdown) and, as mentioned, there is PR 2309 
that further improves this and introduces a way of balancing. To solve the 
issue of balancing (claim+ownership) of the agents across the cluster of 
management servers, we may need a locking service/manager, such as one based 
on ZK, that can help. It can trigger events such as rebalancing of tasks. We 
may also explore using gossip and other ways of discovery, propagation and 
rebalancing of agents with the new locking service/manager. I'm excited to see 
your attempt at solving the problem.


- Rohit


From: Marc-Aurèle Brothier 
Sent: Monday, December 18, 2017 7:26:21 PM
To: dev@cloudstack.apache.org
Subject: [DISCUSS] Management server (pre-)shutdown to avoid killing jobs

Hi everyone,

Another point, another thread. Currently when shutting down a management
server, besides the "stop()" methods not being called as far as I know,
the server could be in the middle of processing an async job task. This will
lead to a failed job, since the response won't be delivered to the correct
management server even though the job might have succeeded on the agent. To
overcome this limitation, which affects our weekly production upgrades, we
added a pre-shutdown mechanism which works alongside HAProxy. The management
server keeps an eye on a file "lb-agent" in which some keywords can be
written following the HAProxy guide (
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.2-agent-check).
When it finds "maint", "stopped" or "drain", it stops these threads:
 - AsyncJobManager._heartbeatScheduler: responsible for fetching and starting
execution of AsyncJobs
 - AlertManagerImpl._timer: responsible for sending capacity check commands
 - StatsCollector._executor: responsible for scheduling stats commands

Then the management server stops most of its scheduled tasks. The correct
thing to do before shutting down the server would be to send
"rebalance/reconnect" commands to all agents connected to that management
server to ensure that commands won't go through this server at all.

Here, HAProxy is responsible for no longer sending API requests to the
corresponding server, with the help of this local agent check.

In case you want to cancel the maintenance shutdown, you could write
"up/ready" in the file and the different schedulers will be restarted.
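
As a rough illustration of the mechanism described above (not the actual
patch), here is a minimal Python sketch. The keywords come from this mail;
the names `desired_state`, `PausableScheduler` and `poll_once`, and the choice
to let a stop keyword win over a start keyword, are assumptions made for the
example only.

```python
import re
from pathlib import Path

STOP_KEYWORDS = {"maint", "stopped", "drain"}
START_KEYWORDS = {"up", "ready"}


def desired_state(agent_file_text: str, current: str) -> str:
    """Map the content of the "lb-agent" file to "paused" or "running".

    Unknown or empty content leaves the current state unchanged.
    Assumption: if both kinds of keyword are present, stopping wins.
    """
    words = set(re.findall(r"[a-z]+", agent_file_text.lower()))
    if words & STOP_KEYWORDS:
        return "paused"
    if words & START_KEYWORDS:
        return "running"
    return current


class PausableScheduler:
    """Stand-in for a scheduler such as the async job heartbeat (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def pause(self) -> None:
        self.running = False

    def resume(self) -> None:
        self.running = True


def poll_once(agent_file: Path, schedulers, current: str) -> str:
    """Read the file once and pause/resume the schedulers on a state change."""
    text = agent_file.read_text() if agent_file.exists() else ""
    new_state = desired_state(text, current)
    if new_state != current:
        for scheduler in schedulers:
            if new_state == "paused":
                scheduler.pause()
            else:
                scheduler.resume()
    return new_state
```

In a real server `poll_once` would be called periodically by a small
background task; how often, and where the file lives, is left open here.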

This is really more of an operational change around CS for people doing live
upgrades on a regular basis, so I'm unsure if the community would want such
a change in the code base. It goes a bit in the opposite direction of the
change for removing the need for HAProxy:
https://github.com/apache/cloudstack/pull/2309

If there is enough positive feedback for such a change, I will port the
changes to match the upstream branch in a PR.

Kind regards,
Marc-Aurèle

rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



Re: 4.11 : Physical networking migration ; Config Drive support

2017-12-20 Thread Rohit Yadav
Kris,


With PR review/merging ongoing, please stay engaged in those PRs and fix any 
merge conflicts that may come up again as other PRs merge.


I would also like to encourage you and other Nuage folks to review other PRs 
and engage on outstanding review comments and questions.


- Rohit


From: Kris Sterckx 
Sent: Wednesday, December 20, 2017 2:45:14 AM
To: dev@cloudstack.apache.org
Subject: Re: 4.11 : Physical networking migration ; Config Drive support

Also, both PRs have been (re)rebased today :

Network migration : https://github.com/apache/cloudstack/pull/2259
Config Drive : https://github.com/apache/cloudstack/pull/2097

Hope to make these happen.

Thanks,

Kris



On 19 December 2017 at 15:44, Kris Sterckx 
wrote:

> Hi Wido.
>
> Sorry for the late reply.  Thanks, that sounds great.
>
> Yes, we can move from SS to PS if we accept that once 4.12 is deployed with
> updated code, in order to migrate a particular VM's iso, the VM needs to be
> stopped & restarted. Then the iso will be recreated.
>
> My suggestion would be to merge in Config Drive support and declare it as
> experimental in 4.11; then people (ops and devs) can use it
> experimentally and give feedback, and we can further harden it in 4.12 and
> declare it as GA.
> I think any other way won't work.
>
> Let me know your view.
>
> Thanks, Kris
>
>

> On 14 December 2017 at 09:59, Wido den Hollander  wrote:
>
>>
>>
>> On 12/07/2017 03:17 PM, Kris Sterckx wrote:
>>
>>> Hi all,
>>>
>>>
>>> I would like to draw attention to 2 features which are pending for 4.11,
>>> which received great appreciation from some of you and in which, from a
>>> Nuage perspective, we also see great interest from large operators. They
>>> are:
>>>
>>> - Physical networking migration, or : Migrate networks to a new physical
>>>   network
>>>   Jira : https://issues.apache.org/jira/browse/CLOUDSTACK-10024
>>>   Design doc :
>>>   https://cwiki.apache.org/confluence/display/CLOUDSTACK/Physical+Networking+Migration
>>>   PR : https://github.com/apache/cloudstack/pull/2259
>>> - Config Drive support :
>>>   Jira : https://issues.apache.org/jira/browse/CLOUDSTACK-9813
>>>   Design doc : https://issues.apache.org/jira/browse/CLOUDSTACK-9813
>>>   PR : https://github.com/apache/cloudstack/pull/2097
>>>
>>>
>>> I would like to emphasize first that both features are generic solutions,
>>> not tied to any SDN solution whatsoever.
>>>
>>>
>>> Regarding network migration support : to anyone interested in this
>>> capability, please review the design doc and/or the PR. The more buy-in
>>> we get for the large amount of work we put into this, the more
>>> appreciation you show to the people who did the work. We hope to get this
>>> PR merged before the end of the year.
>>>
>>>
>>> Regarding config drive support : as already discussed at the Miami
>>> Collaboration conference in May, this PR gives you fully standard
>>> cloud-init support using the OpenStack configdrive datasource provider,
>>> and it works like a charm. We received very positive feedback on this.
>>> There has been discussion about the storage strategy though, and people
>>> voted for storing the iso's on primary storage rather than on secondary
>>> storage. Frank Maximus has done further work on this and there has been
>>> some review of it, but I am not sure we still want to include that change
>>> in 4.11. Frankly, my proposal would be that we take an incremental
>>> approach: start larger validation cycles for the outstanding PR and
>>> approach further evolution of it in the next release cycle. Open to your
>>> feedback.  In any case I would find it a missed opportunity if config
>>> drive eventually did not find its way into 4.11 at all.  Again, we see
>>> great interest in this.  Potentially we could declare it as experimental
>>> support, to be enabled via an advanced setting - just an idea. That way
>>> more people can get a sense of how it works and provide feedback. Another
>>> item to note is that, as the Nuage team does not have XenServer
>>> deployments, we have no facility to verify the PR with XenServer. Help on
>>> that would also be appreciated.
>>>
>>>
>> We could do it in two steps indeed. Move to PS in a later stage (if
>> that's doable?).
>>
>> I would love to take a look at this as I'd love to see this feature in
>> 4.11, but I'm very low on time to actually look at it.
>>
>> Wido
>>
>>
>>>
>>> Thanks,
>>>
>>>
>>> Kris
>>>
>>>
>


Re: PrivateGateaway ACL rules blocker

2017-12-20 Thread Rohit Yadav
Sure, I've kicked off some tests. Will merge when the tests pass and we have 
some review/feedback from others.


-Rohit


From: Voloshanenko Igor 
Sent: Wednesday, December 20, 2017 5:33:10 PM
To: dev@cloudstack.apache.org
Subject: Re: PrivateGateaway ACL rules blocker

Tnx a lot Rohit!

As we only handle a special case, I think all tests will pass without any
issues. I can't imagine that anybody wrote a test that checks that the
buggy condition passes :D

2017-12-20 13:02 GMT+02:00 Rohit Yadav :

> Hi Voloshanenko,
>
>
> Thanks for reporting and sharing, I'll kick off some tests.
>
>
> - Rohit
>
> 
> From: Voloshanenko Igor 
> Sent: Wednesday, December 20, 2017 6:29:46 AM
> To: dev@cloudstack.apache.org
> Subject: PrivateGateaway ACL rules blocker
>
> Hi all!
>
> Guys, can I kindly ask you to review this issue and PR -
> https://issues.apache.org/jira/browse/CLOUDSTACK-10200
>
> We have a lot of clients with PG and want to include this in 4.11 LTS, as
> without this small one-liner patch the PrivateGateway component is useless.
>
> Tnx in advance!
>




Re: PrivateGateaway ACL rules blocker

2017-12-20 Thread Voloshanenko Igor
Tnx a lot Rohit!

As we only handle a special case, I think all tests will pass without any
issues. I can't imagine that anybody wrote a test that checks that the
buggy condition passes :D

2017-12-20 13:02 GMT+02:00 Rohit Yadav :

> Hi Voloshanenko,
>
>
> Thanks for reporting and sharing, I'll kick off some tests.
>
>
> - Rohit
>
> 
> From: Voloshanenko Igor 
> Sent: Wednesday, December 20, 2017 6:29:46 AM
> To: dev@cloudstack.apache.org
> Subject: PrivateGateaway ACL rules blocker
>
> Hi all!
>
> Guys, can I kindly ask you to review this issue and PR -
> https://issues.apache.org/jira/browse/CLOUDSTACK-10200
>
> We have a lot of clients with PG and want to include this in 4.11 LTS, as
> without this small one-liner patch the PrivateGateway component is useless.
>
> Tnx in advance!
>


Re: PrivateGateaway ACL rules blocker

2017-12-20 Thread Rohit Yadav
Hi Voloshanenko,


Thanks for reporting and sharing, I'll kick off some tests.


- Rohit


From: Voloshanenko Igor 
Sent: Wednesday, December 20, 2017 6:29:46 AM
To: dev@cloudstack.apache.org
Subject: PrivateGateaway ACL rules blocker

Hi all!

Guys, can I kindly ask you to review this issue and PR -
https://issues.apache.org/jira/browse/CLOUDSTACK-10200

We have a lot of clients with PG and want to include this in 4.11 LTS, as
without this small one-liner patch the PrivateGateway component is useless.

Tnx in advance!
