Re: [Gluster-infra] [Gluster-devel] Reduce regression runs wait time - New gerrit/review work flow

2015-06-17 Thread Kaushal M
On 16-Jun-2015 19:22, "Aravinda"  wrote:

> +1 for running regressions on need basis.
>
> Also, we need to make the tests more intelligent so that the same test runs
> differently when triggered for a nightly run versus a regular run.
>
> function test_some_functionality
> {
>     if [ -n "$NIGHTLY" ]; then
>         # Nightly-only tests: time consuming, run with more data
>         run_nightly_tests
>     fi
>     # Regular basic tests
>     run_basic_tests
> }
>
> TEST test_some_functionality;
>
> This way we can maintain all tests in a single place. Set the NIGHTLY
> env/config variable whenever we need to run nightly tests.
>
>
I've discussed this idea with different people before. The major
concern was how to identify the minimal set of basic tests that would
do a good job of catching most regressions. Considering that the regression
suite will keep growing and will take longer to complete in the future,
nightly full regression runs would be nice to have.

--
> regards
> Aravinda
>
>
>
> On 06/15/2015 06:49 AM, Kaushal M wrote:
>
>> Hi all,
>>
>> The recent rush of reviews being sent due to the release of 3.7 was a
>> cause of frustration for many of us because of the regression tests
>> (gerrit troubles themselves are another thing).
>>
>> With respect to regression, the three main sources of frustration were:
>> 1. Spurious test failures
>> 2. Long wait times
>> 3. Regression slave troubles
>>
>> We've already tackled the spurious failure issue and are quite stable
>> now. The trouble with the slave vms is related to the gerrit issues,
>> and is mainly due to the network issues we are having between the
>> data-centers hosting the slaves and gerrit/jenkins. People have been
>> looking into this, but we haven't had much success. This leaves the
>> issue of the long wait times.
>>
>> The long wait times are because of the long queues of pending jobs,
>> some of which take days to get scheduled. Two things cause the long
>> queues,
>> 1. Automatic regression job triggering for all submissions to gerrit
>> 2. Long run time for regression (~2h)
>>
>> The long queues coupled with the spurious failure and network
>> problems, meant that jobs would fail for no reason after a long wait,
>> and would have to be added to the back of the queue to be re-run. This
>> meant that developers would have to wait days for their changes to get
>> merged, and was one of the causes for the delay in the release of 3.7.
>>
>> The solution is to reduce wait times for regression runs. To reduce wait
>> times we should,
>> 1. Trigger runs only when required
>> 2. Reduce regression run time.
>>
>> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with his
>> findings on the regression run times, and we can continue discussion
>> on it on that thread.
>>
>> Earlier, the regression runs used to be manually triggered by the
>> maintainers once they had determined that a change was ready for
>> submission. But as there were only two maintainers before (Vijay and
>> Avati) auto triggering was brought in to reduce their load. Auto
>> triggering worked fine when we had a lower volume of changes being
>> submitted to gerrit. But now, with the large volumes we see during the
>> release freeze dates, auto triggering just adds to problems.
>>
>> I propose that we move back to the old model of starting regression
>> runs only once the maintainers are ready to merge. But instead of the
>> maintainers manually triggering the runs, we could automate it.
>>
>> We can model our new workflow on those of OpenStack[1] and
>> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
>> the features necessary to enable selective triggering based on Gerrit
>> flags. Both OpenStack and Wikimedia use a project gating tool called
>> Zuul[3], which provides a much better integration with Jenkins and
>> Gerrit and more features on top.
>>
>> I propose the following work flow,
>>
>> - Developer pushes change to Gerrit.
>>   - Zuul is notified by Gerrit of new change
>>     - Zuul runs pre-review checks on Jenkins. This will be the current
>>       smoke tests.
>>   - Zuul reports back status of the checks to Gerrit.
>>     - If checks fail, developer will need to resend the change after
>>       the required fixes. The process starts once more.
>>     - If the checks pass, the change is now ready for review
>> - The change is now reviewed by other developers and maintainers.
>>   Non-maintainers will be able to give only a +1 review.
>>   - On a negative review, the developer will need to rework the change
>>     and resend it. The process starts once more.
>> - The maintainer give a +2 review once he/she is satisfied. The
>>   maintainers work is done here.
>>   - Zuul is notified of the +2 review
>>     - Zuul runs the regression runs and reports back the status.
>>   - If the regression runs fail, the process starts over again.
>>   - If the runs pass, the change is ready for acceptance.
>>     - Zuul will pick the change into the repository.
>>   - If the pick fails, Zuul will re

Re: [Gluster-infra] Fedora 19 VM's in Rackspace

2015-06-17 Thread Justin Clift
Sounds good. :)

On 17 Jun 2015, at 17:58, Thiago da Silva  wrote:
> Hello,
> I just created two VMs in Rackspace to replace the VMs mentioned below.
> I created:
> gluster-swift-f22-1
> gluster-swift-el7-1
> 
> These VMs will be used for libgfapi-python and swift-on-file builds.
> Once we have the Jenkins builds set up correctly to build from these new
> VMs, I will delete the older VMs.
> 
> Please let me know if there are any issues with the plan.
> 
> Thanks,
> 
> Thiago

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 20:14, Niels de Vos  wrote:
> On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
>> On Wednesday 17 June 2015 at 11:58 +0100, Justin Clift wrote:
>>> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
>>>> On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote:
>>>>> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
>>>>>> Venky Shankar  wrote:
>>>>>>
>>>>>>> If that's the case, then I'll vote for this even if it takes some time
>>>>>>> to get things in workable state.
>>>>>>
>>>>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>>>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>>>>> is the reason why Jenkins bugs.
>>>>>
>>>>> But cloud.gluster.org is handled by rackspace, not sure how much control
>>>>> we have for it ( not sure even where to start there ).
>>>>
>>>> So I cannot change the DNS destination.
>>>>
>>>> What I can do is to create a new dns zone, and then, we can delegate as
>>>> we want. And migrate some slaves and not others, and see how it goes ?
>>>>
>>>> slaves.gluster.org would be ok for everybody ?
>>> 
>>> Try it out, and see if it works. :)
>>> 
>>> On the "scaling the infrastructure" side of things, are the two OSAS servers
>>> for Gluster still available?
>> 
>> They are online.
>> $ ssh r...@ci.gluster.org uptime
>> 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05
> 
> Can it run some Jenkins Slave VMs too?

There are two boxes.  A pretty beefy one for running Jenkins slave VM's
(probably about 40 VM's simultaneously), and a slightly less beefy one for
running Jenkins/Gerrit/whatever.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



Re: [Gluster-infra] Fedora 19 VM's in Rackspace

2015-06-17 Thread Thiago da Silva
Hello,
I just created two VMs in Rackspace to replace the VMs mentioned below.
I created:
gluster-swift-f22-1
gluster-swift-el7-1

These VMs will be used for libgfapi-python and swift-on-file builds.
Once we have the Jenkins builds set up correctly to build from these new
VMs, I will delete the older VMs.

Please let me know if there are any issues with the plan.

Thanks,

Thiago

On Mon, 2015-04-20 at 19:48 +0100, Justin Clift wrote:
> Either way is good.  Completely up to you. :)
> 
> + Justin
> 
> 
> On 20 Apr 2015, at 18:24, Thiago da Silva  wrote:
> > Hi Justin,
> > We are currently using these VMs so they should not be deleted yet, but
> > I agree we need to upgrade them to newer OS versions. Probably the same
> > is true about our CentOS machine. What's better? Can we just create two
> > new VMs and then nuke these once we are done setting up the new ones or
> > upgrade these machines to Fedora 21?
> > 
> > Thanks,
> > 
> > Thiago
> > 
> > On Sun, 2015-04-19 at 10:44 +0100, Justin Clift wrote:
> > > Hi Thiago,
> > > 
> > > Can we nuke these two VM's in Rackspace?
> > > 
> > >  * g4s-rackspace-f19-1
> > >  * g4s-rackspace-f19-3
> > > 
> > > They're running Fedora 19, which is no longer receiving any kind
> > > of package updates.  So... they'll become a security problem at
> > > some point, if they're not already.
> > > 
> > > ?
> > > 
> > > Regards and best wishes,
> > > 
> > > Justin Clift
> > > 
> > > --
> > > GlusterFS - http://www.gluster.org
> > > 
> > > An open source, distributed file system scaling to several
> > > petabytes, and handling thousands of clients.
> > > 
> > > My personal twitter: twitter.com/realjustinclift
> > > 
> > 
> > 
> 
> --
> GlusterFS - http://www.gluster.org
> 
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
> 
> My personal twitter: twitter.com/realjustinclift
> 


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 04:23 PM, Niels de Vos wrote:
> On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
>> Niels de Vos  wrote:
>>
>>> Maybe, but I hope those issues stay masked when resolving the hostnames
>>> is more stable. When we have the other servers up and running, we would
>>> have a better understanding and options to investigate issues like this.
>>
>> But Jenkins is still unable to launch an agent on e.g. nbslave75.
>> Perhaps it needs to be restarted?
>
> Yes, a Jenkins restart might be good. But, I do not know how it gets
> stopped safely, or started.

The only downside of a Jenkins restart is that we would need to manually
re-trigger all existing jobs.

Shall we just do that now?

-Vijay



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
> Niels de Vos  wrote:
> 
> > Maybe, but I hope those issues stay masked when resolving the hostnames
> > is more stable. When we have the other servers up and running, we would
> > have a better understanding and options to investigate issues like this.
> 
> But Jenkins is still unable to launch an agent on e.g. nbslave75.
> Perhaps it needs to be restarted?

Yes, a Jenkins restart might be good. But, I do not know how it gets
stopped safely, or started.

Niels


Re: [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-17 Thread Niels de Vos
On Mon, Jun 15, 2015 at 04:19:14PM +0530, Kaushal M wrote:
> Hi all,
...
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually triggering the runs, we could automate it.

I think auto triggering regression tests is good. We should ask the
developers to run regression tests before posting complex changes. If
the parallelisation of regression tests is done, the wait time should
reduce too.

As a maintainer that spends quite some time reviewing patches, I prefer
to see a +1 verified before I start to review something. With that, I at
least have some confidence that there are no obvious mistakes I need to
point out. If developers have to wait on me before regression testing
gets started, I feel more like a roadblock than a help to them. There
really are *many* patches that get a FAILED result where there is a
problem in the code. Developers should get a response about that as soon
as possible, and waiting for a maintainer to start the regression tests
does not help.

I also had to ask maintainers to trigger regression tests for my first
patches, and it was not a nice experience. Anything we can do to improve
the experience for (new) developers should be done; delaying (automated)
feedback isn't a step in the right direction.

> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.

More intelligent triggering would be helpful. Unfortunately we have a
stack of xlators and it is difficult to say if there are unintended
side-effects in different, untouched pieces of the code.


> I propose the following work flow,
> 
> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
>     - Zuul runs pre-review checks on Jenkins. This will be the current
>       smoke tests.
>   - Zuul reports back status of the checks to Gerrit.
>     - If checks fail, developer will need to resend the change after
>       the required fixes. The process starts once more.
>     - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
>   Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
>     and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
>   maintainers work is done here.
>   - Zuul is notified of the +2 review
>     - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
>     - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
>     process starts once again.

It would be nice if Zuul, in its last step, could pick the change on top
of the latest HEAD, run the build/smoke test again, and only push the
change when all is OK. We have seen patch/merge races where one patch
changed a function/define and another patch used that function/define.
These caused many issues when the branch failed to compile. Being able
to prevent that would be very good.
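
To make the final gating step concrete, here is a minimal sketch of
"rebase on latest HEAD, re-test, push only on success". It is a toy model
and not Zuul's API: gate_change, toy_rebase and toy_smoke_test are
hypothetical names, and a "tree" is modelled as nothing more than the set
of symbols it defines and the set it uses.

```python
def gate_change(change, head, rebase, smoke_test, push):
    """Rebase `change` onto `head`, re-test it, and push only on success.

    All three callables are supplied by the caller; none of them is a
    real Zuul, Gerrit or Jenkins call.
    """
    rebased = rebase(change, head)
    if rebased is None:
        return "rebase-failed"
    if not smoke_test(rebased):
        return "smoke-failed"
    push(rebased)
    return "merged"


def toy_rebase(change, head):
    # Toy rebase: the candidate tree defines HEAD's symbols plus whatever
    # the change adds, and uses whatever the change uses.
    return {"defines": head | change["adds"], "uses": change["uses"]}


def toy_smoke_test(tree):
    # Toy "build": it succeeds only if every used symbol is defined.
    return tree["uses"] <= tree["defines"]


if __name__ == "__main__":
    merged = []
    # The change was written against a HEAD that still had old_helper,
    # but another patch has since renamed it to new_helper.
    change = {"adds": {"my_feature"}, "uses": {"old_helper"}}
    print(gate_change(change, {"new_helper"}, toy_rebase, toy_smoke_test,
                      merged.append))
```

In this model the race described above is caught before the push: the
rebased tree no longer defines the symbol the change relies on, so the
smoke test fails and nothing lands on the branch.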


> Following this flow should,
> 1. Reduce regression wait time

"wait time" for what or who? The merging of the patch would still only
happen after all tests are done. If something fails the last test, more
people (reviewers and maintainer) need to spend additional time.

> 2. Improve change acceptance time
> 3. Reduce unnecessary  wastage of infra resources

We could, and should, optimize that through parallel testing and by
educating developers to only re-run regressions when needed. Splitting up
the regression tests also makes it possible to re-run only a small part
of the tests.
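
As a rough illustration of what splitting could look like, the sketch
below distributes a list of test files round-robin over a fixed number of
chunks, so that each chunk can run (or be re-run) on its own. The function
name split_tests and the test file names are hypothetical; this is not how
the current regression job selects tests.

```python
def split_tests(tests, n_chunks):
    """Distribute test names round-robin over n_chunks lists.

    Sorting first keeps the assignment stable between runs, so a failed
    chunk can be re-run with the same contents.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, test in enumerate(sorted(tests)):
        chunks[index % n_chunks].append(test)
    return chunks


if __name__ == "__main__":
    # Hypothetical test file names, for illustration only.
    demo = ["basic/c.t", "basic/a.t", "basic/b.t"]
    for number, chunk in enumerate(split_tests(demo, 2)):
        print(number, chunk)
```

Round-robin ignores how long each test takes; balancing by recorded run
time would be a natural refinement once such timings are available.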

> 4. Improve infra stability.

I am not sure that adding another component and its (complex?) configuration
would "Improve infra stability". It would be nice to have a very minimal
set of tools, with many people understanding them. With the current
Gerrit and Jenkins configuration, we already seem to be very
limited in people who can investigate issues.


> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
> 
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewe

Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> Maybe, but I hope those issues stay masked when resolving the hostnames
> is more stable. When we have the other servers up and running, we would
> have a better understanding and options to investigate issues like this.

But Jenkins is still unable to launch an agent on e.g. nbslave75.
Perhaps it needs to be restarted?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
> On Wednesday 17 June 2015 at 11:58 +0100, Justin Clift wrote:
> > On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> > > On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote:
> > >> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> > >>> Venky Shankar  wrote:
> > >>> 
> > >>>> If that's the case, then I'll vote for this even if it takes some time
> > >>>> to get things in workable state.
> > >>> 
> > >>> See my other mail about this: you enter a new slave VM in the DNS and it
> > >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> > >>> is the reason why Jenkins bugs.
> > >> 
> > >> But cloud.gluster.org is handled by rackspace, not sure how much control
> > >> we have for it ( not sure even where to start there ).
> > > 
> > > So I cannot change the DNS destination.
> > > 
> > > What I can do is to create a new dns zone, and then, we can delegate as
> > > we want. And migrate some slaves and not others, and see how it goes ?
> > > 
> > > slaves.gluster.org would be ok for everybody ?
> > 
> > Try it out, and see if it works. :)
> > 
> > On the "scaling the infrastructure" side of things, are the two OSAS servers
> > for Gluster still available?
> 
> They are online.
> $ ssh r...@ci.gluster.org uptime
>  09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05

Can it run some Jenkins Slave VMs too?

Thanks,
Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> > Venky Shankar  wrote:
> > 
> > > If that's the case, then I'll vote for this even if it takes some time
> > > to get things in workable state.
> > 
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
> 
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it ( not sure even where to start there ).

On build.gluster.org there now is a /usr/local/bin/get-hosts.py script
(it needs to be executed through sudo). It pulls down the DNS records
from our cloud.gluster.org domain in Rackspace and provides output in
/etc/hosts format.

/etc/hosts on build.gluster.org contains all the current entries. We
could automatically update it with a cron job or something, if needed.
New VMs should get added to /etc/hosts too, either manually or by
executing the script (sudo vim /etc/hosts, :r!/usr/local/bin/get-hosts.py).
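
For readers who have not seen the script, output in /etc/hosts format
could be produced along the following lines. This is only a sketch and not
the actual get-hosts.py: the real script pulls its records from Rackspace
DNS, whereas here the records are passed in as plain tuples, and the host
names and addresses (from the 192.0.2.0/24 documentation range) are
made up.

```python
import ipaddress


def format_hosts(records):
    """Return /etc/hosts-style lines for (hostname, IPv4 address) pairs.

    Lines are sorted by numeric address; each line lists the address,
    the full name, and the short name as an alias.
    """
    ordered = sorted(records, key=lambda rec: int(ipaddress.ip_address(rec[1])))
    lines = []
    for name, addr in ordered:
        short = name.split(".")[0]
        lines.append(f"{addr}\t{name} {short}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical records; real ones would come from the DNS provider.
    demo = [
        ("slave2.example.org", "192.0.2.10"),
        ("slave1.example.org", "192.0.2.9"),
    ]
    print(format_hosts(demo))
```

Sorting by the numeric value of the address rather than by string keeps
192.0.2.9 ahead of 192.0.2.10, which makes diffs of the generated file
easier to read.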

> And I think the DNS issues are just a symptom of a bigger network issue.
> Having local DNS might just mask the problem, which would then be
> non-DNS related (like TCP connections not working).

Maybe, but I hope those issues stay masked when resolving the hostnames
is more stable. When we have the other servers up and running, we would
have a better understanding and options to investigate issues like this.

HTH,
Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 12:13:46PM +, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> > Do we still have the NFS crash that was causing tests to hang?
> 
> Do we still have it on rebased patchsets?

Yes, the fixes depend on the refcounting change which does not seem as
trivial as I hoped. http://review.gluster.org/11022 for the interested.

http://review.gluster.org/11023 is the fix that should solve the
segfaults in the NFS-server.

Niels


Re: [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-17 Thread Michael Scherer
On Monday 15 June 2015 at 16:19 +0530, Kaushal M wrote:
> Hi all,
> 
> The recent rush of reviews being sent due to the release of 3.7 was a
> cause of frustration for many of us because of the regression tests
> (gerrit troubles themselves are another thing).
> 
> With respect to regression, the three main sources of frustration were:
> 1. Spurious test failures
> 2. Long wait times
> 3. Regression slave troubles
> 
> We've already tackled the spurious failure issue and are quite stable
> now. The trouble with the slave vms is related to the gerrit issues,
> and is mainly due to the network issues we are having between the
> data-centers hosting the slaves and gerrit/jenkins. People have been
> looking into this, but we haven't had much success. This leaves the
> issue of the long wait times.
> 
> The long wait times are because of the long queues of pending jobs,
> some of which take days to get scheduled. Two things cause the long
> queues,
> 1. Automatic regression job triggering for all submissions to gerrit
> 2. Long run time for regression (~2h)
> 
> The long queues coupled with the spurious failure and network
> problems, meant that jobs would fail for no reason after a long wait,
> and would have to be added to the back of the queue to be re-run. This
> meant that developers would have to wait days for their changes to get
> merged, and was one of the causes for the delay in the release of 3.7.
> 
> The solution is to reduce wait times for regression runs. To reduce wait
> times we should,
> 1. Trigger runs only when required
> 2. Reduce regression run time.
> 
> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with his
> findings on the regression run times, and we can continue discussion
> on it on that thread.
> 
> Earlier, the regression runs used to be manually triggered by the
> maintainers once they had determined that a change was ready for
> submission. But as there were only two maintainers before (Vijay and
> Avati) auto triggering was brought in to reduce their load. Auto
> triggering worked fine when we had a lower volume of changes being
> submitted to gerrit. But now, with the large volumes we see during the
> release freeze dates, auto triggering just adds to problems.
> 
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually triggering the runs, we could automate it.
> 
> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.
> 
> I propose the following work flow,
> 
> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
>     - Zuul runs pre-review checks on Jenkins. This will be the current
>       smoke tests.
>   - Zuul reports back status of the checks to Gerrit.
>     - If checks fail, developer will need to resend the change after
>       the required fixes. The process starts once more.
>     - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
>   Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
>     and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
>   maintainers work is done here.
>   - Zuul is notified of the +2 review
>     - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
>     - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
>     process starts once again.
> 
> Following this flow should,
> 1. Reduce regression wait time
> 2. Improve change acceptance time
> 3. Reduce unnecessary  wastage of infra resources
> 4. Improve infra stability.
> 
> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
> 
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewed+2 and
> automatically merge ones that pass.
> 
> What does the community think of this?

Zuul is being packaged for Fedora/EPEL, so it would greatly help to use
the package rather than a non-sustainable self-installation like we had
in the past.
-- 
Micha

Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 03:00:29PM +, Emmanuel Dreyfus wrote:
> Oh no, it did, but nuked them all almost instantly (see below). I
> disabled it again. Basically we have broken jenkins setups, and DNS
> trouble prevents us from adding new VMs. What a mess.

I retriggered most of the jobs, but at some point the web UI refreshed
and I lost track of which jobs I had already retriggered. I left it as is.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 08:34:06PM +0530, Kaushal M wrote:
> Would restarting jenkins once help? It might help it pick up the newly
> added entries to the hosts file.

Won't it break all running jobs?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Kaushal M
Would restarting jenkins once help? It might help it pick up the newly
added entries to the hosts file.

On Wed, Jun 17, 2015 at 8:30 PM, Emmanuel Dreyfus  wrote:
> On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote:
>> I re-enabled it and it went online, but it does not seem to pick up a job.
>
> Oh no, it did, but nuked them all almost instantly (see below). I
> disabled it again. Basically we have broken jenkins setups, and DNS
> trouble prevents us from adding new VMs. What a mess.
>
> Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
> Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
> java.io.IOException: remote file operation failed: /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org: hudson.remoting.ChannelClosedException: channel is already closed
> at hudson.FilePath.act(FilePath.java:987)
> at hudson.FilePath.act(FilePath.java:969)
> at hudson.FilePath.mkdirs(FilePath.java:1152)
> at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
> at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
> at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
> at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
> at hudson.model.Run.execute(Run.java:1744)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:98)
> at hudson.model.Executor.run(Executor.java:374)
> Caused by: hudson.remoting.ChannelClosedException: channel is already closed
> at hudson.remoting.Channel.send(Channel.java:550)
> at hudson.remoting.Request.call(Request.java:129)
> at hudson.remoting.Channel.call(Channel.java:752)
> at hudson.FilePath.act(FilePath.java:980)
> ... 10 more
> Caused by: java.io.IOException
> at hudson.remoting.Channel.close(Channel.java:1110)
> at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
> at hudson.remoting.PingThread.ping(PingThread.java:126)
> at hudson.remoting.PingThread.run(PingThread.java:85)
> Caused by: java.util.concurrent.TimeoutException: Ping started at 1433860950328 hasn't completed by 1433861190328
> ... 2 more
> Finished: FAILURE
>
>
> --
> Emmanuel Dreyfus
> m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote:
> I re-enabled it and it went online, but it does not seem to pick up a job.

Oh no, it did, but nuked them all almost instantly (see below). I
disabled it again. Basically we have broken jenkins setups, and DNS
trouble prevents us from adding new VMs. What a mess.

Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
java.io.IOException: remote file operation failed: /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.FilePath.act(FilePath.java:987)
at hudson.FilePath.act(FilePath.java:969)
at hudson.FilePath.mkdirs(FilePath.java:1152)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
at hudson.model.Run.execute(Run.java:1744)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:374)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:550)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:752)
at hudson.FilePath.act(FilePath.java:980)
... 10 more
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1110)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
at hudson.remoting.PingThread.ping(PingThread.java:126)
at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1433860950328 hasn't completed by 1433861190328
... 2 more
Finished: FAILURE
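
For reading the trace above: the two numbers in the TimeoutException are Unix
timestamps in milliseconds. A minimal sketch that decodes them (the helper
name ping_window is made up for this note, it is not part of Jenkins):

```python
from datetime import datetime, timezone

def ping_window(started_ms, deadline_ms):
    """Return (ping start time in UTC, seconds waited before giving up)."""
    # Integer division drops the milliseconds and avoids float rounding.
    started = datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)
    waited = (deadline_ms - started_ms) / 1000
    return started, waited

started, waited = ping_window(1433860950328, 1433861190328)
print(started.isoformat(), waited)
```

The gap between the two values is 240 seconds, so the ping thread waited
4 minutes for the slave before the channel was closed.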


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 07:39:06PM +0530, Kaushal M wrote:
> nbslave7{d..f} were the entries created by Vijay last week, which were
> resolving to nbslave71; there were no actual vms on rackspace. I had
> disabled nbslave71 at that point in time to reboot it, but I think I
> forgot to re-enable it.

I re-enabled it and it went online, but it does not seem to pick up a job.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Kaushal M
nbslave7{d..f} were the entries created by Vijay last week, which were
resolving to nbslave71; there were no actual vms on rackspace. I had
disabled nbslave71 at that point in time to reboot it, but I think I
forgot to re-enable it.

~kaushal

On Wed, Jun 17, 2015 at 7:21 PM, Emmanuel Dreyfus  wrote:
> Status of NetBSD slave VM:
>
> 1 booked: nbslave71
>   It is noted to be disconnected by amarts. Is usage over?
>
> 3 removed from rackspace but still in jenkins: nbslave7d, nbslave7e, nbslave7f
>
> 6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j
>
> 3 offline: nbslave74 nbslave75 nbslave79
>   The 3 DNS records do not resolve (timeout) from build.gluster.org,
>   while they do from my own machine. Adding them to /etc/hosts helps
>   a lot on the command line, and it becomes possible to connect to port 22.
>   But Jenkins is still unable to connect and launch the agent.
>   tcpdump on build.gluster.org shows it does not even try.
>
> Perhaps there is a name cache in Jenkins and it needs to be restarted?
> I am leaving the /etc/hosts file loaded with nbslave74 nbslave75 nbslave79
>
> --
> Emmanuel Dreyfus
> m...@netbsd.org


[Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
Status of NetBSD slave VM:

1 booked: nbslave71 
  It is noted to be disconnected by amarts. Is usage over?

3 removed from rackspace but still in jenkins: nbslave7d, nbslave7e, nbslave7f

6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j

3 offline: nbslave74 nbslave75 nbslave79
  The 3 DNS records do not resolve (timeout) from build.gluster.org, 
  while they do from my own machine. Adding them to /etc/hosts helps
  a lot on the command line, and it becomes possible to connect to port 22.
  But Jenkins is still unable to connect and launch the agent.
  tcpdump on build.gluster.org shows it does not even try.

Perhaps there is a name cache in Jenkins and it needs to be restarted?
I am leaving the /etc/hosts file loaded with nbslave74 nbslave75 nbslave79
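
The "connect to port 22" check described above can be repeated from any
machine with a few lines of Python. This is only a sketch: the function name
can_connect and the 5 second timeout are arbitrary choices, and it tests
plain TCP reachability, not whether the Jenkins agent can be launched.

```python
import socket

def can_connect(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name through the system resolver,
        # so entries added to /etc/hosts are normally taken into account.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refused connections and timeouts.
        return False
```

For example, can_connect("nbslave74.cloud.gluster.org") would be expected to
return True on build.gluster.org once the /etc/hosts entry is in place
(assuming the slave uses the same cloud.gluster.org naming as nbslave71).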

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> > Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> >> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> >>> Venky Shankar  wrote:
> >>> 
> >>>> If that's the case, then I'll vote for this even if it takes some time
> >>>> to get things in workable state.
> >>> 
> >>> See my other mail about this: you enter a new slave VM in the DNS and it
> >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> >>> is the reason why Jenkins bugs.
> >> 
> >> But cloud.gluster.org is handled by rackspace, not sure how much control
> >> we have for it ( not sure even where to start there ).
> > 
> > So I cannot change the DNS destination.
> > 
> > What I can do is to create a new dns zone, and then, we can delegate as
> > we want. And migrate some slaves and not others, and see how it goes ?
> > 
> > slaves.gluster.org would be ok for everybody ?
> 
> Try it out, and see if it works. :)
> 
> On the "scaling the infrastructure" side of things, are the two OSAS servers
> for Gluster still available?

They are online.
$ ssh r...@ci.gluster.org uptime
 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05


> If so, we should get them online ASAP, as that will give us ~40 new VMs
> + get us out of iWeb (which I suspect is the problem).

I suspect so too. But then that means migrating Jenkins and everything, and
I would prefer a quick fix. I am looking at the DNS solution.
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 08:13 AM, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
>> Do we still have the NFS crash that was causing tests to hang?
>
> Do we still have it on rebased patchsets?

I am not certain. I am still trying to come to terms with my email 
backlog and hence seeking a quick opinion here to see if we need to 
address it asap.


-Vijay


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> Do we still have the NFS crash that was causing tests to hang?

Do we still have it on rebased patchsets?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 05:20 AM, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote:
>> I've already scripted the reboot-vm job to use Rackspace API, the DNS
>> requesting and formatting the results into some file can't be that
>> difficult. Let me know if a /etc/hosts format would do, or if you expect
>> something else.
>
> Perhaps a /etc/hosts would do it: jenkins launches the ssh command,
> and ssh should use /etc/hosts before the DNS.

Why don't we try this out while we find an alternate solution? Given 
that there are plenty of patches awaiting NetBSD regression, anything 
that we can do to alleviate the situation would be more than welcome!

Do we still have the NFS crash that was causing tests to hang?

Thanks,
Vijay


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Kaushal M
Just moving Gerrit and Jenkins out of iWeb should help a lot.

On Wed, Jun 17, 2015 at 4:28 PM, Justin Clift  wrote:
> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
>> Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
>>> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
>>>> Venky Shankar  wrote:
>>>>
>>>>> If that's the case, then I'll vote for this even if it takes some time
>>>>> to get things in workable state.
>>>>
>>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>>> is the reason why Jenkins bugs.
>>>
>>> But cloud.gluster.org is handled by rackspace, not sure how much control
>>> we have for it ( not sure even where to start there ).
>>
>> So I cannot change the DNS destination.
>>
>> What I can do is to create a new dns zone, and then, we can delegate as
>> we want. And migrate some slaves and not others, and see how it goes ?
>>
>> slaves.gluster.org would be ok for everybody ?
>
> Try it out, and see if it works. :)
>
> On the "scaling the infrastructure" side of things, are the two OSAS servers
> for Gluster still available?
>
> If so, we should get them online ASAP, as that will give us ~40 new VMs
> + get us out of iWeb (which I suspect is the problem).
>
> Regards and best wishes,
>
> Justin Clift
>
> --
> GlusterFS - http://www.gluster.org
>
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift
>


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
>> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
>>> Venky Shankar  wrote:
>>> 
>>>> If that's the case, then I'll vote for this even if it takes some time
>>>> to get things in workable state.
>>> 
>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>> is the reason why Jenkins bugs.
>> 
>> But cloud.gluster.org is handled by rackspace, not sure how much control
>> we have for it ( not sure even where to start there ).
> 
> So I cannot change the DNS destination.
> 
> What I can do is to create a new dns zone, and then, we can delegate as
> we want. And migrate some slaves and not others, and see how it goes ?
> 
> slaves.gluster.org would be ok for everybody ?

Try it out, and see if it works. :)

On the "scaling the infrastructure" side of things, are the two OSAS servers
for Gluster still available?

If so, we should get them online ASAP, as that will give us ~40 new VMs
+ get us out of iWeb (which I suspect is the problem).

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 07:29, Kaushal M  wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.

Contacting Rackspace support is very easy, and they're normally
very responsive.  They have an online support ticket submission thing
in the Rackspace UI.  Often they get back to us with meaningful
responses in less than 15-20 minutes.

Please go ahead and submit a ticket. :)

(Btw - I suspect the DNS issue is likely related to the hardware
firewall in the iWeb infrastructure.  It's probably acting up. :<).

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Venky Shankar
On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee  wrote:
>
>
> On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
>> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>>> If that does not help today, we need other ideas...
>>
>> Just to confirm the problem:
>>
>> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
>> ;; connection timed out; trying next origin
>> ;; connection timed out; no servers could be reached
>>
>>
>> real    0m20.013s
>> user    0m0.002s
>> sys     0m0.012s
>>
>> Having a local cache does not help because upstream DNS service is
>> weak. Without the local cache, individual processes are left waiting
>> for a reply, and with the local server, the local server itself is left
>> waiting for a reply.
>>
>> And here upstream DNS is really at fault: from my own machine I get a
>> reply in 0.29s.
>>
>> We need to configure a local authoritative secondary DNS for the zone,
>> so that the answer is always available locally without having to rely
>> on outside infrastructure.
> I am not sure whether we have any improvements on this front. I still
> see patches are waiting for ages to get their turn for the regression
> run and hence delaying merges and affecting the release process.
>
> I still feel we don't need to wait for NetBSD's vote for merging patches
> on a temporary basis till we fix the infrastructure problem. This is the
> only quick solution which I can think of now.

That *might* result in lots of NetBSD regression failures later on and
we may end up with another round of fixups.

I can't think of a quick solution either.

>
> Thoughts?
>
> ~Atin
>>
>
> --
> ~Atin
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Venky Shankar
On Wed, Jun 17, 2015 at 10:13 AM, Emmanuel Dreyfus  wrote:
> Atin Mukherjee  wrote:
>
>> > That *might* result in lots of NetBSD regression failures later on and
>> > we may end up with another round of fixups.
>> Agreed, that's the known risk but we don't have any other alternatives atm.
>
> I strongly disagree, we have a good alternative: configure a secondary
> DNS on build.gluster.org for the cloud.gluster.org zone. I could do the
> local configuration, but someone with administrative access will have to
> touch primary configuration to allow zone transfer (and enable
> notifications).

If that's the case, then I'll vote for this even if it takes some time
to get things in workable state.
I think Kaushal/Niels/Justin could surely help here.

>
> The current situation is that we have 14 NetBSD VMs online and only 5 are
> capable of running jobs because of various infrastructure configuration
> problems, broken DNS being the first offender.
>
> Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan
> tstile), for which I had a change merged that should fix the problem,
> but only for rebased changes.
>
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Rajesh Joseph


- Original Message -
> From: "Kaushal M" 
> To: "Emmanuel Dreyfus" 
> Cc: "Gluster Devel" , "gluster-infra" 
> 
> Sent: Wednesday, 17 June, 2015 11:59:22 AM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being 
> triggered for patches
> 
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.
> 

If this is going to take time then I prefer not to block patches for NetBSD. We
can address any NetBSD regression caused by patches as a separate bug. Otherwise
our regression queue will continue to grow.

> 
> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
> > Venky Shankar  wrote:
> >
> >> If that's the case, then I'll vote for this even if it takes some time
> >> to get things in workable state.
> >
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
> >
> > --
> > Emmanuel Dreyfus
> > http://hcpnet.free.fr/pubz
> > m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Nithya Balachandran
- Original Message -
> From: "Avra Sengupta" 
> To: "Rajesh Joseph" , "Kaushal M" 
> Cc: "Gluster Devel" , "gluster-infra" 
> 
> Sent: Wednesday, June 17, 2015 1:42:25 PM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being 
> triggered for patches
> 
> On 06/17/2015 12:12 PM, Rajesh Joseph wrote:
> >
> > - Original Message -
> >> From: "Kaushal M" 
> >> To: "Emmanuel Dreyfus" 
> >> Cc: "Gluster Devel" , "gluster-infra"
> >> 
> >> Sent: Wednesday, 17 June, 2015 11:59:22 AM
> >> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being
> >> triggered for patches
> >>
> >> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> >> no readily available option to do zone transfers from it. We might
> >> have to contact the Rackspace support to find out if they can do it as
> >> a special request.
> >>
> > If this is going to take time then I prefer not to block patches for
> > NetBSD. We can address
> > any NetBSD regression caused by patches as a separate bug. Otherwise our
> > regression queue will
> > continue to grow.
> +1 for this. We shouldn't be blocking patches for NetBSD regression till
> the infra scales enough to handle the kind of load we are throwing at
> it. Once the regression framework is scalable enough, we can fix any
> regressions (if any) introduced. This will bring down the turnaround
> time, for the patch acceptance.

+1


> >
> >> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus 
> >> wrote:
> >>> Venky Shankar  wrote:
> >>>
> >>>> If that's the case, then I'll vote for this even if it takes some time
> >>>> to get things in workable state.
> >>> See my other mail about this: you enter a new slave VM in the DNS and it
> >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> >>> is the reason why Jenkins bugs.
> >>>
> >>> --
> >>> Emmanuel Dreyfus
> >>> http://hcpnet.free.fr/pubz
> >>> m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Atin Mukherjee


On 06/17/2015 09:57 AM, Venky Shankar wrote:
> On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee  wrote:
>>
>>
>> On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
>>> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>>>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>>>> If that does not help today, we need other ideas...
>>>
>>> Just to confirm the problem:
>>>
>>> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
>>> ;; connection timed out; trying next origin
>>> ;; connection timed out; no servers could be reached
>>>
>>>
>>> real    0m20.013s
>>> user    0m0.002s
>>> sys     0m0.012s
>>>
>>> Having a local cache does not help because upstream DNS service is
>>> weak. Without the local cache, individual processes are left waiting
>>> for a reply, and with the local server, the local server itself is left
>>> waiting for a reply.
>>>
>>> And here upstream DNS is really at fault: from my own machine I get a
>>> reply in 0.29s.
>>>
>>> We need to configure a local authoritative secondary DNS for the zone,
>>> so that the answer is always available locally without having to rely
>>> on outside infrastructure.
>> I am not sure whether we have any improvements on this front. I still
>> see patches are waiting for ages to get their turn for the regression
>> run and hence delaying merges and affecting the release process.
>>
>> I still feel we don't need to wait for NetBSD's vote for merging patches
>> on a temporary basis till we fix the infrastructure problem. This is the
>> only quick solution which I can think of now.
> 
> That *might* result in lots of NetBSD regression failures later on and
> we may end up with another round of fixups.
Agreed, that's the known risk but we don't have any other alternatives atm.
> 
> I can't think of a quick solution either.
> 
>>
>> Thoughts?
>>
>> ~Atin
>>>
>>
>> --
>> ~Atin

-- 
~Atin


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Atin Mukherjee


On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>> If that does not help today, we need other ideas...
> 
> Just to confirm the problem:
> 
> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
> ;; connection timed out; trying next origin
> ;; connection timed out; no servers could be reached
> 
> 
> real    0m20.013s
> user    0m0.002s
> sys     0m0.012s
> 
> Having a local cache does not help because upstream DNS service is 
> weak. Without the local cache, individual processes are left waiting
> for a reply, and with the local server, the local server itself is left
> waiting for a reply.
> 
> And here upstream DNS is really at fault: from my own machine I get a
> reply in 0.29s.
> 
> We need to configure a local authoritative secondary DNS for the zone, 
> so that the answer is always available locally without having to rely
> on outside infrastructure.
I am not sure whether we have any improvements on this front. I still
see patches are waiting for ages to get their turn for the regression
run and hence delaying merges and affecting the release process.

I still feel we don't need to wait for NetBSD's vote for merging patches
on a temporary basis till we fix the infrastructure problem. This is the
only quick solution which I can think of now.

Thoughts?

~Atin
> 

-- 
~Atin


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> > Venky Shankar  wrote:
> > 
> > > If that's the case, then I'll vote for this even if it takes some time
> > > to get things in workable state.
> > 
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
> 
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it ( not sure even where to start there ).

So I cannot change the DNS destination.

What I can do is to create a new dns zone, and then, we can delegate as
we want. And migrate some slaves and not others, and see how it goes ?

slaves.gluster.org would be ok for everybody ?

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> And I think the DNS issues are just a symptom of a bigger network issue;
> having local DNS might just mask the problem, which would then turn out
> to be non-DNS related (like TCP connections not working).

Well, if it is lost packets, TCP is more resistant, and if it is an
overloaded DNS server, the problem is only for DNS.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> Venky Shankar  wrote:
> 
> > If that's the case, then I'll vote for this even if it takes some time
> > to get things in workable state.
> 
> See my other mail about this: you enter a new slave VM in the DNS and it
> does not resolve, or sometimes you get 20s delays. I am convinced this
> is the reason why Jenkins bugs.

But cloud.gluster.org is handled by rackspace, not sure how much control
we have for it ( not sure even where to start there ).

And I think the DNS issues are just a symptom of a bigger network issue;
having local DNS might just mask the problem, which would then turn out
to be non-DNS related (like TCP connections not working).

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote:
> I've already scripted the reboot-vm job to use Rackspace API, the DNS
> requesting and formatting the results into some file can't be that
> difficult. Let me know if a /etc/hosts format would do, or if you expect
> something else.

Perhaps a /etc/hosts would do it: jenkins launches the ssh command,
and ssh should use /etc/hosts before the DNS.
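
One way to check this on build.gluster.org: nslookup talks to the DNS servers
directly and ignores /etc/hosts, while getaddrinfo() goes through the system
resolver, which on a typical setup consults /etc/hosts first (the order is
set in nsswitch.conf). A minimal sketch, with timed_resolve being a made-up
helper name:

```python
import socket
import time

def timed_resolve(host):
    """Resolve host via the system resolver; return (seconds, addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

If timed_resolve("nbslave7i.cloud.gluster.org") comes back quickly once the
name is listed in /etc/hosts, while nslookup still takes 20s, the hosts file
is being used as hoped. A name that cannot be resolved raises
socket.gaierror.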

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.

Not sure about zone transfers, but we can request the DNS records
through the Rackspace DNS API:


http://docs.rackspace.com/cdns/api/v1.0/cdns-getting-started/content/List_Domain_Details.html

The IP addresses of the VMs do not change often, so a regular fetching
of the records would be sufficient. We could even have a Jenkins job
that downloads an updated /etc/hosts to a slave.

I've already scripted the reboot-vm job to use the Rackspace API; requesting
the DNS records and formatting the results into some file can't be that
difficult. Let me know if a /etc/hosts format would do, or if you expect
something else.
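
As a rough idea of the formatting step only, assuming the records have
already been fetched and reduced to (name, IPv4 address) pairs. The API call
itself is not shown, and the function name and the 192.0.2.x addresses in the
example are placeholders:

```python
def to_hosts_lines(records, domain="cloud.gluster.org"):
    """Format (name, ip) pairs as /etc/hosts lines: IP, FQDN, short alias."""
    lines = []
    for name, ip in sorted(records):
        fqdn = name.rstrip(".")  # DNS APIs may return a trailing dot
        suffix = "." + domain
        short = fqdn[: -len(suffix)] if fqdn.endswith(suffix) else fqdn
        names = fqdn if short == fqdn else f"{fqdn} {short}"
        lines.append(f"{ip}\t{names}")
    return lines

# Placeholder data, not the real slave addresses.
example = [("nbslave74.cloud.gluster.org", "192.0.2.74")]
print("\n".join(to_hosts_lines(example)))
```

The output can be appended to /etc/hosts as-is; writing the file and pushing
it to the slaves would be the job of the Jenkins job mentioned above.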

Thanks,
Niels

> 
> 
> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
> > Venky Shankar  wrote:
> >
> >> If that's the case, then I'll vote for this even if it takes some time
> >> to get things in workable state.
> >
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
> >
> > --
> > Emmanuel Dreyfus
> > http://hcpnet.free.fr/pubz
> > m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Avra Sengupta

On 06/17/2015 12:12 PM, Rajesh Joseph wrote:
>
> - Original Message -
>> From: "Kaushal M" 
>> To: "Emmanuel Dreyfus" 
>> Cc: "Gluster Devel" , "gluster-infra"
>> 
>> Sent: Wednesday, 17 June, 2015 11:59:22 AM
>> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being
>> triggered for patches
>>
>> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
>> no readily available option to do zone transfers from it. We might
>> have to contact the Rackspace support to find out if they can do it as
>> a special request.
>>
> If this is going to take time then I prefer not to block patches for
> NetBSD. We can address any NetBSD regression caused by patches as a
> separate bug. Otherwise our regression queue will continue to grow.
+1 for this. We shouldn't be blocking patches for NetBSD regression till 
the infra scales enough to handle the kind of load we are throwing at 
it. Once the regression framework is scalable enough, we can fix any 
regressions (if any) introduced. This will bring down the turnaround 
time, for the patch acceptance.

>
>> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
>>> Venky Shankar  wrote:
>>>
>>>> If that's the case, then I'll vote for this even if it takes some time
>>>> to get things in workable state.
>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>> is the reason why Jenkins bugs.
>>>
>>> --
>>> Emmanuel Dreyfus
>>> http://hcpnet.free.fr/pubz
>>> m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS

Perhaps we can change that and set up our own DNS server for the zone?

-- 
Emmanuel Dreyfus
m...@netbsd.org