Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> I did dare just now and have rebooted Jenkins :). Let us see how this
> iteration works out.

Excellent! That fixed the Jenkins resolution problem, and we now have 10
NetBSD slave VMs online.

So we have two problems, each with a fix available, when adding a new VM:
- Weak upstream DNS service: worked around by /etc/hosts (a secondary
DNS would be more automatic, but at least it works)
- Jenkins has a DNS cache and needs a restart
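
A quick way to check the first point from the build host is to time a lookup through the system resolver. The sketch below is only an illustration: the function name is made up, and it assumes a glibc-style setup where socket.getaddrinfo() consults /etc/hosts (unlike nslookup, which queries the DNS servers directly).

```python
import socket
import time


def timed_lookup(hostname):
    """Resolve hostname via the system resolver and time the call.

    Returns (elapsed_seconds, addresses); addresses is an empty list
    when the name cannot be resolved.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        # info[4] is the socket address tuple; its first item is the IP.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    return time.monotonic() - start, addresses


if __name__ == "__main__":
    elapsed, addresses = timed_lookup("localhost")
    print(f"localhost -> {addresses} in {elapsed:.3f}s")
```

Running it against a freshly added slave name before and after editing /etc/hosts should show whether the 20s delays are gone. It says nothing about what Jenkins itself has cached.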

How did ongoing jobs behave on the Jenkins restart? Did you have to restart
them all, or did Jenkins take care of it?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Vijay Bellur

On Thursday 18 June 2015 02:52 PM, Emmanuel Dreyfus wrote:
> Justin Clift  wrote:
>
>> If the DNS problem does turn out to be the dodgy iWeb hardware firewall,
>> then this fixes the DNS issue. (if not... well damn!)
>
> The DNS problem was worked around by installing an /etc/hosts, but
> Jenkins does not realize it is there. It should probably be restarted,
> but nobody dares to try.

I did dare just now and have rebooted Jenkins :). Let us see how this 
iteration works out.


-Vijay


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Justin Clift  wrote:

> If the DNS problem does turn out to be the dodgy iWeb hardware firewall,
> then this fixes the DNS issue. (if not... well damn!)

The DNS problem was worked around by installing an /etc/hosts, but
Jenkins does not realize it is there. It should probably be restarted,
but nobody dares to try.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Justin Clift
On 18 Jun 2015, at 16:57, Emmanuel Dreyfus  wrote:
> Niels de Vos  wrote:
> 
>> I'm not sure what limitation you mean. Did we reach the limit of slaves
>> that Jenkins can reasonably address?
> 
> No I mean its inability to catch a new DNS record.

Priority-wise, my suggestion would be to first get Gerrit and Jenkins
migrated to one of the two new servers. (probably put them in separate
VMs)

If the DNS problem does turn out to be the dodgy iWeb hardware firewall,
then this fixes the DNS issue. (if not... well damn!)

Assuming that does work :), then getting the other server set up with
new VMs and such would be the next thing to do.

That's my thinking anyway.

For reference, these are the main hardware specs for the two boxes:

  formicary.gluster.org   <-- for Gerrit/Jenkins/whatever
  *********************

   * 2 x Intel Xeon CPU E5-2640 v3 @ 2.60GHz (8 physical cores per cpu)
   * 32GB ECC RAM
   * 2 x ~560GB SAS HDD's
   * 1 x Intel 2P X520/2P I350 rNDC network card
     * Seems to be a 4 port 10GbE card.  The mgmt console says 2 ports
       are up, and two down.  Guessing this means only two ports are
       cabled up.


  ci.gluster.org  <-- for VMs
  **************

   * 2 x Intel Xeon E5-2650 v3 @ 2.30GHz (10 physical cores per cpu)
   * 96GB ECC RAM
   * 4 x ~560GB SAS HDD's
   * 1 x Intel 2P X520/2P I350 rNDC network card
     * Seems to be a 4 port 10GbE card.  The mgmt console says 2 ports
       are up, and two down.  Guessing this means only two ports are
       cabled up.

Hope this is useful info. ;)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> I'm not sure what limitation you mean. Did we reach the limit of slaves
> that Jenkins can reasonably address?

No I mean its inability to catch a new DNS record.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Michael Scherer
Le jeudi 18 juin 2015 à 17:57 +0200, Emmanuel Dreyfus a écrit :
> Niels de Vos  wrote:
> 
> > I'm not sure what limitation you mean. Did we reach the limit of slaves
> > that Jenkins can reasonably address?
> 
> No I mean its inability to catch a new DNS record.

It might be a glibc limitation and/or design decision. We could restart
the jenkins instance and be done with it.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Niels de Vos
On Thu, Jun 18, 2015 at 12:29:14PM +, Emmanuel Dreyfus wrote:
> On Thu, Jun 18, 2015 at 10:19:27AM +0200, Niels de Vos wrote:
> > Good to know, but it would be much more helpful if someone could install
> > VMs there and add them to the Jenkins instance... Who can do that, or
> > who can guide someone else to get it done?
> 
> How will that help, since we are having problems with Jenkins'
> ability to take on more hosts?

I'm not sure what limitation you mean. Did we reach the limit of slaves
that Jenkins can reasonably address?

I understood we are limited in the Rackspace account because we exceed
the sponsored budget. We should be able to run VMs on the servers that
are made available for the project. If they can host ~40 VMs, we can
reduce the Rackspace costs *and* have more VMs for testing.

Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
On Thu, Jun 18, 2015 at 10:19:27AM +0200, Niels de Vos wrote:
> Good to know, but it would be much more helpful if someone could install
> VMs there and add them to the Jenkins instance... Who can do that, or
> who can guide someone else to get it done?

How will that help, since we are having problems with Jenkins'
ability to take on more hosts?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Michael Scherer
Le jeudi 18 juin 2015 à 10:19 +0200, Niels de Vos a écrit :
> On Thu, Jun 18, 2015 at 12:57:05AM +0100, Justin Clift wrote:
> > On 17 Jun 2015, at 20:14, Niels de Vos  wrote:
> > > On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
> > >> Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> > >>> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> >  Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> > > Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> > >> Venky Shankar  wrote:
> > >> 
> > >>> If that's the case, then I'll vote for this even if it takes some 
> > >>> time
> > >>> to get things in workable state.
> > >> 
> > >> See my other mail about this: you enter a new slave VM in the DNS 
> > >> and it
> > >> does not resolve, or sometimes you get 20s delays. I am convinced
> > >> this
> > >> is the reason why Jenkins bugs.
> > > 
> > > But cloud.gluster.org is handled by rackspace, not sure how much 
> > > control
> > > we have for it ( not sure even where to start there ).
> >  
> >  So I cannot change the DNS destination.
> >  
> >  What I can do is to create a new dns zone, and then, we can delegate as
> >  we want. And migrate some slaves and not others, and see how it goes ?
> >  
> >  slaves.gluster.org would be ok for everybody ?
> > >>> 
> > >>> Try it out, and see if it works. :)
> > >>> 
> > >>> On the "scaling the infrastructure" side of things, are the two OSAS 
> > >>> servers
> > >>> for Gluster still available?
> > >> 
> > >> They are online.
> > >> $ ssh r...@ci.gluster.org uptime
> > >> 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05
> > > 
> > > Can it run some Jenkins Slave VMs too?
> > 
> > There are two boxes.  A pretty beefy one for running Jenkins slave VM's 
> > (probably
> > about 40 VM's simultaneously), and a slightly less beefy one for running
> > Jenkins/Gerrit/whatever.
> 
> Good to know, but it would be much more helpful if someone could install
> VMs there and add them to the Jenkins instance... Who can do that, or
> who can guide someone else to get it done?

Justin and I have our keys automatically added to all salt-managed
servers, and this one is one of them (and I plan to make sure it stays
one of them, so any change should be pushed to salt).

The issue is getting more public IPs for the VMs. Since we have a model
where Jenkins pushes to the slaves, they need to communicate. Either we
set up a VPN, or we try to find public IPs.
But since we are in a hurry, I will take the VPN road.

Installing the VMs in some automated way shouldn't be too hard, 90% of it
is in salt now. The biggest issue is setting up the virtualisation infra
in a proper way, and deciding what kind of slaves we want.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Justin Clift
On 18 Jun 2015, at 09:19, Niels de Vos  wrote:
> On Thu, Jun 18, 2015 at 12:57:05AM +0100, Justin Clift wrote:
>> On 17 Jun 2015, at 20:14, Niels de Vos  wrote:
>>> On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
 Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
>> Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
>>> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
 Venky Shankar  wrote:
 
> If that's the case, then I'll vote for this even if it takes some time
> to get things in workable state.
 
 See my other mail about this: you enter a new slave VM in the DNS and 
 it
 does not resolve, or sometimes you get 20s delays. I am convinced this
 is the reason why Jenkins bugs.
>>> 
>>> But cloud.gluster.org is handled by rackspace, not sure how much control
>>> we have for it ( not sure even where to start there ).
>> 
>> So I cannot change the DNS destination.
>> 
>> What I can do is to create a new dns zone, and then, we can delegate as
>> we want. And migrate some slaves and not others, and see how it goes ?
>> 
>> slaves.gluster.org would be ok for everybody ?
> 
> Try it out, and see if it works. :)
> 
> On the "scaling the infrastructure" side of things, are the two OSAS 
> servers
> for Gluster still available?
 
 They are online.
 $ ssh r...@ci.gluster.org uptime
 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05
>>> 
>>> Can it run some Jenkins Slave VMs too?
>> 
>> There are two boxes.  A pretty beefy one for running Jenkins slave VM's 
>> (probably
>> about 40 VM's simultaneously), and a slightly less beefy one for running
>> Jenkins/Gerrit/whatever.
> 
> Good to know, but it would be much more helpful if someone could install
> VMs there and add them to the Jenkins instance... Who can do that, or
> who can guide someone else to get it done?

Misc has the keys. :)

+ Justin



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Niels de Vos
On Thu, Jun 18, 2015 at 12:57:05AM +0100, Justin Clift wrote:
> On 17 Jun 2015, at 20:14, Niels de Vos  wrote:
> > On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
> >> Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> >>> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
>  Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> > Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> >> Venky Shankar  wrote:
> >> 
> >>> If that's the case, then I'll vote for this even if it takes some time
> >>> to get things in workable state.
> >> 
> >> See my other mail about this: you enter a new slave VM in the DNS and 
> >> it
> >> does not resolve, or sometimes you get 20s delays. I am convinced this
> >> is the reason why Jenkins bugs.
> > 
> > But cloud.gluster.org is handled by rackspace, not sure how much control
> > we have for it ( not sure even where to start there ).
>  
>  So I cannot change the DNS destination.
>  
>  What I can do is to create a new dns zone, and then, we can delegate as
>  we want. And migrate some slaves and not others, and see how it goes ?
>  
>  slaves.gluster.org would be ok for everybody ?
> >>> 
> >>> Try it out, and see if it works. :)
> >>> 
> >>> On the "scaling the infrastructure" side of things, are the two OSAS 
> >>> servers
> >>> for Gluster still available?
> >> 
> >> They are online.
> >> $ ssh r...@ci.gluster.org uptime
> >> 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05
> > 
> > Can it run some Jenkins Slave VMs too?
> 
> There are two boxes.  A pretty beefy one for running Jenkins slave VM's 
> (probably
> about 40 VM's simultaneously), and a slightly less beefy one for running
> Jenkins/Gerrit/whatever.

Good to know, but it would be much more helpful if someone could install
VMs there and add them to the Jenkins instance... Who can do that, or
who can guide someone else to get it done?

Thanks,
Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 20:14, Niels de Vos  wrote:
> On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
>> Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
>>> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
 Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
>> Venky Shankar  wrote:
>> 
>>> If that's the case, then I'll vote for this even if it takes some time
>>> to get things in workable state.
>> 
>> See my other mail about this: you enter a new slave VM in the DNS and it
>> does not resolve, or sometimes you get 20s delays. I am convinced this
>> is the reason why Jenkins bugs.
> 
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it ( not sure even where to start there ).
 
 So I cannot change the DNS destination.
 
 What I can do is to create a new dns zone, and then, we can delegate as
 we want. And migrate some slaves and not others, and see how it goes ?
 
 slaves.gluster.org would be ok for everybody ?
>>> 
>>> Try it out, and see if it works. :)
>>> 
>>> On the "scaling the infrastructure" side of things, are the two OSAS servers
>>> for Gluster still available?
>> 
>> They are online.
>> $ ssh r...@ci.gluster.org uptime
>> 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05
> 
> Can it run some Jenkins Slave VMs too?

There are two boxes.  A pretty beefy one for running Jenkins slave VM's 
(probably
about 40 VM's simultaneously), and a slightly less beefy one for running
Jenkins/Gerrit/whatever.

+ Justin



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 04:23 PM, Niels de Vos wrote:
> On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
>> Niels de Vos  wrote:
>>
>>> Maybe, but I hope those issues stay masked when resolving the hostnames
>>> is more stable. When we have the other servers up and running, we would
>>> have a better understanding and options to investigate issues like this.
>>
>> But Jenkins is still unable to launch an agent on e.g. nbslave75.
>> Perhaps it needs to be restarted?
>
> Yes, a Jenkins restart might be good. But, I do not know how it gets
> stopped safely, or started.


The only downside of a Jenkins restart is that we would need to manually 
re-trigger all existing jobs.


Shall we just do that now?

-Vijay



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 09:56:32PM +0200, Emmanuel Dreyfus wrote:
> Niels de Vos  wrote:
> 
> > Maybe, but I hope those issues stay masked when resolving the hostnames
> > is more stable. When we have the other servers up and running, we would
> > have a better understanding and options to investigate issues like this.
> 
> But Jenkins is still unable to launch an agent on e.g. nbslave75.
> Perhaps it needs to be restarted?

Yes, a Jenkins restart might be good. But, I do not know how it gets
stopped safely, or started.
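
On stopping it safely: Jenkins documents a /safeRestart URL that puts the instance into quiet mode, lets running builds finish, and then restarts. The Python sketch below only builds such a request and does not send it. The host name, user and token are placeholders, and whether basic authentication with an API token is enough (or a CSRF crumb is also required) depends on the Jenkins version and configuration, so treat this as an assumption to verify against the actual instance.

```python
import base64
import urllib.request


def build_safe_restart_request(base_url, user, api_token):
    """Build (but do not send) a POST request to Jenkins' /safeRestart."""
    url = base_url.rstrip("/") + "/safeRestart"
    credentials = f"{user}:{api_token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=b"",  # empty body; Jenkins expects a POST here
        method="POST",
        headers={"Authorization": "Basic " + auth},
    )


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    request = build_safe_restart_request(
        "http://jenkins.example.org/", "admin", "hypothetical-token"
    )
    print(request.get_method(), request.full_url)
    # To actually trigger the restart: urllib.request.urlopen(request)
```

Jobs that are still queued at that point may need to be re-triggered by hand after the restart.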

Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> Maybe, but I hope those issues stay masked when resolving the hostnames
> is more stable. When we have the other servers up and running, we would
> have a better understanding and options to investigate issues like this.

But Jenkins is still unable to launch an agent on e.g. nbslave75.
Perhaps it needs to be restarted?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 03:14:31PM +0200, Michael Scherer wrote:
> Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> > On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> > > Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> > >> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> > >>> Venky Shankar  wrote:
> > >>> 
> >  If that's the case, then I'll vote for this even if it takes some time
> >  to get things in workable state.
> > >>> 
> > >>> See my other mail about this: you enter a new slave VM in the DNS and it
> > >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> > >>> is the reason why Jenkins bugs.
> > >> 
> > >> But cloud.gluster.org is handled by rackspace, not sure how much control
> > >> we have for it ( not sure even where to start there ).
> > > 
> > > So I cannot change the DNS destination.
> > > 
> > > What I can do is to create a new dns zone, and then, we can delegate as
> > > we want. And migrate some slaves and not others, and see how it goes ?
> > > 
> > > slaves.gluster.org would be ok for everybody ?
> > 
> > Try it out, and see if it works. :)
> > 
> > On the "scaling the infrastructure" side of things, are the two OSAS servers
> > for Gluster still available?
> 
> They are online.
> $ ssh r...@ci.gluster.org uptime
>  09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05

Can it run some Jenkins Slave VMs too?

Thanks,
Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> > Venky Shankar  wrote:
> > 
> > > If that's the case, then I'll vote for this even if it takes some time
> > > to get things in workable state.
> > 
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins bugs.
> 
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it ( not sure even where to start there ).

On build.gluster.org there is now a /usr/local/bin/get-hosts.py script
(needs to be executed through sudo). This pulls down the DNS records
from our cloud.gluster.org domain in Rackspace and provides
/etc/hosts-formatted output.

/etc/hosts on build.gluster.org contains all the current entries. We
could automatically update it with a cron job or something, if needed.
New VMs should get added to /etc/hosts too, either manually or by
executing the script (sudo vim /etc/hosts, :r!/usr/local/bin/get-hosts.py).
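
For the cron idea, the formatting and merge step could look roughly like the Python sketch below. It is not the actual get-hosts.py: the marker comments, function names and sample addresses are invented for illustration, and fetching the records from Rackspace is left out entirely. The point is only that keeping the generated entries inside a marked block makes repeated runs safe.

```python
BEGIN = "# BEGIN managed slave entries"
END = "# END managed slave entries"


def format_hosts(records):
    """Render (hostname, ip) pairs as /etc/hosts lines, sorted by hostname."""
    return [f"{ip}\t{name}" for name, ip in sorted(records)]


def merge_hosts(existing_text, records):
    """Return existing_text with the managed block replaced (or appended)."""
    kept = []
    inside = False
    for line in existing_text.splitlines():
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            # Lines outside the managed block are preserved untouched.
            kept.append(line)
    block = [BEGIN, *format_hosts(records), END]
    return "\n".join(kept + block) + "\n"


if __name__ == "__main__":
    # Made-up addresses from the documentation range 192.0.2.0/24.
    records = [
        ("nbslave75.cloud.gluster.org", "192.0.2.75"),
        ("nbslave71.cloud.gluster.org", "192.0.2.71"),
    ]
    print(merge_hosts("127.0.0.1\tlocalhost\n", records), end="")
```

A cron job would write the result to a temporary file and move it over /etc/hosts. Note that if the END marker is ever deleted by hand, everything after BEGIN is treated as managed and dropped on the next run.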

> And I think the DNS issues are just a symptom of a bigger network issue,
> having local DNS might just mask the problem and which would then be non
> DNS related ( like tcp connexion not working ).

Maybe, but I hope those issues stay masked when resolving the hostnames
is more stable. When we have the other servers up and running, we would
have a better understanding and options to investigate issues like this.

HTH,
Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 12:13:46PM +, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> > Do we still have the NFS crash that was causing tests to hang?
> 
> Do we still have it on rebased patchsets?

Yes, the fixes depend on the refcounting change which does not seem as
trivial as I hoped. http://review.gluster.org/11022 for the interested.

http://review.gluster.org/11023 is the fix that should solve the
segfaults in the NFS-server.

Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
Le mercredi 17 juin 2015 à 11:58 +0100, Justin Clift a écrit :
> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> > Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
> >> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
> >>> Venky Shankar  wrote:
> >>> 
>  If that's the case, then I'll vote for this even if it takes some time
>  to get things in workable state.
> >>> 
> >>> See my other mail about this: you enter a new slave VM in the DNS and it
> >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> >>> is the reason why Jenkins bugs.
> >> 
> >> But cloud.gluster.org is handled by rackspace, not sure how much control
> >> we have for it ( not sure even where to start there ).
> > 
> > So I cannot change the DNS destination.
> > 
> > What I can do is to create a new dns zone, and then, we can delegate as
> > we want. And migrate some slaves and not others, and see how it goes ?
> > 
> > slaves.gluster.org would be ok for everybody ?
> 
> Try it out, and see if it works. :)
> 
> On the "scaling the infrastructure" side of things, are the two OSAS servers
> for Gluster still available?

They are online.
$ ssh r...@ci.gluster.org uptime
 09:13:37 up 33 days, 16:34,  0 users,  load average: 0,00, 0,01, 0,05


> If so, we should get them online ASAP, as that will give us ~40 new VMs
> + get us out of iWeb (which I suspect is the problem).

I suspect so too. But that means migrating Jenkins and everything, and
I would prefer a quick fix. I am looking at the DNS solution.
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 08:13 AM, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
>> Do we still have the NFS crash that was causing tests to hang?
>
> Do we still have it on rebased patchsets?


I am not certain. I am still trying to come to terms with my email 
backlog and hence seeking a quick opinion here to see if we need to 
address it asap.


-Vijay


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> Do we still have the NFS crash that was causing tests to hang?

Do we still have it on rebased patchsets?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Vijay Bellur

On Wednesday 17 June 2015 05:20 AM, Emmanuel Dreyfus wrote:
> On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote:
>> I've already scripted the reboot-vm job to use Rackspace API, the DNS
>> requesting and formatting the results into some file can't be that
>> difficult. Let me know if a /etc/hosts format would do, or if you expect
>> something else.
>
> Perhaps a /etc/hosts would do it: jenkins launches the ssh command,
> and ssh should use /etc/hosts before the DNS.

Why don't we try this out while we find an alternate solution? Given 
that there are plenty of patches awaiting NetBSD regression, anything 
that we can do to alleviate the situation would be more than welcome!
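
Whether ssh really consults /etc/hosts before DNS depends, on glibc-based systems, on the order of sources on the "hosts:" line of /etc/nsswitch.conf ("files" before "dns" is the usual arrangement). A small Python sketch for checking that order follows; the function name is made up, and it assumes the build host uses nsswitch.conf at all.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line of nsswitch.conf, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            sources = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return].
            sources = re.sub(r"\[[^\]]*\]", " ", sources)
            return sources.split()
    return []


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as handle:
        print(hosts_lookup_order(handle.read()))
```

If "files" comes before "dns" in the printed list, names present in /etc/hosts are answered locally and never reach the slow upstream DNS.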


Do we still have the NFS crash that was causing tests to hang?

Thanks,
Vijay


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Kaushal M
Just moving Gerrit and Jenkins out of iWeb should help a lot.

On Wed, Jun 17, 2015 at 4:28 PM, Justin Clift  wrote:
> On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
>> Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
>>> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
 Venky Shankar  wrote:

> If that's the case, then I'll vote for this even if it takes some time
> to get things in workable state.

 See my other mail about this: you enter a new slave VM in the DNS and it
 does not resolve, or sometimes you get 20s delays. I am convinced this
 is the reason why Jenkins bugs.
>>>
>>> But cloud.gluster.org is handled by rackspace, not sure how much control
>>> we have for it ( not sure even where to start there ).
>>
>> So I cannot change the DNS destination.
>>
>> What I can do is to create a new dns zone, and then, we can delegate as
>> we want. And migrate some slaves and not others, and see how it goes ?
>>
>> slaves.gluster.org would be ok for everybody ?
>
> Try it out, and see if it works. :)
>
> On the "scaling the infrastructure" side of things, are the two OSAS servers
> for Gluster still available?
>
> If so, we should get them online ASAP, as that will give us ~40 new VMs
> + get us out of iWeb (which I suspect is the problem).
>
> Regards and best wishes,
>
> Justin Clift


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 10:53, Michael Scherer  wrote:
> Le mercredi 17 juin 2015 à 11:48 +0200, Michael Scherer a écrit :
>> Le mercredi 17 juin 2015 à 08:20 +0200, Emmanuel Dreyfus a écrit :
>>> Venky Shankar  wrote:
>>> 
 If that's the case, then I'll vote for this even if it takes some time
 to get things in workable state.
>>> 
>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>> is the reason why Jenkins bugs.
>> 
>> But cloud.gluster.org is handled by rackspace, not sure how much control
>> we have for it ( not sure even where to start there ).
> 
> So I cannot change the DNS destination.
> 
> What I can do is to create a new dns zone, and then, we can delegate as
> we want. And migrate some slaves and not others, and see how it goes ?
> 
> slaves.gluster.org would be ok for everybody ?

Try it out, and see if it works. :)

On the "scaling the infrastructure" side of things, are the two OSAS servers
for Gluster still available?

If so, we should get them online ASAP, as that will give us ~40 new VMs
+ get us out of iWeb (which I suspect is the problem).

Regards and best wishes,

Justin Clift



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Justin Clift
On 17 Jun 2015, at 07:29, Kaushal M  wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.

Contacting Rackspace support is very easy, and they're normally
very responsive.  They have an online support ticket submission thing
in the Rackspace UI.  Often they get back to us with meaningful
responses in less than 15-20 minutes.

Please go ahead and submit a ticket. :)

(Btw - I suspect the DNS issue is likely related to the hardware
firewall in the iWeb infrastructure.  It's probably acting up. :<).

Regards and best wishes,

Justin Clift



Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Venky Shankar
On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee  wrote:
>
>
> On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
>> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>>> If that does not help today, we need other ideas...
>>
>> Just to confirm the problem:
>>
>> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
>> ;; connection timed out; trying next origin
>> ;; connection timed out; no servers could be reached
>>
>>
>> real    0m20.013s
>> user    0m0.002s
>> sys     0m0.012s
>>
>> Having a local cache does not help because the upstream DNS service is
>> weak. Without the local cache, individual processes starve waiting for a
>> reply, and with the local cache, the local server itself starves waiting
>> for a reply.
>>
>> And here the upstream DNS is really at fault: from my end I get a
>> reply in 0.29s.
>>
>> We need to configure a local authoritative secondary DNS for the zone,
>> so that the answer is always available locally without having to rely
>> on outside infrastructure.
> I am not sure whether we have any improvements on this front. I still
> see patches waiting for ages to get their turn for the regression
> run, which delays merges and affects the release process.
>
> I still feel we don't need to wait for NetBSD's vote for merging patches
> on a temporary basis till we fix the infrastructure problem. This is the
> only quick solution which I can think of now.

That *might* result in lots of NetBSD regression failures later on and
we may end up with another round of fixups.

I can't think of a quick solution either.

>
> Thoughts?
>
> ~Atin
>>
>
> --
> ~Atin
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Venky Shankar
On Wed, Jun 17, 2015 at 10:13 AM, Emmanuel Dreyfus  wrote:
> Atin Mukherjee  wrote:
>
>> > That *might* result in lots of NetBSD regression failures later on and
>> > we may end up with another round of fixups.
>> Agreed, that's the known risk but we don't have any other alternatives atm.
>
> I strongly disagree, we have a good alternative: configure a secondary
> DNS on build.gluster.org for the cloud.gluster.org zone. I could do the
> local configuration, but someone with administrative access will have to
> touch primary configuration to allow zone transfer (and enable
> notifications).

If that's the case, then I'll vote for this even if it takes some time
to get things into a workable state.
I think Kaushal/Niels/Justin could surely help here.

>
> The current situation is that we have 14 NetBSD VMs online and only 5 are
> capable of running jobs because of various infrastructure configuration
> problems, broken DNS being the first offender.
>
> Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan
> tstile), for which I had a change merged that should fix the problem,
> but only for rebased changes.
>
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Rajesh Joseph


- Original Message -
> From: "Kaushal M" 
> To: "Emmanuel Dreyfus" 
> Cc: "Gluster Devel" , "gluster-infra" 
> 
> Sent: Wednesday, 17 June, 2015 11:59:22 AM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being 
> triggered for patches
> 
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.
> 

If this is going to take time then I prefer not to block patches for NetBSD.
We can address any NetBSD regression caused by patches as a separate bug.
Otherwise our regression queue will continue to grow.

> 
> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
> > Venky Shankar  wrote:
> >
> >> If that's the case, then I'll vote for this even if it takes some time
> >> to get things in workable state.
> >
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins misbehaves.
> >
> > --
> > Emmanuel Dreyfus
> > http://hcpnet.free.fr/pubz
> > m...@netbsd.org
> > ___
> > Gluster-infra mailing list
> > Gluster-infra@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-infra
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Nithya Balachandran
- Original Message -
> From: "Avra Sengupta" 
> To: "Rajesh Joseph" , "Kaushal M" 
> Cc: "Gluster Devel" , "gluster-infra" 
> 
> Sent: Wednesday, June 17, 2015 1:42:25 PM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being 
> triggered for patches
> 
> On 06/17/2015 12:12 PM, Rajesh Joseph wrote:
> >
> > - Original Message -
> >> From: "Kaushal M" 
> >> To: "Emmanuel Dreyfus" 
> >> Cc: "Gluster Devel" , "gluster-infra"
> >> 
> >> Sent: Wednesday, 17 June, 2015 11:59:22 AM
> >> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being
> >> triggered for patches
> >>
> >> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> >> no readily available option to do zone transfers from it. We might
> >> have to contact the Rackspace support to find out if they can do it as
> >> a special request.
> >>
> > If this is going to take time then I prefer not to block patches for
> > NetBSD. We can address any NetBSD regression caused by patches as a
> > separate bug. Otherwise our regression queue will continue to grow.
> +1 for this. We shouldn't be blocking patches for NetBSD regression till
> the infra scales enough to handle the kind of load we are throwing at
> it. Once the regression framework is scalable enough, we can fix any
> regressions (if any) introduced. This will bring down the turnaround
> time, for the patch acceptance.

+1


> >
> >> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus 
> >> wrote:
> >>> Venky Shankar  wrote:
> >>>
> >>>> If that's the case, then I'll vote for this even if it takes some time
> >>>> to get things in workable state.
> >>> See my other mail about this: you enter a new slave VM in the DNS and it
> >>> does not resolve, or sometimes you get 20s delays. I am convinced this
> >>> is the reason why Jenkins misbehaves.
> >>>
> >>> --
> >>> Emmanuel Dreyfus
> >>> http://hcpnet.free.fr/pubz
> >>> m...@netbsd.org
> >>> ___
> >>> Gluster-infra mailing list
> >>> Gluster-infra@gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-infra
> >> ___
> >> Gluster-devel mailing list
> >> gluster-de...@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Atin Mukherjee


On 06/17/2015 09:57 AM, Venky Shankar wrote:
> On Wed, Jun 17, 2015 at 9:50 AM, Atin Mukherjee  wrote:
>>
>>
>> On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
>>> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>>>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>>>> If that does not help today, we need other ideas...
>>>
>>> Just to confirm the problem:
>>>
>>> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
>>> ;; connection timed out; trying next origin
>>> ;; connection timed out; no servers could be reached
>>>
>>>
>>> real    0m20.013s
>>> user    0m0.002s
>>> sys     0m0.012s
>>>
>>> Having a local cache does not help because the upstream DNS service is
>>> weak. Without the local cache, individual processes starve waiting for a
>>> reply, and with the local cache, the local server itself starves waiting
>>> for a reply.
>>>
>>> And here the upstream DNS is really at fault: from my end I get a
>>> reply in 0.29s.
>>>
>>> We need to configure a local authoritative secondary DNS for the zone,
>>> so that the answer is always available locally without having to rely
>>> on outside infrastructure.
>> I am not sure whether we have any improvements on this front. I still
>> see patches waiting for ages to get their turn for the regression
>> run, which delays merges and affects the release process.
>>
>> I still feel we don't need to wait for NetBSD's vote for merging patches
>> on a temporary basis till we fix the infrastructure problem. This is the
>> only quick solution which I can think of now.
> 
> That *might* result in lots of NetBSD regression failures later on and
> we may end up with another round of fixups.
Agreed, that's the known risk but we don't have any other alternatives atm.
> 
> I can't think of a quick solution either.
> 
>>
>> Thoughts?
>>
>> ~Atin
>>>
>>
>> --
>> ~Atin
>> ___
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
~Atin


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Atin Mukherjee


On 06/11/2015 08:04 PM, Emmanuel Dreyfus wrote:
> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
>> Michael installed and configured dnsmasq on build.gluster.org yesterday.
>> If that does not help today, we need other ideas...
> 
> Just to confirm the problem:
> 
> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
> ;; connection timed out; trying next origin
> ;; connection timed out; no servers could be reached
> 
> 
> real    0m20.013s
> user    0m0.002s
> sys     0m0.012s
> 
> Having a local cache does not help because the upstream DNS service is
> weak. Without the local cache, individual processes starve waiting for a
> reply, and with the local cache, the local server itself starves waiting
> for a reply.
> 
> And here the upstream DNS is really at fault: from my end I get a
> reply in 0.29s.
> 
> We need to configure a local authoritative secondary DNS for the zone,
> so that the answer is always available locally without having to rely
> on outside infrastructure.
I am not sure whether we have any improvements on this front. I still
see patches waiting for ages to get their turn for the regression
run, which delays merges and affects the release process.

I still feel that, as a temporary measure, we should not wait for NetBSD's
vote before merging patches until we fix the infrastructure problem. This
is the only quick solution I can think of right now.

Thoughts?

~Atin
> 

-- 
~Atin


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
On Wednesday 17 June 2015 at 11:48 +0200, Michael Scherer wrote:
> On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> > Venky Shankar  wrote:
> > 
> > > If that's the case, then I'll vote for this even if it takes some time
> > > to get things in workable state.
> > 
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins misbehaves.
> 
> But cloud.gluster.org is handled by rackspace, not sure how much control
> we have for it ( not sure even where to start there ).

So I cannot change the DNS destination.

What I can do is create a new DNS zone, which we can then delegate as
we want. We could migrate some slaves and not others, and see how it goes.

Would slaves.gluster.org be OK for everybody?

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> And I think the DNS issues are just a symptom of a bigger network issue;
> a local DNS might just mask the problem, which would then show up in
> non-DNS ways (like TCP connections not working).

Well, if it is lost packets, TCP is more resistant, and if it is an
overloaded DNS server, the problem is only for DNS.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Michael Scherer
On Wednesday 17 June 2015 at 08:20 +0200, Emmanuel Dreyfus wrote:
> Venky Shankar  wrote:
> 
> > If that's the case, then I'll vote for this even if it takes some time
> > to get things in workable state.
> 
> See my other mail about this: you enter a new slave VM in the DNS and it
> does not resolve, or sometimes you get 20s delays. I am convinced this
> is the reason why Jenkins misbehaves.

But cloud.gluster.org is handled by Rackspace, and I am not sure how much
control we have over it (I am not even sure where to start there).

And I think the DNS issues are just a symptom of a bigger network issue;
a local DNS might just mask the problem, which would then show up in
non-DNS ways (like TCP connections not working).

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote:
> I've already scripted the reboot-vm job to use the Rackspace API; requesting
> the DNS records and formatting the results into a file can't be that
> difficult. Let me know if a /etc/hosts format would do, or if you expect
> something else.

Perhaps a /etc/hosts would do it: jenkins launches the ssh command,
and ssh should use /etc/hosts before the DNS.
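
Whether /etc/hosts is consulted before the DNS depends on the resolver order configured on the machine (on Linux, the "hosts:" line of /etc/nsswitch.conf). Assuming files come first, a quick sanity check is to confirm that every slave name actually has an entry in the hosts file. The sketch below is only an illustration: the function names are hypothetical and the sample addresses in any usage would be placeholders.

```python
def parse_hosts(text):
    """Return {name: address} from /etc/hosts-style text; first entry wins."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without a name
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), address)
    return table


def missing_from_hosts(text, expected_names):
    """List the expected names that have no entry in the hosts text."""
    table = parse_hosts(text)
    return [name for name in expected_names if name.lower() not in table]
```

Feeding it the contents of /etc/hosts and the list of slave names would show which slaves still depend on the DNS.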

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Niels de Vos
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
> no readily available option to do zone transfers from it. We might
> have to contact the Rackspace support to find out if they can do it as
> a special request.

Not sure about zone transfers, but we can request the DNS records
through the Rackspace DNS API:


http://docs.rackspace.com/cdns/api/v1.0/cdns-getting-started/content/List_Domain_Details.html

The IP addresses of the VMs do not change often, so a regular fetching
of the records would be sufficient. We could even have a Jenkins job
that downloads an updated /etc/hosts to a slave.

I've already scripted the reboot-vm job to use the Rackspace API; requesting
the DNS records and formatting the results into a file can't be that
difficult. Let me know if a /etc/hosts format would do, or if you expect
something else.
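
The formatting half of that idea could look roughly like the sketch below. It does not call the Rackspace API; it assumes the fetch step has already produced a list of dicts with "name", "type" and "data" keys, which is an assumption about the record shape and not something confirmed in this thread. The function name is hypothetical.

```python
import ipaddress


def records_to_hosts(records, zone="cloud.gluster.org"):
    """Format address records of `zone` as /etc/hosts lines.

    `records` is assumed to be a list of dicts with "name", "type" and
    "data" keys (an assumption about what the fetch step yields).
    """
    lines = []
    for rec in sorted(records, key=lambda r: r["name"]):
        if rec.get("type") not in ("A", "AAAA"):
            continue  # /etc/hosts only maps names to addresses
        name = rec["name"].rstrip(".")
        if name != zone and not name.endswith("." + zone):
            continue  # ignore records outside the zone
        address = str(ipaddress.ip_address(rec["data"]))  # raises on bad data
        lines.append(f"{address}\t{name}")
    return "".join(line + "\n" for line in lines)
```

Validating each address with `ipaddress` means a malformed record stops the run with an error instead of ending up in /etc/hosts.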

Thanks,
Niels

> 
> 
> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
> > Venky Shankar  wrote:
> >
> >> If that's the case, then I'll vote for this even if it takes some time
> >> to get things in workable state.
> >
> > See my other mail about this: you enter a new slave VM in the DNS and it
> > does not resolve, or sometimes you get 20s delays. I am convinced this
> > is the reason why Jenkins misbehaves.
> >
> > --
> > Emmanuel Dreyfus
> > http://hcpnet.free.fr/pubz
> > m...@netbsd.org
> > ___
> > Gluster-infra mailing list
> > Gluster-infra@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-infra
> ___
> Gluster-infra mailing list
> Gluster-infra@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Avra Sengupta

On 06/17/2015 12:12 PM, Rajesh Joseph wrote:
>
> - Original Message -
>> From: "Kaushal M"
>> To: "Emmanuel Dreyfus"
>> Cc: "Gluster Devel" , "gluster-infra"
>> Sent: Wednesday, 17 June, 2015 11:59:22 AM
>> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regressions not being
>> triggered for patches
>>
>> cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
>> no readily available option to do zone transfers from it. We might
>> have to contact the Rackspace support to find out if they can do it as
>> a special request.
>>
> If this is going to take time then I prefer not to block patches for
> NetBSD. We can address any NetBSD regression caused by patches as a
> separate bug. Otherwise our regression queue will continue to grow.
+1 for this. We shouldn't be blocking patches for NetBSD regression till
the infra scales enough to handle the kind of load we are throwing at
it. Once the regression framework is scalable enough, we can fix any
regressions (if any) introduced. This will bring down the turnaround
time for patch acceptance.
>
>> On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
>>> Venky Shankar  wrote:
>>>
>>>> If that's the case, then I'll vote for this even if it takes some time
>>>> to get things in workable state.
>>> See my other mail about this: you enter a new slave VM in the DNS and it
>>> does not resolve, or sometimes you get 20s delays. I am convinced this
>>> is the reason why Jenkins misbehaves.
>>>
>>> --
>>> Emmanuel Dreyfus
>>> http://hcpnet.free.fr/pubz
>>> m...@netbsd.org
>>> ___
>>> Gluster-infra mailing list
>>> Gluster-infra@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-infra
>> ___
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS

Perhaps we can change that and set up our own DNS server for the zone?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-16 Thread Kaushal M
cloud.gluster.org is served by Rackspace Cloud DNS. AFAICT, there is
no readily available option to do zone transfers from it. We might
have to contact the Rackspace support to find out if they can do it as
a special request.


On Wed, Jun 17, 2015 at 11:50 AM, Emmanuel Dreyfus  wrote:
> Venky Shankar  wrote:
>
>> If that's the case, then I'll vote for this even if it takes some time
>> to get things in workable state.
>
> See my other mail about this: you enter a new slave VM in the DNS and it
> does not resolve, or sometimes you get 20s delays. I am convinced this
> is the reason why Jenkins misbehaves.
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
> ___
> Gluster-infra mailing list
> Gluster-infra@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-16 Thread Emmanuel Dreyfus
Venky Shankar  wrote:

> If that's the case, then I'll vote for this even if it takes some time
> to get things in workable state.

See my other mail about this: you enter a new slave VM in the DNS and it
does not resolve, or sometimes you get 20s delays. I am convinced this
is the reason why Jenkins misbehaves.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-16 Thread Emmanuel Dreyfus
Atin Mukherjee  wrote:

> > That *might* result in lots of NetBSD regression failures later on and
> > we may end up with another round of fixups.
> Agreed, that's the known risk but we don't have any other alternatives atm.

I strongly disagree: we have a good alternative, which is to configure a
secondary DNS on build.gluster.org for the cloud.gluster.org zone. I could
do the local configuration, but someone with administrative access will
have to touch the primary's configuration to allow zone transfers (and
enable notifications).

The current situation is that we have 14 NetBSD VMs online and only 5 are
capable of running jobs because of various infrastructure configuration
problems, broken DNS being the first offender.

Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan
tstile), for which I had a change merged that should fix the problem,
but only for rebased changes.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-12 Thread Michael Scherer
On Friday 12 June 2015 at 12:02 +0200, Michael Scherer wrote:
> On Thursday 11 June 2015 at 14:34 +, Emmanuel Dreyfus wrote:
> > On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
> > > Michael installed and configured dnsmasq on build.gluster.org yesterday.
> > > If that does not help today, we need other ideas...
> > 
> > Just to confirm the problem:
> > 
> > [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
> > ;; connection timed out; trying next origin
> > ;; connection timed out; no servers could be reached
> > 
> > 
> > real    0m20.013s
> > user    0m0.002s
> > sys     0m0.012s
> > 
> > Having a local cache does not help because the upstream DNS service is
> > weak. Without the local cache, individual processes starve waiting for a
> > reply, and with the local cache, the local server itself starves waiting
> > for a reply.
> > 
> > And here the upstream DNS is really at fault: from my end I get a
> > reply in 0.29s.
> > 
> > We need to configure a local authoritative secondary DNS for the zone,
> > so that the answer is always available locally without having to rely
> > on outside infrastructure.
> 
> So, I am not sure I fully follow the issue.
> 
> Here, the timed nslookup is fast (for now).
> And the dnsmasq cache uses Google DNS rather than the iWeb-provided one.
> 
> So when you say "upstream dns server is bad", which one do you mean
> exactly ?
> 
> ( cloud.gluster.org is handled by rackspace, we could move to named
> managed locally, but then, this will not be self service anymore )

So, I suspect some device between build.gluster.org and the DNS, because
what I do see is that Jenkins DNS resolution fails on a regular basis,
without an obvious pattern.

IMHO, the long-term solution is to move out of iWeb, or to fix the
device/network.

Short term, I guess having a local dump of cloud.gluster.org would work.
This requires scripting, I guess.

http://docs.rackspace.com/cdns/api/v1.0/cdns-getting-started/content/DNS_Overview.html
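
One part of such scripting is keeping the dumped entries separate from the hand-maintained lines in /etc/hosts, so that the dump can be refreshed safely. A minimal sketch, assuming the dump is kept between two marker comments (the marker strings and the function name are hypothetical, not an existing convention on build.gluster.org):

```python
BEGIN = "# BEGIN cloud.gluster.org dump"
END = "# END cloud.gluster.org dump"


def replace_managed_block(hosts_text, new_entries):
    """Return hosts_text with a fresh marked block holding new_entries.

    Lines outside the markers are kept untouched. Any existing marked
    block is dropped, and the new block is appended at the end.
    """
    kept, inside = [], False
    for line in hosts_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    block = [BEGIN, *new_entries, END]
    return "\n".join(kept + block) + "\n"
```

Running it twice with the same entries gives the same file, so it could be called from a periodic job without the file growing.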

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-12 Thread Michael Scherer
On Thursday 11 June 2015 at 14:34 +, Emmanuel Dreyfus wrote:
> On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
> > Michael installed and configured dnsmasq on build.gluster.org yesterday.
> > If that does not help today, we need other ideas...
> 
> Just to confirm the problem:
> 
> [manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
> ;; connection timed out; trying next origin
> ;; connection timed out; no servers could be reached
> 
> 
> real    0m20.013s
> user    0m0.002s
> sys     0m0.012s
> 
> Having a local cache does not help because the upstream DNS service is
> weak. Without the local cache, individual processes starve waiting for a
> reply, and with the local cache, the local server itself starves waiting
> for a reply.
> 
> And here the upstream DNS is really at fault: from my end I get a
> reply in 0.29s.
> 
> We need to configure a local authoritative secondary DNS for the zone,
> so that the answer is always available locally without having to rely
> on outside infrastructure.

So, I am not sure I fully follow the issue.

Here, the timed nslookup is fast (for now).
And the dnsmasq cache uses Google DNS rather than the iWeb-provided one.

So when you say "upstream dns server is bad", which one do you mean
exactly?

(cloud.gluster.org is handled by Rackspace; we could move to a locally
managed named, but then it would no longer be self-service.)

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
> Michael installed and configured dnsmasq on build.gluster.org yesterday.
> If that does not help today, we need other ideas...

Just to confirm the problem:

[manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached


real    0m20.013s
user    0m0.002s
sys     0m0.012s

Having a local cache does not help because the upstream DNS service is
weak. Without the local cache, individual processes starve waiting for a
reply, and with the local cache, the local server itself starves waiting
for a reply.

And here the upstream DNS is really at fault: from my end I get a
reply in 0.29s.

We need to configure a local authoritative secondary DNS for the zone,
so that the answer is always available locally without having to rely
on outside infrastructure.
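
To compare resolution times between sites over several attempts, a small timing helper along these lines could be used. This is only a sketch with hypothetical function names. Note that it goes through the system resolver (`socket.getaddrinfo`), so unlike nslookup it also sees /etc/hosts entries and any local cache.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, resolved) for one lookup of `host`.

    `resolver` is injectable so the function can be exercised without
    network access; by default it uses the system resolver.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        resolved = True
    except OSError:  # socket.gaierror is a subclass of OSError
        resolved = False
    return time.monotonic() - start, resolved


def survey(hosts, attempts=3, resolver=socket.getaddrinfo):
    """Time several lookups per host and report the worst case for each."""
    report = {}
    for host in hosts:
        results = [time_lookup(host, resolver) for _ in range(attempts)]
        report[host] = {
            "worst_seconds": max(elapsed for elapsed, _ in results),
            "failures": sum(1 for _, ok in results if not ok),
        }
    return report
```

Calling `survey(["nbslave7i.cloud.gluster.org"])` on build.gluster.org and on another machine would give comparable worst-case figures for each.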

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Niels de Vos
On Thu, Jun 11, 2015 at 12:25:13PM +, Emmanuel Dreyfus wrote:
> On Thu, Jun 11, 2015 at 12:51:52PM +0200, Niels de Vos wrote:
> > I've just checked the online NetBSD slaves again, but they seem to have
> > been configured correctly... Maybe we are hitting a Jenkins bug, or
> > there was a (temporary?) issue with DNS resolution?
> 
> DNS resolution is wrecked on build.gluster.org: I tried a tcpdump
> to diagnose the problem and:
> tcpdump: unknown host 'nbslave71.cloud.gluster.org'
> 
> > Another attempt gives me the correct answer after more than 5 seconds.
> 
> I am almost convinced that a local named on build.gluster.org would
> help a lot.

Michael installed and configured dnsmasq on build.gluster.org yesterday.
If that does not help today, we need other ideas...

Niels


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:51:52PM +0200, Niels de Vos wrote:
> I've just checked the online NetBSD slaves again, but they seem to have
> been configured correctly... Maybe we are hitting a Jenkins bug, or
> there was a (temporary?) issue with DNS resolution?

DNS resolution is wrecked on build.gluster.org: I tried a tcpdump
to diagnose the problem and:
tcpdump: unknown host 'nbslave71.cloud.gluster.org'

Another attempt gives me the correct answer after more than 5 seconds.

I am almost convinced that a local named on build.gluster.org would
help a lot.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Atin Mukherjee


On 06/11/2015 09:31 AM, Avra Sengupta wrote:
> Hi,
> 
> New patches being submitted are not getting NetBSD regressions run on
> them. Even manually they are not getting triggered. Is anyone aware of
> this?
Can we start merging patches without NetBSD's vote? Currently we have
so many patches waiting for NetBSD's vote, and it seems no VMs are
running either. This is blocking us from moving forward.

Thoughts?

~Atin
> 
> Regards,
> Avra
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
~Atin


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Niels de Vos
On Thu, Jun 11, 2015 at 09:48:22AM +, Emmanuel Dreyfus wrote:
> On Thu, Jun 11, 2015 at 07:26:00AM +, Emmanuel Dreyfus wrote:
> > In my opinion the fix to this problem is to start new VMs. I was busy
> > on other fronts, hence I did not watch the situation, but it is still
> > grim, with most NetBSD slaves being in a broken state. We need to spin
> > up more.
> 
> Launching the slave on the new VM fails, but for once we have a
> meaningful error: either DNS names are duplicated, or Jenkins has a bug.

There were some Jenkins slaves that all connected to
nbslave71.cloud.gluster.org. I thought I disabled all of those, and only
kept the Jenkins slave nbslave71.cloud.gluster.org enabled. Maybe I
missed a wrongly configured Jenkins slave and there are still more
Jenkins slaves using the single nbslave71.cloud.gluster.org VM?

I've just checked the online NetBSD slaves again, but they seem to have
been configured correctly... Maybe we are hitting a Jenkins bug, or
there was a (temporary?) issue with DNS resolution?
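
A check for that kind of misconfiguration could be sketched as follows, assuming the slave-to-host mapping has been exported from Jenkins into a plain dict (how to export it is not shown, and the function name is hypothetical):

```python
from collections import defaultdict


def find_shared_hosts(slave_hosts):
    """Map each host used by more than one slave to the slaves using it.

    `slave_hosts` maps a Jenkins slave name to the host name it connects
    to (hypothetical input format).
    """
    by_host = defaultdict(list)
    for slave, host in slave_hosts.items():
        # Normalize case and a trailing dot so equivalent names compare equal.
        by_host[host.strip().lower().rstrip(".")].append(slave)
    return {
        host: sorted(slaves)
        for host, slaves in by_host.items()
        if len(slaves) > 1
    }
```

An empty result would mean every Jenkins slave points at its own VM, which would leave the DNS or a Jenkins bug as the remaining suspects.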

Niels

> 
> <===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
> java.lang.IllegalStateException: Already connected
>   at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:466)
>   at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:371)
>   at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:945)
>   at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:133)
>   at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
>   at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> [06/11/15 02:37:46] Launch failed - cleaning up connection
> [06/11/15 02:37:46] [SSH] Connection closed.
> 
> 
> -- 
> Emmanuel Dreyfus
> m...@netbsd.org
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 07:26:00AM +, Emmanuel Dreyfus wrote:
> In my opinion the fix to this problem is to start new VMs. I was busy
> on other fronts, so I did not watch the situation, but it is still
> grim, with most NetBSD slaves in a screwed state. We need to spin up
> more.

Launching the slave on the new VM fails, but for once we have a
meaningful error: either DNS names are duplicated, or Jenkins has a bug.

<===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Already connected
  at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:466)
  at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:371)
  at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:945)
  at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:133)
  at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
  at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
[06/11/15 02:37:46] Launch failed - cleaning up connection
[06/11/15 02:37:46] [SSH] Connection closed.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:57:58PM +0530, Kaushal M wrote:
> The problem was nbslave71. It used to be picked first for all changes
> and would fail instantly. I've disabled it now. The other slaves are
> working correctly.

Sadly the Jenkins upgrade did not help here. Last time I investigated,
the failure was caused by the master breaking the connection, but I was
not able to understand why.

I was once able to recover a VM by fiddling with the Jenkins configuration
in the web UI, but experimenting is not easy, as a mistake will drain the
whole queue into failures.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Kaushal M
The problem was nbslave71. It used to be picked first for all changes
and would fail instantly. I've disabled it now. The other slaves are
working correctly.

~kaushal

On Thu, Jun 11, 2015 at 12:56 PM, Emmanuel Dreyfus  wrote:
> On Thu, Jun 11, 2015 at 12:39:43PM +0530, Atin Mukherjee wrote:
>> Can we start merging patches without NetBSD's vote? Currently we have
>> so many patches waiting for NetBSD's vote, and it seems that no VMs are
>> running either. This is blocking us from moving forward.
>
> In my opinion the fix to this problem is to start new VMs. I was busy
> on other fronts, so I did not watch the situation, but it is still
> grim, with most NetBSD slaves in a screwed state. We need to spin up
> more.
>
> --
> Emmanuel Dreyfus
> m...@netbsd.org
> ___
> Gluster-infra mailing list
> Gluster-infra@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:39:43PM +0530, Atin Mukherjee wrote:
> Can we start merging patches without NetBSD's vote? Currently we have
> so many patches waiting for NetBSD's vote, and it seems that no VMs are
> running either. This is blocking us from moving forward.

In my opinion the fix to this problem is to start new VMs. I was busy
on other fronts, so I did not watch the situation, but it is still
grim, with most NetBSD slaves in a screwed state. We need to spin up
more.

-- 
Emmanuel Dreyfus
m...@netbsd.org