Re: [Gluster-infra] Can I update /opt/qa for NetBSD?

2016-08-23 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Practically, I only need someone to look at one line:

Where is this fs used? 

For instance, mount(8) knows about ffs, not UFS:
$ mount
/dev/raid0a on / type ffs (log, local)
/dev/raid0e on /mail type ffs (log, nodev, nosuid, local, with quotas)
/dev/raid1a on /ssd type ffs (log, nodev, nosuid, local)
kernfs on /kern type kernfs (local)


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] NetBSD jobs hang on umount

2016-08-19 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Atin says we've noticed this in the past and somehow fixed it. Do you
> recall what we did to fix it?

Is it the same problem? The key test is to run ps -axl and observe the WCHAN
column for the stuck umount process. If it is tstile, then this is the
ancient bug.
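As a hedged illustration of that check, a tiny helper that filters ps output (only the umount/tstile idea comes from the thread; the helper name is invented):

```shell
# Hypothetical helper: keep only umount processes whose wchan is tstile.
# No field layout is assumed; we simply match both words on the line.
find_stuck_umounts() {
    awk '/umount/ && /tstile/'
}

# On a live slave you would run:
#   ps -axl | find_stuck_umounts
```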

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Can I update /opt/qa for NetBSD?

2016-08-09 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> I'll definitely appreciate any feedback you can have in terms
> of code when it's ready for review.

No problem. But the regression infrastructure will catch any issue better
than I would, anyway. :-)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Can I update /opt/qa for NetBSD?

2016-08-06 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> * A bunch of cleanup scripts are in build.sh. I think they can move into the
>   Jenkins job itself. That's what we do for linux boxes.

Feel free to do it, but is there a benefit? If it is not broken, do not
fix it...

> * Some test files have been made to 0644. Is this relevant or are these
>   accidental changes?

Probably an accidental change.

> * Can the Python path and stuff be declared before the build.sh script is
>   called in the Jenkins script? I don't know if this will work, I'll have to
>   test.

You need to run build.sh so that configure is invoked and env.rc is
created with the appropriate @BUILD_PYTHON_SITE_PACKAGES@ set
(see tests/env.rc.in).

> * I think we now have a standard way of skipping tests in the test runner
>   itself rather than deleting the tests from the checkout. If not, I'll drive
>   these fixes.

It may not scale when you want to skip the whole bugs directory.

> * The check for whether there's two jobs assigned to the same machine can
> be controlled from Jenkins and I plan to do that, so we can probably
> remove that code as well.

That seems better, as the current check misbehaves when a job is manually
cancelled and retriggered (here is a point to fix!)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] netbsd7.cloud.gluster.org

2016-07-26 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Oh, it's in the pool for netbsd-7 smoke, which we don't run anymore. Shall I
> kill the machine, then?

No problem for me.
 
> The smoke is perhaps just a build, which we do during regressions on netbsd7
> anyway.

And we do smoke on netbsd-6 anyway.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] netbsd7.cloud.gluster.org

2016-07-25 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> I notice that this machine is a bit different from the others in terms of
> partition and also runs 7.0 BETA. Is this intentional or does it make sense to
> re-image this machine with one of the other machines? I've had the new
> partition created on all the other NetBSD 7 machines and it seems to be going
> well so far.

IIRC it was used for netbsd-7 smoke tests, but the Jenkins setup broke at
some point, leaving us with the netbsd-6 smoke test on
netbsd0.cloud.gluster.org and netbsd-7 regression on
nbslave7x.cloud.gluster.org.

I am not sure netbsd7.cloud.gluster.org is used anymore.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-18 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Thank you. I'll give that a shot. I also want to set up NTP

Add ntpd=YES to /etc/rc.conf and run /etc/rc.d/ntpd start.
Note that the ntpd currently installed needs a security update.

> and change passwords for all the machines in one go.

Do it on one machine using vipw, copy /etc/master.passwd around, and run
pwd_mkdb -p /etc/master.passwd everywhere to regenerate /etc/passwd.
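A hedged sketch of that procedure as a loop; the host list is invented for illustration, only the master.passwd/pwd_mkdb steps come from the mail:

```shell
# Hypothetical host list; replace with the real slave names.
HOSTS="nbslave71 nbslave72 nbslave73"

sync_passwd() {
    for h in $HOSTS; do
        # copy the edited master password database, then regenerate /etc/passwd
        scp /etc/master.passwd "root@$h:/etc/master.passwd" &&
            ssh "root@$h" "pwd_mkdb -p /etc/master.passwd"
    done
}
```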


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-18 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Oh, can I apply this to all the machines in one go?

disklabel as is works with an interactive editor, but you can also run
disklabel xbd0 > protofile, tweak the file, and use disklabel -R
xbd0 protofile to load it in batch.

Or you can just modify nbslave70, image it, and deploy it to the other
machines; it would not hurt.
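One possible batch version of the protofile approach (the host list and device name are assumptions; only the disklabel dump/-R steps come from the mail):

```shell
# Hypothetical host list; adjust to the real slaves.
HOSTS="nbslave71 nbslave72 nbslave73"

push_label() {
    proto=$1    # proto file produced earlier with: disklabel xbd0 > protofile
    for h in $HOSTS; do
        # copy the tweaked label, then load it non-interactively
        scp "$proto" "root@$h:/tmp/label.proto" &&
            ssh "root@$h" "disklabel -R xbd0 /tmp/label.proto"
    done
}
```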

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-18 Thread Emmanuel Dreyfus
On Mon, Jul 18, 2016 at 09:37:19AM +, Emmanuel Dreyfus wrote:
> On Mon, Jul 18, 2016 at 10:35:45AM +0530, Nigel Babu wrote:
> > Would it be problematic if I added 20GB of block storage per machine for the
> > /build, /home/jenkins and /archives folder? That should easily sort out our
> > disk space troubles.
> 
> No, but first check that current image does not have some spare space
> beyond the / partition

That is the case: disklabel xbd0 says
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:  19922881        63     4.2BSD   2048 16384     0  # (Cyl.      0*-   9727)
 b:   4194304  19922944       swap                     # (Cyl.   9728 -  11775)
 c:  20971457        63     unused      0     0        # (Cyl.      0*-  10239)
 d:  83886080         0     unused      0     0        # (Cyl.      0 -  40959)
 e:   8388608  24117248     4.2BSD   2048 16384     0  # (Cyl.  11776 -  15871)

NetBSD has some historic curiosity: c is the NetBSD partition in the MBR, d is
the whole disk. This means you have 51380224 sectors of 512 bytes left after
partition e: 24 GB.

Run disklabel -e xbd0 and add an f line:
 f: 51380161   32505856 4.2BSD   2048 16384 0

While there, it will not hurt to resize c (for the sake of clarity):
 c: 83886017        63 unused  0 0

And still while there, run fdisk -iau xbd0 to adjust the NetBSD partition size in the MBR.

Then you can:
newfs /dev/rxbd0f
add /dev/xbd0f to /etc/fstab
mount /dev/xbd0f
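Put together, the sequence could be wrapped like this (a hedged sketch; it is destructive, so it stays inside a function that is not called here, and the /build mount point in the fstab line is an assumption):

```shell
# Hypothetical wrapper for the steps above; xbd0 is the example device.
grow_disk() {
    disk=${1:-xbd0}
    disklabel -e "$disk"                 # add the f: line interactively
    fdisk -iau "$disk"                   # adjust the NetBSD MBR partition size
    newfs "/dev/r${disk}f"               # create the filesystem
    echo "/dev/${disk}f /build ffs rw,log 1 2" >> /etc/fstab
    mount "/dev/${disk}f"
}
```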


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-18 Thread Emmanuel Dreyfus
On Mon, Jul 18, 2016 at 10:35:45AM +0530, Nigel Babu wrote:
> Would it be problematic if I added 20GB of block storage per machine for the
> /build, /home/jenkins and /archives folder? That should easily sort out our
> disk space troubles.

No, but first check that current image does not have some spare space
beyond the / partition

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-15 Thread Emmanuel Dreyfus
On Fri, Jul 15, 2016 at 03:04:39PM +0530, Nigel Babu wrote:
> Would it be okay to write a cron to clean up anything older than 15 days in
> /build/install and /archives?

You have to clean up after some time. How is it handled on Linux boxen?
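One way to implement the proposed cron cleanup; the 15-day retention and the /build/install and /archives paths are the ones suggested in the thread, while the helper itself is invented:

```shell
# Delete plain files older than $1 days under the given directories.
cleanup_old() {
    days=$1; shift
    find "$@" -type f -mtime +"$days" -print -delete
}

# A hypothetical crontab entry using the same pattern:
# 30 3 * * * find /build/install /archives -type f -mtime +15 -delete
```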

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-15 Thread Emmanuel Dreyfus
On Fri, Jul 15, 2016 at 07:19:17AM +, Emmanuel Dreyfus wrote:
> On Fri, Jul 15, 2016 at 10:59:04AM +0530, Nigel Babu wrote:
> > nbslave77.cloud.gluster.org
> 
That one has 1.6 GB of logs in /build/install/var/log/glusterfs

And if you are looking for free space, you can wipe /usr/pkgsrc (2.6 GB),
which is not used.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-15 Thread Emmanuel Dreyfus
On Fri, Jul 15, 2016 at 10:59:04AM +0530, Nigel Babu wrote:
> nbslave77.cloud.gluster.org

That one has 1.6 GB of logs in /build/install/var/log/glusterfs

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Netbsd folders filling up

2016-07-14 Thread Emmanuel Dreyfus
Nigel Babu  wrote:

> Does anyone know which folders are usually filled up quickly on netbsd
> machines? A lot of the machines are offline merely because they're out of
> diskspace. I'm working through them. I've cleared out old files in /build
> and /archives. Is there anywhere else I should be looking?

What are the offending machines?

Core files are configured to go in /var/crash, but /var/* should be a
good place to look at.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Putting the netbsd builders in the ansible pool ?

2016-06-09 Thread Emmanuel Dreyfus
Michael Scherer  wrote:

> The only issue I face is that you flagged most of /usr as unchangeable,
> and I do not know how cleanly it would be to remove the flags before
> applying changes and apply that again with the current layout of our
> ansible roles. But I will figure something out.

I did this because of a glusterfs bug that overwrote random files with
logs.

I tend to overwrite a file this way:
cat hosts | ssh root@host "chflags nouchg /etc/hosts; cat > /etc/hosts;
chflags uchg /etc/hosts"

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Putting the netbsd builders in the ansible pool ?

2016-06-08 Thread Emmanuel Dreyfus
Michael Scherer  wrote:

> I connected to it from rackspace and stopped rpcbind in a hurry after
> being paged, but I would like to make sure that the netbsd builders are
> a bit more hardened (even if they are already well hardened from what I
> did see, even if there is no firewall), as it seems most of them are
> also running rpcbind (and sockstat show they are not listening only on
> localhost).

I created minimal filtering rules in /etc/ipf.conf and restarted
rpcbind. I did the same for the other NetBSD VMs.

> Emmanuel, would you be ok if we start to manage them with ansible like
> we do for the Centos ones ? 

I have no problem with it, but I must confess a complete lack of
experience with this tool.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Requesting for a NetBSD setup

2016-05-02 Thread Emmanuel Dreyfus
On Mon, May 02, 2016 at 01:55:43PM +0530, Manikandan Selvaganesh wrote:
> Could you please provide us a NetBSD machine as the test cases are failing
> and we need to have a look on it?

nbslave72.cloud.gluster.org was put offline for some Jenkins breakage
that does not seem to be slave-related: I gave it a quick try,
and it is able to build and run tests.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] hang in netbsd regression while building

2016-04-30 Thread Emmanuel Dreyfus
Raghavendra Gowdappa  wrote:

> While trying to get netbsd regressions passed on [1], I am seeing hangs in
> building glusterfs. I had seen similar behavior for other patches
> earlier too, but Kaushal had fixed it. Any help is appreciated. Also, if
> you let me know a procedure to fix this issue if I encounter the same in
> future, I can do it myself.

The machine is stuck in a bad corner case from the previous run, and cannot
clean up for the new run. ps -axl shows many umount processes in the tstile
wchan. reboot -n is advised in such a situation.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Requesting for NetBSD setup

2016-04-29 Thread Emmanuel Dreyfus
On Fri, Apr 29, 2016 at 12:40:19PM +0530, Kaushal M wrote:
> I often disconnect machines that aren't in a working state, and reboot them.
> If I've left something in the disconnected state, most likely those
> machines didn't get back to a working state after the reboot.
> Or it could be that I just forgot.

I just checked out master on nbslave74, built it, and ran tests; it seems
fine.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Requesting for NetBSD setup

2016-04-29 Thread Emmanuel Dreyfus
On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote:
> I would like to ask for a NetBSD setup

nbslave7[4gh] are disabled in Jenkins right now. They are labeled
"Disconnected by kaushal", but I don't know why. Once it is confirmed
that they are not already used for testing, you could pick one.

I still do not know who the password guardian at Red Hat is, though.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Emmanuel Dreyfus
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote:
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/

Nothing new here. I tracked this kind of trouble with NetBSD slaves
and only got frustration as a result.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

I crafted the patch below, but it is probably much simpler to just
set kern.defcorename to /%n-%p.core on all VM slaves. I will do it.

diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index 272d08f..2fd2d7d 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -29,6 +29,10 @@
 #include 
 #endif /* HAVE_LINKAT */
 
+#ifdef __NetBSD__
+#include 
+#endif /* __NetBSD__ */
+
 #include "glusterfs.h"
 #include "checksum.h"
 #include "dict.h"
@@ -6631,6 +6635,8 @@ init (xlator_t *this)
 _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
 if (_private->path_max != -1 &&
 _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
+char corename[] = "/%n-%p.core";
+
 ret = chdir(_private->base_path);
 if (ret) {
 gf_msg (this->name, GF_LOG_ERROR, 0,
@@ -6639,7 +6645,15 @@ init (xlator_t *this)
 _private->base_path);
 goto out;
 }
+
 #ifdef __NetBSD__
+/* 
+ * Make sure cores go to the root and not in current 
+ * directory
+ */
+(void)sysctlbyname("proc.curproc.corename", NULL, NULL, 
+   corename, strlen(corename) + 1);
+
 /*
  * At least on NetBSD, the chdir() above uncovers a
  * race condition which cause file lookup to fail


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:10:49PM +0530, Atin Mukherjee wrote:
> So does that mean we never analyzed any core reported by NetBSD
> regression failure? That's strange.

We got the cores from / but not from d/backends/*/ as I understand.

I am glad someone figured out the mystery. 

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

sysctl kern.defcorename gives the default location and name. It can be
overridden per process using sysctl proc.$$.corename
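The two knobs as small shell helpers (a sketch; the sysctl variable names are the NetBSD ones quoted above, the helper names are invented):

```shell
show_default_corename() {
    sysctl kern.defcorename              # system-wide core name pattern
}

set_my_corename() {
    # Override for the calling shell only; %n = program name, %p = pid.
    sysctl -w "proc.$$.corename=/%n-%p.core"
}
```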

> Any particular reason you added /d/backends/*/*.core to list of path to
> search for core?

Yes, this is required for standard compliance of the exposed glusterfs
filesystem in the case of low system PATH_MAX. See in posix.c:

/*
 * _XOPEN_PATH_MAX is the longest file path len we MUST
 * support according to POSIX standard. When prepended
 * by the brick base path it may exceed backed filesystem
 * capacity (which MAY be bigger than _XOPEN_PATH_MAX). If
 * this is the case, chdir() to the brick base path and
 * use relative paths when they are too long. See also
 * MAKE_REAL_PATH in posix-handle.h
 */
_private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
if (_private->path_max != -1 &&
    _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
        ret = chdir(_private->base_path);
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        P_MSG_BASEPATH_CHDIR_FAILED,
                        "chdir() to \"%s\" failed",
                        _private->base_path);
                goto out;
        }
And the core goes in the current directory by default. We could use
sysctl(3) to change that if we need.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-21 Thread Emmanuel Dreyfus
Michael Scherer  wrote:

> Depend, if they exhausted FD or something ? I am not a java specialist.

It is not the same errno, AFAIK.
 
> Could also just be too long to answer due to the load, but it was not
> loaded :/

High loads give timeouts. I may be wrong, but I believe connection
refused really means it gets a TCP RST.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-21 Thread Emmanuel Dreyfus
On Thu, Jan 21, 2016 at 04:49:28PM +0100, Michael Scherer wrote:
> > review.gluster.org[0: 184.107.76.10]: errno=Connection refused
> 
> So I found nothing in gerrit nor in netbsd. And it is not the DNS, since it
> managed to resolve stuff fine.
> 
> I suspect the problem was on gerrit, not on netbsd. Did it happen
> again?

I could imagine problems with exhausted system resources, but that would
not produce a "Connection refused".

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-20 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> Does not look like a DNS problem. It is happening to me outside of
> rackspace too.

I mean I have already seen rackspace VMs failing to initiate connections
because the rackspace DNS failed to answer DNS requests. This was the cause
of failed regressions at some point.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-20 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> There is some problem with review.gluster.org now. git clone/pull fails
> for me consistently.

First check that DNS is working. I recall seeing the rackspace DNS failing
to answer.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regression fixes

2016-01-18 Thread Emmanuel Dreyfus
Hi all

I have the following changes awaiting code review/merge:
http://review.gluster.org/13204
http://review.gluster.org/13205
http://review.gluster.org/13245
http://review.gluster.org/13247

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> But I just realized the change is wrong, since running tests "new way"
> stops on first failed test. My change just retry the failed test and
> considers the regression run to be good on success, without running next
> tests.
> 
> I will post an update shortly.

Done:
http://review.gluster.org/13245
http://review.gluster.org/13247
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> > 2) Spurious failures
> > I added a retry-failed-test-once feature so that we get less regression
> > failures because of spurious failures. It is not used right now because
> > it does not play nicely with bad tests blacklist.
> > 
> > This will be fixed by that changes:
> > http://review.gluster.org/13245
> > http://review.gluster.org/13247
> > 
> > I have been looping failure-free regression for a while with that trick.
> 
> Nice, thanks for these improvements!

But I just realized the change is wrong, since running tests the "new way"
stops on the first failed test. My change just retries the failed test and
considers the regression run to be good on success, without running the next
tests.

I will post an update shortly.

> Could you send a pull request for the regression.sh script on
> https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
> you dont use GitHub, send the patch by email and we'll take care of
> pushing it for you.

Sure, but let me settle on something that works first.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


[Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Hello all

Here are the problems identified in NetBSD regression so far:

1) Before starting regression, the slave complains about "vnconfig:
VNDIOCGET: Bad file descriptor" and fails the run.

This will be fixed by that changes:
http://review.gluster.org/13204
http://review.gluster.org/13205


2) Spurious failures
I added a retry-failed-test-once feature so that we get fewer regression
failures because of spurious failures. It is not used right now because
it does not play nicely with the bad tests blacklist.

This will be fixed by that changes:
http://review.gluster.org/13245
http://review.gluster.org/13247

I have been looping failure-free regression for a while with that trick.


3) Stale state from previous regression
We sometimes have processes stuck from a previous regression, awaiting
vnode locks on destroyed NFS filesystems. This causes cleanup
scripts to hang before the regression starts, and we get a timeout.

I modified slave's /opt/qa/regression.sh to check for stuck processes
and reboot the system if we find them. That will fail the current
regression run, but at least the next ones coming after reboot will be
safe.
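A hedged sketch of what such a check could look like (the actual regression.sh change is not shown in this thread; the function names are invented, only the umount/tstile symptom and reboot -n remedy come from these mails):

```shell
# Exit 0 if any umount process is stuck on the tstile wchan.
stuck_umounts() {
    ps -axl | awk '/umount/ && /tstile/ { found = 1 } END { exit !found }'
}

maybe_reboot() {
    if stuck_umounts; then
        echo "stuck umount processes detected, rebooting" >&2
        reboot -n
    fi
}
```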

This fix is not deployed yet, I await the fixes from point 2 to be
merged


4) Jenkins runs concurrent jobs on the same slave
We observed Jenkins sometimes runs two jobs on the same slave at once,
which of course can only lead to horrible failure.

I modified slave's /opt/qa/regression.sh to add a lock file so that this
situation is detected early and reported. The second regression will
fail, but the idea is to get a better understanding of how that can
occur.
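The lock could be shaped roughly like this (a sketch; the lock path is invented and mkdir is used because its creation is atomic):

```shell
LOCKDIR=${LOCKDIR:-/var/run/regression.lock}   # hypothetical path

acquire_lock() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        # a second concurrent run is detected early and reported
        echo "another regression run already holds $LOCKDIR" >&2
        return 1
    fi
    trap 'rmdir "$LOCKDIR"' EXIT               # release on exit
    return 0
}
```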

This fix is not deployed yet, I await the fixes from point 2 to be
merged

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-10 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> While trying to reproduce the problem in
> ./tests/basic/afr/arbiter-statfs.t, I came to many failures here:
> 
> [03:53:07] ./tests/basic/afr/split-brain-resolution.t 

I was running tests from the wrong directory :-/
This one is fine with HEAD.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-10 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> I tried to look into 3 instances of this failure:
(...)
> same issue as above, two tests are running in parallel.

How is it possible? A & that sends a job in the background?
Are we sure it is the same regression test run? Or is it two regression
test runs that are scheduled simultaneously?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-09 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> tests/basic/afr/arbiter-statfs.t

I posted patches to fix this one (but it seems Jenkins is down? No
regression is running)

> tests/basic/afr/self-heal.t
> tests/basic/afr/entry-self-heal.t

Those two are still to be investigated, and it seems
tests/basic/afr/split-brain-resolution.t is now reliably broken as
well.

> tests/basic/quota-nfs.t 

That one is marked as a bad test and should not cause harm on spurious
failures, as its result is ignored.

I am trying to reproduce a spurious VM reboot during tests by looping on
the whole test suite on nbslave70, with reboot on panic disabled (it
will drop into the kernel debugger instead). No result so far.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri  wrote:

> With your support I think we can make things better. To avoid 
> duplication of work, did you take any tests that you are already 
> investigating? If not that is the first thing I will try to find out.

While trying to reproduce the problem in
./tests/basic/afr/arbiter-statfs.t, I came to many failures here:

[03:53:07] ./tests/basic/afr/split-brain-resolution.t .. 20/43
getfattr: Removing leading '/' from absolute path names
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 25 Got "" instead of "brick0_alive"
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 27 Got "" instead of "brick1_alive"
getfattr: Removing leading '/' from absolute path names
not ok 30 Got "" instead of "brick0"
not ok 32 Got "" instead of "brick1"

It is not in the lists posted here. Is it happening only for me?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> > With your support I think we can make things better. To avoid duplication of
> > work, did you take any tests that you are already investigating? If not that
> > is the first thing I will try to find out.
> 
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
> loopback device.

I tracked it down: vnconfig -l complains about "vnconfig: VNDIOCGET: Bad
file descriptor" when we had a configured loopback device with the
backing store on a filesystem we unmounted.

# dd if=/dev/zero of=/scratch/backend bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 3.034 secs (34560843 bytes/sec)
# vnconfig vnd0 /scratch/backend
# vnconfig -l
vnd0: /scratch (/dev/xbd1a) inode 6
vnd1: not in use
vnd2: not in use
vnd3: not in use
# umount -f /scratch/
# vnconfig -l 
vnconfig: VNDIOCGET: Bad file descriptor

But it seems the workaround is easy:
# vnconfig -u vnd0
# vnconfig -l  
vnd0: not in use
vnd1: not in use
vnd2: not in use
vnd3: not in use

Here are my fixes:
http://review.gluster.org/13204 (master)
http://review.gluster.org/13205 (release-3.7)

And while there, a portability fix in rfc.sh:
http://review.gluster.org/13206 (master)
That bug is not present in release-3.7.
 
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 09:57:01PM +0530, Pranith Kumar Karampuri wrote:
> >Next step is to look for loopback devices which backing store are in $B0
> >and unconfigure them.
> Oops, wrong code reading. Is it possible to have loopback devices not in
> use, that we miss out on destroying? Could be a stupid question but still
> asking.

Well the kernel tells us it is not in use. I am not sure what you mean.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 08:37:16PM +0530, Pranith Kumar Karampuri wrote:
> NetBSD)
> vnd=`vnconfig -l | \
>  awk '!/not in use/{printf("%s%s:%d ", $1, $2, $5);}'`
> 
> Can there be Loopback devices that are in use when this piece of the code is
> executed, which can lead to the problems we ran into? I may be completely
> wrong. It is a wild guess about something I don't completely understand.

This lists loopback devices in use. For instance:
vnd0:/d:180225 vnd1:/d:180226 vnd2:/d:180227

Next step is to look for loopback devices which backing store are in $B0
and unconfigure them.
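That next step could be sketched as follows (hedged: the function name is invented, and the `vnconfig -l` output format is assumed from the example above):

```shell
# Unconfigure any vnd device whose backing store lives under $1 (e.g. $B0).
unconfigure_backends() {
    b0=$1
    vnconfig -l |
        awk '!/not in use/ { sub(":", "", $1); print $1, $2 }' |
        while read -r dev path; do
            case "$path" in
                "$b0"|"$b0"/*) vnconfig -u "$dev" ;;
            esac
        done
}
```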


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 10:56:22AM +, Emmanuel Dreyfus wrote:
> On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> > With your support I think we can make things better. To avoid duplication of
> > work, did you take any tests that you are already investigating? If not that
> > is the first thing I will try to find out.
> 
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
> loopback device.

800 runs so far without a hitch. I suspect the problem is caused by the leftovers
of another test.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> With your support I think we can make things better. To avoid duplication of
> work, did you take any tests that you are already investigating? If not that
> is the first thing I will try to find out.

I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
loopback device.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 05:11:22AM -0500, Jeff Darcy wrote:
> [08:45:57] ./tests/basic/afr/arbiter-statfs.t .. 
> [08:43:03] ./tests/basic/afr/arbiter-statfs.t .. 
> [08:40:06] ./tests/basic/afr/arbiter-statfs.t .. 
> [08:08:51] ./tests/basic/afr/arbiter-statfs.t .. 
> [08:06:44] ./tests/basic/afr/arbiter-statfs.t .. 
> [08:00:54] ./tests/basic/afr/self-heal.t .. 
> [07:59:56] ./tests/basic/afr/entry-self-heal.t .. 
> [18:05:23] ./tests/basic/quota-anon-fd-nfs.t .. 
> [18:06:37] ./tests/basic/quota-nfs.t .. 
> [18:49:32] ./tests/basic/quota-anon-fd-nfs.t .. 
> [18:51:46] ./tests/basic/quota-nfs.t .. 
> [14:25:37] ./tests/basic/quota-anon-fd-nfs.t .. 
> [14:26:44] ./tests/basic/quota-nfs.t .. 
> [14:45:13] ./tests/basic/tier/record-metadata-heat.t .. 

That is 6 tests; they could be disabled or ignored.

> So some of us *have* done that work, in a repeatable way.  Note that the
> list doesn't include tests which *hang* instead of failing cleanly,
> which has recently been causing the entire NetBSD queue to get stuck
> until someone manually stops those jobs.  What I find disturbing is the
> idea that a feature with no consistently-available owner or identifiable
> users can be allowed to slow or block every release unless every
> developer devotes extra time to its maintenance.  Even if NetBSD itself
> is worth it, I think that's an unhealthy precedent to set for the
> project as a whole.

For that point, we could start the regression script by:
( sleep 7200 && /sbin/reboot -n ) &

And end it with:
kill %1

Does it seem reasonable? That way nothing can hang more than 2 hours.
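The two-line watchdog pattern above can be sketched as a self-contained script. The short timeout, the `echo` standing in for `/sbin/reboot -n`, and the `run_regression` placeholder are assumptions made so the sketch runs harmlessly anywhere; on the slave the timer would be `sleep 7200` and the action a hard reboot.

```shell
# Watchdog sketch: arm a background timer, run the job, disarm on
# completion.  Stand-ins: short timeout, echo instead of reboot,
# and a fake one-second "regression run".
run_regression() { sleep 1; }              # placeholder for the real run

( sleep 3 && echo "watchdog fired: rebooting" ) &
watchdog=$!

run_regression
rc=$?

kill "$watchdog" 2>/dev/null               # job finished in time: disarm
echo "regression finished in time (rc=$rc)"
```

The design point is that the watchdog needs no cooperation from the hung job: if the run wedges, the timer simply expires and the action fires.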
-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> Should the cleanup script needs to be manually executed on the NetBSD
> machine?

You can run the script manually, but if the goal is to restore a 
misbehaving machine, rebooting is probably the fastest way to sort 
out the issue.

While thinking about it, I suspect there may be some benefit
in rebooting the machine if the regression does not finish 
within a sane amount of time. 

> >First step could be to parse jenkins logs and find which tests fail or hang
> >most often in NetBSD regression
> 
> This work is under way. I will have to change some of the scripts I wrote to
> get this information.

Great.

> To avoid duplication of work, did you take any tests that you are 
> already investigating? If not that is the first thing I will try to find out.

No, I have not started investigating yet because I have no idea where 
I should look. Your input will be very valuable.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 12:42:36PM +0530, Sachidananda URS wrote:
> I have a NetBSD 7.0 installation which I can share with you, to get
> started.
> Once manu@ gets back on a specific version, I can set that up too.

NetBSD 7.0 is fine and has everything required in GENERIC kernel.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 11:45:20AM +0530, Pranith Kumar Karampuri wrote:
> 1) How to set up NetBSD VMs on my laptop which is of exact version as the
> ones that are run on build systems.

Well, the easiest way is to pick the VM image we run at Rackspace, which
relies on Xen. If you use a hardware virtualization system, we just need
to change the kernel and use the NetBSD 7.0 GENERIC one. What hypervisor 
do you use?

Alternatively, it is easy to make a fresh NetBSD install. The only trap 
is that the glusterfs backing store filesystem must be formatted in FFSv1
format to get extended attribute support (this is obtained with newfs -O 1).

> 2) How to prevent NetBSD machines hang when things crash (At least I used to
> see that the machines hang when fuse crashes before, not sure if this is
> still the case)? (This failure needs manual intervention at the moment on
> NetBSD regressions, if we make it report failures and pick next job that
> would be the best way forward)

It depends on what we are talking about. If this is a mount point that does 
not want to unmount, killing the perfused daemon (which is the bridge 
between FUSE and native PUFFS) will help. The cleanup script does it.
Do you have a hang example?

> 3) We should come up with a list of known problems and how to troubleshoot
> those problems, when things are not going smooth in NetBSD. Again, we really
> need to make things automatic, this should be last resort. Our top goal
> should be to make NetBSD machines report failures and go to execute next
> job.

This is the frustrating point for me: we have complaints that things go bad,
but we do not have data about what tests caused trouble. Fixing the problem
underlying unbacked complaints means we will have to gather data on our own.

First step could be to parse jenkins logs and find which tests fail or hang
most often in NetBSD regression. 

> 4) How can we make debugging better in NetBSD? In the worst case we can make
> all tests execute in trace/debug mode on NetBSD.
> 
> I really want to appreciate the fine job you have done so far in making sure
> glusterfs is stable on NetBSD.

Thanks! I must confess the idea of having the NetBSD port demoted is a bit
depressing given the amount of work I invested in it.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-07 Thread Emmanuel Dreyfus
Ravishankar N  wrote:

> > I am a bit disturbed by the fact that people raise the
> > "NetBSD regression ruins my life" issue without doing the work of
> > listing the actual issues encountered.
> I already did earlier- the lack of infrastructure to even find out what
> caused the issue in the first place.  

I meant: which test exhibited a spurious failure or hang? You can see that
from the regression test run. Previous experience makes me suspect we
will narrow the problem down to a few tests that can be disabled.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-07 Thread Emmanuel Dreyfus
Jeff Darcy  wrote:

> > Now what is the policy on post-merge regression failure?  What happens
> > if the original submitter is not willing to investigate?
> 
> Then regressions will continue to fail on NetBSD, as they do now, but
> without impacting work on other platforms. 

Well from previous experience of maintaining NetBSD support without
mandatory regression, I am almost certain that it will quickly break.
The only relief in the post-merge regression scheme is that we will have
a precise idea of the change that caused the regression. 

In my opinion the best way forward would be to identify what tests cause
frequent NetBSD spurious failures and disable them for NetBSD
regression. I am a bit disturbed by the fact that people raise the
"NetBSD regression ruins my life" issue without doing the work of
listing the actual issues encountered.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-07 Thread Emmanuel Dreyfus
Avra Sengupta  wrote:

> Agree with your point. If we are ready to make exceptions, then we might
> as well not block all the patches. As Jeff suggested, triaging the 
> nightly/weekly results manually and making any serious issues a blocker
> should suffice. 

How are you going to make a serious issue a blocker? The serious issue
will be related to multiple patches; it will be impossible to tell which
one is the offender.

If we go that way, we need to run a regression for each merged patch,
which will be much less load than today.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

2016-01-07 Thread Emmanuel Dreyfus
On Thu, Jan 07, 2016 at 03:01:41PM +0530, Avra Sengupta wrote:
> Why is this a bad idea?
 
Because each week you will have multiple regressions introduced. Few
people will be willing to investigate whether they pushed a patch that
caused a regression, and those few people will have to deal with the
regressions they did not cause. Being in the situation to fix the 
regression test will be a rare event, and the whole thing will quickly rot.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'

2015-12-31 Thread Emmanuel Dreyfus
On Thu, Dec 31, 2015 at 03:57:15PM +0530, Raghavendra Talur wrote:
> You can log in. I think the HUP signal did not cause any
> change in process state. I still see it in I state.
> pid is 10967.

That one is perl running quota.t. I believe 15221 is the
stuck one.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'

2015-12-31 Thread Emmanuel Dreyfus
On Thu, Dec 31, 2015 at 03:22:54PM +0530, Raghavendra Talur wrote:
> Manu, this seems to be a bug in libperfuse and not in Gluster.
> The machine is nbslave75.cloud.gluster.org. You will have to rerun
> quota.t couple of times to hit the bug. The test would hang in line 62(TEST
> 24).

There is a jenkins job running on that machine. May I proceed? Where
is the relevant test suite?

A nice way of handing the bug over to someone else could be to run it in 
the screen utility.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'

2015-12-24 Thread Emmanuel Dreyfus
On Thu, Dec 24, 2015 at 01:44:21AM -0500, Raghavendra Gowdappa wrote:
> > Seems to be hung. May be a hung syscall? I've tried to kill it, but seems
> > like its not dead. May be patch #12594 is causing some issues on netbsd. It
> > has passed gluster regression.

ps -axl shows PID 1394 (umount) waiting on tstile, which is used for
spinlocks. No process should sit there for long, unless there is a 
kernel locking problem (which may be a userland locking problem 
thanks to FUSE).

Using crash(8) I can see umount is waiting for a vnode lock. There
is certainly something to investigate, but I lack the time for now. I 
issued a reboot. Please tell me if you can reproduce it.
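The diagnostic step described here (spot processes whose WCHAN is `tstile`, NetBSD's marker for waiting on a kernel turnstile) can be sketched as a filter. The `ps -axl`-style sample lines below are hypothetical; on a live node you would pipe `ps -axl` in directly.

```shell
# Print the PIDs of processes stuck on the `tstile` wait channel.
# The sample output is a hypothetical stand-in for `ps -axl`.
ps_sample='UID  PID PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT  TIME COMMAND
  0 1394    1   0  85  0  5748 1240 tstile D    ?   0:00 umount /mnt/patchy
  0  812    1   0  85  0 12340 3208 select I    ?   0:01 sshd'
printf '%s\n' "$ps_sample" | awk '$9 == "tstile" {print $2}'
```

Here only PID 1394 (the umount) matches; a process sitting on `tstile` for any length of time points at the kernel locking problem described above.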

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures

2015-10-23 Thread Emmanuel Dreyfus
On Fri, Oct 23, 2015 at 01:52:59PM +0530, Ravishankar N wrote:
> All arbiter-statfs.t tests that are failing are on
> nbslave74.cloud.gluster.org.
> Loopback mounts are not happening on that slave. Perhaps it needs to be
> rebooted.

Indeed: the test passes on nbslave70. Can someone reboot nbslave74?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] NetBSD tests stuck at the same place

2015-09-24 Thread Emmanuel Dreyfus
On Thu, Sep 24, 2015 at 05:57:45AM -0400, Krutika Dhananjay wrote:
> No matter how many times I (re)trigger the regression tests in NetBSD for 
> http://review.gluster.org/#/c/12213/ , they seem to get stuck at the same 
> point every time. 
> See 
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/10473/console
>  for instance. 

I will be able to look at this in a few hours. In the meantime, check 
that the filesystems of the test node are not full.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Recent changes in regression.sh

2015-08-31 Thread Emmanuel Dreyfus
On Mon, Aug 31, 2015 at 07:53:49PM +0530, Raghavendra Talur wrote:
> We started seeing errors like "/opt/qa/regression.sh: 120: Syntax error:
> end of file unexpected (expecting ")"" in jenkins runs today.

What run?

> I suspect it is one of the recent changes in regression.sh which might have
> caused it.

Yes, I am surely the culprit.

> Comparing the version at github and one at nbslave77.cloud.gluster.org I
> found quite a few differences. If someone is aware of recent changes need
> help in fixing it.

What difference do you have?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] An attempt to thwart G_LOG corruption

2015-08-23 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> Great idea! I was thinking of something like SElinux, but that is
> obviously not available for NetBSD.

There is similar stuff on NetBSD that I never learned about. The immutable
flag is simple and will work.

I still have to decide if I include it in the image or if I prepare an
install script to "freeze" the setup once the VM is created.
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] An attempt to thwart G_LOG corruption

2015-08-23 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Let me know if it is too wide and causes trouble.

It was, I had to remove the immutable flag on:

/usr/pkg/lib/python2.7/site-packages/gluster/
=> we install glupy.py there

/etc/openssl
=> ssl-authz.t create key and cert there

And that lets a job pass regression while we have some guarantee that the
tests cannot easily corrupt the system.
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9580/

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] An attempt to thwart G_LOG corruption

2015-08-23 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Let me know if it is too wide and causes trouble. I


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


[Gluster-infra] An attempt to thwart G_LOG corruption

2015-08-22 Thread Emmanuel Dreyfus
Hello

We have a rogue test that appends log data to incorrect open file
descriptors, clobbering various system and library files with logs. That
quickly renders regression slaves unusable.

I tried an experiment to thwart that threat: the NetBSD FFS filesystem
features an immutable flag, which means even root cannot modify the
file. I applied it on nbslave7[1-j] to the following files and
directories (and their children):
/.cshrc /.profile /altroot /bin /boot /boot.cfg /etc /grub /lib /libdata
/libexec /netbsd /netbsd7-XEN3PAE_DOMU /opt /rescue /root /sbin /stand
/usr

Let me know if it is too wide and causes trouble. If anyone wants to
experiment:
Recursively (-R) install the flag on /usr:
  chflags -R uchg /usr
Recursively remove it:
  chflags -R nouchg /usr

We also have schg/noschg, which can be set at any time but can only be
removed by root in a single-user shell. I ruled this out because I am
not sure Rackspace console access lets us use single-user mode. 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Fresh NetBSD regression failures

2015-08-21 Thread Emmanuel Dreyfus
Avra Sengupta  wrote:

> All NetBSD regressions are again failing (more like refusing to
> build), with the following error.

Random files clobbered by G_LOG?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures

2015-08-19 Thread Emmanuel Dreyfus
On Wed, Aug 19, 2015 at 07:35:30AM -0400, Kotresh Hiremath Ravishankar wrote:
> 1. geo-rep does lazy umount of gluster volume which needs to be modified
>to use 'gf_umount_lazy' provided by libglusterfs, correct?

Yes, that spawns umountd, which does its best to emulate lazy unmount
by retrying the unmount at regular intervals.

> 2. geo-rep uses lgetxattr, it is throwing 'undefined error', I tried searching
>for man page for lgetxattr in netBSD but couldn't find. Is there a known
>portability issue with it?

No, but if you can provide me a simple test in C showing the problem,
I will be glad to fix the implementation.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures

2015-08-18 Thread Emmanuel Dreyfus
Kotresh Hiremath Ravishankar  wrote:

> Since the geo-rep regression tests are failing only in NetBSD, Is there
> a way we can mask its run only in NetBSD and let it run in Linux?
> I am working on geo-rep issues with NetBSD. Once these are fixed we can
> enable on NetBSD as well.

Yes, I can wipe them from regression.sh before running the tests, like
we do for tests/bugs (never ported), tests/basic/tier/tier.t and
tests/basic/ec (the latter two used to pass but started exhibiting too
many spurious failures).
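A minimal sketch of that masking step is below. The directory layout is a throwaway stand-in created just for the demonstration; the skipped paths mirror the ones named above, but the exact removal mechanism used by regression.sh is not shown in the message, so this is an illustration, not the real script.

```shell
# Remove tests known to be broken on NetBSD from a throwaway copy of
# the test tree before running the rest of the suite.
work=$(mktemp -d)
mkdir -p "$work/tests/bugs" "$work/tests/basic/ec" "$work/tests/basic/tier"
touch "$work/tests/basic/volume.t" "$work/tests/basic/tier/tier.t"

for t in tests/bugs tests/basic/ec tests/basic/tier/tier.t; do
    rm -rf "$work/$t"       # never ported or spuriously failing on NetBSD
done

listing=$(cd "$work" && find tests | sort)
echo "$listing"
rm -rf "$work"
```

After the loop only the portable test (`tests/basic/volume.t`) and the now-empty directories remain, so the suite runner never sees the masked tests.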

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] nbslave77 disabled

2015-08-17 Thread Emmanuel Dreyfus
On Mon, Aug 17, 2015 at 07:37:56AM +, Emmanuel Dreyfus wrote:
> > Michael/Manu, could you have a look at that?
> nbslave71 and nbslave79 seem very sick too.

I restored nbslave7[179]
-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] nbslave77 disabled

2015-08-17 Thread Emmanuel Dreyfus
On Mon, Aug 17, 2015 at 09:23:41AM +0200, Niels de Vos wrote:
> nbslave77 does not respond at all anymore (some errors related to
> pam_start, see screenshot). I have disabled it again, it probably needs
> a complete rebuild.
> 
> Michael/Manu, could you have a look at that?

Interesting: almost all files in /root and /etc were corrupted, with
glusterfs regression log messages appended to them.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] nbslave77 disabled

2015-08-17 Thread Emmanuel Dreyfus
On Mon, Aug 17, 2015 at 09:23:41AM +0200, Niels de Vos wrote:
> nbslave77 does not respond at all anymore (some errors related to
> pam_start, see screenshot). I have disabled it again, it probably needs
> a complete rebuild.
> 
> Michael/Manu, could you have a look at that?

nbslave71 and nbslave79 seem very sick too.



-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] nbslave7h.cloud.gluster.org fails to post voting to Gerrit

2015-07-23 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> This required a gerrit db update and I have done that. Can you please
> check now?

Yes, it works. The test is to run as jenkins user:
ssh nb7bu...@review.gluster.org 'gerrit --help'

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] nbslave7h.cloud.gluster.org fails to post voting to Gerrit

2015-07-21 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> If you explain the steps that are needed to be done, and the location of
> the ssh keys, others should be able to fix this in the future too.

I confirm I have accidentally overwritten the keys in the new image.
$ ssh r...@nbslave7h.cloud.gluster.org
# su -l jenkins
$ ssh nb7bu...@review.gluster.org gerrit --help
Permission denied (publickey).
$ ls -l .ssh
total 40
-rw-------  1 jenkins  wheel   1675 Apr 14 13:47 id_rsa
-rw-r--r--  1 jenkins  wheel    422 Apr 14 13:47 id_rsa.pub
-rw-------  1 jenkins  wheel   1675 Dec 19  2014 id_rsa2048
-rw-r--r--  1 jenkins  wheel    417 Dec 19  2014 id_rsa2048.pub
-rw-r--r--  1 jenkins  wheel  10508 Apr 14 13:47 known_hosts

The simplest fix is to copy nbslave7h:/home/jenkins/.ssh/id_rsa.pub into
review.gluster.org:~nb7build/.ssh/authorized_keys, but I cannot do that.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


[Gluster-infra] build.gluster.org self-signed cert

2015-07-18 Thread Emmanuel Dreyfus
Hi

build.gluster.org presented me with a self-signed certificate. I accepted it,
but could someone please confirm it is intentional?

While we are at it, StartSSL offers free certs...

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Public key problem on new vms for NetBSD

2015-06-19 Thread Emmanuel Dreyfus
On Fri, Jun 19, 2015 at 11:45:04AM +0530, Pranith Kumar Karampuri wrote:
> I see that NetBSD regressions are passing but not able to give +1
> because of following problem:
> + ssh 'nb7bu...@review.gluster.org' gerrit review --message 
> ''\''http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/consoleFull
> : SUCCESS'\''' --project=glusterfs --code-review=0 '--verified=+1'
> 276ba2dbd076a2c4b86e8afd0eaf2db7376ea2a8
> Permission denied (publickey).

Someone regenerated ~jenkins/.ssh/id_rsa on a few nodes. I removed 
the new id_rsa and id_rsa.pub and replaced id_rsa with the right one
copied from a machine where it worked. 

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> I did dare just now and have rebooted Jenkins :). Let us see how this
> iteration works out.

Excellent! That fixed the Jenkins resolution problem, and we now have 10
NetBSD slave VMs online. 

So we have two problems, and their fixes, for adding new VMs:
- Weak upstream DNS service: worked around by /etc/hosts (a secondary
DNS would be more automatic, but at least it works)
- Jenkins has a DNS cache and needs a restart

How did ongoing jobs behave on the Jenkins restart? Did you have to restart
them all, or did Jenkins take care of it?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Justin Clift  wrote:

> If the DNS problem does turn out to be the dodgy iWeb hardware firewall,
> then this fixes the DNS issue. (if not... well damn!)

The DNS problem was worked around by installing an /etc/hosts file, but
Jenkins does not realize it is there. It should probably be restarted,
but nobody dares to try.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> I'm not sure what limitation you mean. Did we reach the limit of slaves
> that Jenkins can reasonably address?

No, I mean its inability to pick up a new DNS record.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-18 Thread Emmanuel Dreyfus
On Thu, Jun 18, 2015 at 10:19:27AM +0200, Niels de Vos wrote:
> Good to know, but it would be much more helpful if someone could install
> VMs there and add them to the Jenkins instance... Who can do that, or
> who can guide someone else to get it done?

How will that help, since we are having problems with Jenkins's
ability to get more hosts?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> Maybe, but I hope those issues stay masked when resolving the hostnames
> is more stable. When we have the other servers up and running, we would
> have a better understanding and options to investigate issues like this.

But Jenkins is still unable to launch an agent on e.g. nbslave75.
Perhaps it needs to be restarted?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 03:00:29PM +, Emmanuel Dreyfus wrote:
Oh no, it did, but it nuked them all almost instantly (see below). I 
disabled it again. Basically we have broken Jenkins setups, and DNS
trouble prevents us from adding new VMs. What a mess.

I retriggered most of the jobs, but at some point the web UI refreshed
and I lost track of which jobs I had already retriggered. I left it as is.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 08:34:06PM +0530, Kaushal M wrote:
> Would restarting jenkins once help? It might help it pick up the newly
> added entries to the hosts file.

Won't it break all running jobs?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote:
> I re-enabled it and it went online, but it does not seem to pick up a job.

Oh no, it did, but it nuked them all almost instantly (see below). I 
disabled it again. Basically we have broken Jenkins setups, and DNS
trouble prevents us from adding new VMs. What a mess.

Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in 
workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
java.io.IOException: remote file operation failed: 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at 
hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org: 
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.FilePath.act(FilePath.java:987)
at hudson.FilePath.act(FilePath.java:969)
at hudson.FilePath.mkdirs(FilePath.java:1152)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
at hudson.model.Run.execute(Run.java:1744)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:374)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:550)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:752)
at hudson.FilePath.act(FilePath.java:980)
... 10 more
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1110)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
at hudson.remoting.PingThread.ping(PingThread.java:126)
at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1433860950328 
hasn't completed by 1433861190328
... 2 more
Finished: FAILURE


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 07:39:06PM +0530, Kaushal M wrote:
> nbslave7{d..f} were the entries created by Vijay last week, which were
> resolving to nbslave71; there were no actual vms on rackspace. I had
> disabled nbslave71 at that point in time to reboot it, but I think I
> forgot to re-enable it.

I re-enabled it and it went online, but it does not seem to pick up a job.

-- 
Emmanuel Dreyfus
m...@netbsd.org


[Gluster-infra] Status of nbslave7x

2015-06-17 Thread Emmanuel Dreyfus
Status of NetBSD slave VM:

1 booked: nbslave71 
  It is marked as disconnected by amarts. Is its usage over?

3 removed from rackspace but still in jenkins: nbslave7d, nbslave7e, nbslave7f

6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j

3 offline: nbslave74 nbslave75 nbslave79
  The 3 DNS records do not resolve (timeout) from build.gluster.org, 
  while they do from my machine. Adding them to /etc/hosts helps a lot on the
  command line, and it becomes possible to connect to port 22.
  But Jenkins is still unable to connect and launch the agent.
  tcpdump on build.gluster.org shows it does not even try.

Perhaps there is a name cache in Jenkins and it needs to be restarted?
I am leaving the /etc/hosts file loaded with nbslave74, nbslave75 and nbslave79

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote:
> Do we still have the NFS crash that was causing tests to hang?

Do we still have it on rebased patchsets?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote:
> And I think the DNS issues are just a symptom of a bigger network issue,
> having local DNS might just mask the problem and which would then be non
> DNS related ( like tcp connexion not working ).

Well, if it is lost packets, TCP is more resilient, and if it is an
overloaded DNS server, the problem only affects DNS.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote:
> I've already scripted the reboot-vm job to use Rackspace API, the DNS
> requesting and formatting the results into some file can't be that
> difficult. Let me know if a /etc/hosts format would do, or if you expect
> something else.

Perhaps an /etc/hosts file would do it: Jenkins launches the ssh command,
and ssh should consult /etc/hosts before the DNS.
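As a sketch of what such a file could look like (the 192.0.2.x addresses
are RFC 5737 documentation placeholders, not the slaves' real Rackspace IPs):

```
# Hypothetical /etc/hosts entries for the offline slaves.
# 192.0.2.x stand in for the real Rackspace addresses.
192.0.2.74  nbslave74.cloud.gluster.org nbslave74
192.0.2.75  nbslave75.cloud.gluster.org nbslave75
192.0.2.79  nbslave79.cloud.gluster.org nbslave79
```

This only works if the hosts file is consulted before the DNS, i.e.
/etc/nsswitch.conf lists "files" ahead of "dns" for hosts, which is the
default on most systems.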

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-17 Thread Emmanuel Dreyfus
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote:
> cloud.gluster.org is served by Rackspace Cloud DNS

Perhaps we can change that and set up a DNS server for the zone? 

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-16 Thread Emmanuel Dreyfus
Venky Shankar  wrote:

> If that's the case, then I'll vote for this even if it takes some time
> to get things in workable state.

See my other mail about this: you enter a new slave VM in the DNS and it
does not resolve, or sometimes you get 20s delays. I am convinced this
is the reason why Jenkins misbehaves.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-16 Thread Emmanuel Dreyfus
Atin Mukherjee  wrote:

> > That *might* result in lots of NetBSD regression failures later on and
> > we may end up with another round of fixups.
> Agreed, that's the known risk but we don't have any other alternatives atm.

I strongly disagree, we have a good alternative: configure a secondary
DNS on build.gluster.org for the cloud.gluster.org zone. I could do the
local configuration, but someone with administrative access will have to
touch primary configuration to allow zone transfer (and enable
notifications).
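
For illustration only, a minimal BIND 9 sketch of the two sides
(assumptions: both servers run named, 192.0.2.1 stands in for the
primary's address and 192.0.2.2 for build.gluster.org; the real
addresses and file paths would differ):

```
// On the primary (zone master): allow the secondary to pull the zone
// and notify it on updates.
zone "cloud.gluster.org" {
    type master;
    file "cloud.gluster.org.zone";
    allow-transfer { 192.0.2.2; };   // build.gluster.org
    notify yes;
};

// On build.gluster.org: hold an authoritative secondary copy, so
// lookups for the slaves never depend on outside infrastructure.
zone "cloud.gluster.org" {
    type slave;
    file "secondary/cloud.gluster.org.zone";
    masters { 192.0.2.1; };          // the primary server
};
```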

The current situation is that we have 14 NetBSD VMs online and only 5 are
capable of running jobs because of various infrastructure configuration
problems, broken DNS being the first offender.

Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan
tstile), for which I had a change merged that should fix the problem,
but only for rebased changes.
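
A rough way to script the check for that hang (a sketch only: it matches
on the wait-channel name rather than a fixed column, since ps output
layout varies between systems):

```shell
#!/bin/sh
# Print the ps header plus any process sleeping on a kernel turnstile
# (wchan "tstile") -- on the NetBSD slaves, a umount or dd showing up
# here is the signature of the old NFS hang.
ps -axl | awk 'NR == 1 || /tstile/'
```

An empty result below the header means no process is currently stuck.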


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] Reduce regression runs wait time - New gerrit/review work flow

2015-06-15 Thread Emmanuel Dreyfus
On Mon, Jun 15, 2015 at 10:09:33AM -0400, Jeff Darcy wrote:
> As long as there's some visible marking on the summary pages to
> distinguish patches that have passed smoke vs. those that haven't, I
> think we're good.

The Gerrit manual says you can add more columns, like Review and Verified.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote:
> Michael installed and configured dnsmasq on build.gluster.org yesterday.
> If that does not help today, we need other ideas...

Just to confirm the problem:

[manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached


real    0m20.013s
user    0m0.002s
sys     0m0.012s

Having a local cache does not help because the upstream DNS service is 
weak. Without the local cache, individual processes wait in vain for a 
reply, and with the local server, it is the local server itself that 
waits in vain for a reply.

And here the upstream DNS is really at fault: from my machine I get a 
reply in 0.29s.

We need to configure a local authoritative secondary DNS for the zone, 
so that the answer is always available locally without having to rely
on outside infrastructure.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:51:52PM +0200, Niels de Vos wrote:
> I've just checked the online NetBSD slaves again, but they seem to have
> been configured correctly... Maybe we are hitting a Jenkins bug, or
> there was a (temporary?) issue with DNS resolution?

DNS resolution is wrecked on build.gluster.org: I tried a tcpdump
to diagnose the problem and:
tcpdump: unknown host 'nbslave71.cloud.gluster.org'

Another attempt gives me the correct answer after more than 5 seconds.

I am almost convinced that a local named on build.gluster.org would
help a lot.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 07:26:00AM +, Emmanuel Dreyfus wrote:
> In my opinion the fix to this problem is to start new VMs. I was busy 
> on other fronts, hence I did not watch the situation, but it is still
> grim, with most NetBSD slaves being in a screwed state. We need to spin 
> up more.

Launching the slave on the new VM fails, but for once we have a 
meaningful error: either DNS names are duplicated, or Jenkins has a bug.

<===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a 
slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Already connected
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:466)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:371)
at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:945)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:133)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
[06/11/15 02:37:46] Launch failed - cleaning up connection
[06/11/15 02:37:46] [SSH] Connection closed.


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:57:58PM +0530, Kaushal M wrote:
> The problem was nbslave71. It used to be picked first for all changes
> and would fail instantly. I've disabled it now. The other slaves are
> working correctly.

Sadly the Jenkins upgrade did not help here. Last time I investigated,
the failure was caused by the master breaking the connection, but I was
not able to understand why. 

I was once able to recover a VM by fiddling with the Jenkins configuration
in the web UI, but experimenting is not easy, as a miss will drain the 
whole queue into complete failures.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches

2015-06-11 Thread Emmanuel Dreyfus
On Thu, Jun 11, 2015 at 12:39:43PM +0530, Atin Mukherjee wrote:
> Can we start merging patches with out NetBSD's vote? Currently we have
> so many patches waiting for NetBSD's vote and it seems like no vms are
> apparently running as well. This is blocking us to move forward.

In my opinion the fix to this problem is to start new VMs. I was busy 
on other fronts, hence I did not watch the situation, but it is still
grim, with most NetBSD slaves being in a screwed state. We need to spin 
up more.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] NetBSD slaves

2015-06-09 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> This certainly does explain the baffling behavior 

I just had a look: nbslave7[1cde] are stuck.

nbslave71 does not accept SSH connections.
I rebooted nbslave7c.
nbslave7[de] are not in the DNS. 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] NetBSD slaves

2015-06-09 Thread Emmanuel Dreyfus
On Tue, Jun 09, 2015 at 01:01:40PM +0530, Kaushal M wrote:
> I cannot find the new VMs nbslave7{d..f} in rackspace. But jenkins can
> still see them. Anyone have any idea what's happening here?

Botched DNS records? 

Unfortunately, creating a VM needs many clicks in the Rackspace web UI in 
order to have DNS set up.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] NetBSD slaves

2015-06-05 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> Rebooting is not working effectively for slaves like slave74. I have 
> created 3 new VMs nbslave7{d..f} for load balancing NetBSD regression
> queue. If necessary we can spin a few ephemeral VMs over the weekend to
> drain the queue.

FWIW I planned nbslave7g after nbslave7f :-)
When it does not reboot, a peek at the console may help. I guess this is
a fsck problem.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] DNS issues in Jenkins infrastructure and/or on build.gluster.org

2015-05-19 Thread Emmanuel Dreyfus
On Tue, May 19, 2015 at 11:09:03AM +0200, Niels de Vos wrote:
> Yes, as long as it caches the correct DNS answers. For the occasion that
> the answer from the upstream DNS is incorrect, it will get cached as
> well. This could result in a longer time for problems, I think.

I never saw incorrect responses, just a lack of response.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] DNS issues in Jenkins infrastructure and/or on build.gluster.org

2015-05-19 Thread Emmanuel Dreyfus
On Tue, May 19, 2015 at 10:00:45AM +0200, Niels de Vos wrote:
> I think there is a plan to decommission the current build.gluster.org
> and move its services to a different server/datacenter. Hopefully DNS
> will be more reliable soon.

A local DNS on build.gluster.org will cache GlusterFS stuff and will
speed up all DNS requests and make them more reliable at the same time.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-infra] Downtime for Jenkins

2015-05-17 Thread Emmanuel Dreyfus
Another connectivity failure:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5420

The slave VM uptime suggests it did not reboot during the build. 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Downtime for Jenkins

2015-05-17 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> Around 9:25 UTC.

There is this one that looks like the old bug:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5410/console

But the same machine (nbslave71) was at least able to run other jobs after this.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-infra] Downtime for Jenkins

2015-05-17 Thread Emmanuel Dreyfus
Vijay Bellur  wrote:

> Manu - can you please verify and report back if the NetBSD slaves work
> better with the upgraded Jenkins master?

At what time did the new Jenkins start up?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


[Gluster-infra] What is wrong with Jenkins

2015-05-14 Thread Emmanuel Dreyfus
Jenkins decided to throw one more VM into the always-failing state. Usual
business, but I found a way to recover:

I changed the way Jenkins was supposed to connect to the VM: through a
command instead of SSH (I created ~/.ssh/id_rsa_nbslave for that)

/usr/bin/ssh -oLogLevel=ERROR -oBatchMode=yes -oStrictHostKeyChecking=no
-oUserKnownHostsFile=/dev/null -i /var/lib/jenkins/.ssh/id_rsa_nbslave
jenk...@nbslave71.cloud.gluster.org "/usr/pkg/java/openjdk7/bin/java
-jar /home/jenkins/root/slave.jar"

It did not work, but reverting to SSH fixed the problem, and the VM
is now able to run jobs again.

But there are other frustrating failures: for instance, this one was
disconnected during a run, and I still wonder why:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5380

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org

