Re: [Gluster-infra] Can I update /opt/qa for NetBSD?
Nigel Babu wrote:
> Practically, I only need someone to look at one line:

Where is this fs used? For instance, mount(8) knows about ffs, not UFS:

    $ mount
    /dev/raid0a on / type ffs (log, local)
    /dev/raid0e on /mail type ffs (log, nodev, nosuid, local, with quotas)
    /dev/raid1a on /ssd type ffs (log, nodev, nosuid, local)
    kernfs on /kern type kernfs (local)

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] NetBSD jobs hang on umount
Nigel Babu wrote:
> Atin says we've noticed this in the past and somehow fixed it. Do you
> recall what we did to fix it? Is it the same problem?

The key test is to run ps -axl and observe the WCHAN column for the stuck umount process. If it shows tstile, then this is the ancient bug.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
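A minimal sketch of that check (the awk filter is mine; it prints the ps header plus any line mentioning the tstile wchan):

    ps -axl | awk 'NR == 1 || /tstile/'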
Re: [Gluster-infra] Can I update /opt/qa for NetBSD?
Nigel Babu wrote:
> I'll definitely appreciate any feedback you can have in terms
> of code when it's ready for review.

No problem. But the regression infrastructure will catch any issue better than I would, anyway. :-)

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Can I update /opt/qa for NetBSD?
Nigel Babu wrote:
> * A bunch of cleanup scripts are in build.sh. I think they can move into the
> Jenkins job itself. That's what we do for linux boxes.

Feel free to do it, but is there a benefit? If it is not broken, do not fix it...

> * Some test files have been made to 0644. Is this relevant or are these
> accidental changes?

Probably an accidental change.

> * Can the Python path and stuff be declared before the build.sh script is
> called in the Jenkins script? I don't know if this will work, I'll have to
> test.

You need to run build.sh so that configure is invoked and env.rc is created with the appropriate @BUILD_PYTHON_SITE_PACKAGES@ set (see tests/env.rc.in).

> * I think we now have a standard way of skipping tests in the test runner
> itself rather than deleting the tests from the checkout. If not, I'll drive
> these fixes.

It may not scale when you want to skip the whole bugs directory.

> * The check for whether there's two jobs assigned to the same machine can
> be controlled from Jenkins and I plan to do that, so we can probably
> remove that code as well.

That seems better, as the current setup misbehaves when a job is manually cancelled and retriggered (here is a point to fix!).

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] netbsd7.cloud.gluster.org
Nigel Babu wrote:
> Oh, it's in the pool for netbsd-7 smoke, which we don't run anymore. Shall
> I kill the machine, then?

No problem for me.

> The smoke is perhaps just a build, which we do during regressions on netbsd7
> anyway.

And we do smoke on netbsd-6 anyway.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] netbsd7.cloud.gluster.org
Nigel Babu wrote:
> I notice that this machine is a bit different from the others in terms of
> partition and also runs 7.0 BETA. Is this intentional or does it make sense to
> re-image this machine with one of the other machines? I've had the new
> partition created on all the other NetBSD 7 machines and it seems to be going
> well so far.

IIRC it was used for netbsd-7 smoke tests, but the jenkins setup broke at some point, leaving us with the netbsd-6 smoke test on netbsd0.cloud.gluster.org and netbsd-7 regression on nbslave7x.cloud.gluster.org. I am not sure netbsd7.cloud.gluster.org is used anymore.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
Nigel Babu wrote:
> Thank you. I'll give that a shot. I also want to setup ntp

Add ntpd=YES in /etc/rc.conf and run /etc/rc.d/ntpd start. Note that the ntpd currently installed needs a security update.

> and change passwords for all the machines in one go.

Do it on one machine using vipw, copy /etc/master.passwd, and run pwd_mkdb -p /etc/master.passwd everywhere to regenerate /etc/passwd.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
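A sketch of that propagation step; the host list here is hypothetical:

    for h in nbslave71 nbslave72 nbslave74; do
        scp /etc/master.passwd root@$h.cloud.gluster.org:/etc/master.passwd
        ssh root@$h.cloud.gluster.org pwd_mkdb -p /etc/master.passwd
    done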
Re: [Gluster-infra] Netbsd folders filling up
Nigel Babu wrote:
> Oh, can I apply this to all the machines in one go?

disklabel as is works with an interactive editor, but you can also run

    disklabel xbd0 > protofile

then tweak the file and use

    disklabel -R xbd0 protofile

to load it in a batch. Or you can just modify nbslave70, image it, and deploy to the other machines; it would not hurt.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
On Mon, Jul 18, 2016 at 09:37:19AM +0000, Emmanuel Dreyfus wrote:
> On Mon, Jul 18, 2016 at 10:35:45AM +0530, Nigel Babu wrote:
> > Would it be problematic if I added 20GB of block storage per machine for the
> > /build, /home/jenkins and /archives folder? That should easily sort out our
> > disk space troubles.
>
> No, but first check whether the current image has some spare space
> beyond the / partition

That is the case; disklabel xbd0 says:

    #        size    offset     fstype [fsize bsize cpg/sgs]
     a:  19922881        63     4.2BSD   2048 16384     0  # (Cyl.     0* -  9727)
     b:   4194304  19922944       swap                     # (Cyl.  9728  - 11775)
     c:  20971457        63     unused      0     0        # (Cyl.     0* - 10239)
     d:  83886080         0     unused      0     0        # (Cyl.     0  - 40959)
     e:   8388608  24117248     4.2BSD   2048 16384     0  # (Cyl. 11776  - 15871)

NetBSD has a historic curiosity: c is the NetBSD partition in the MBR, d is the whole disk. This means you have 51380224 sectors of 512 bytes left after partition e: 24 GB.

Run disklabel -e xbd0 and add an f line:

     f:  51380161  32505856     4.2BSD   2048 16384     0

While there, it will not hurt to resize c (for the sake of clarity):

     c:  83886017        63     unused      0     0

And still while there, run fdisk -iau xbd0 to adjust the NetBSD partition size in the MBR. Then you can:

    newfs /dev/rxbd0f

add /dev/xbd0f in /etc/fstab, and:

    mount /dev/xbd0f

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
On Mon, Jul 18, 2016 at 10:35:45AM +0530, Nigel Babu wrote:
> Would it be problematic if I added 20GB of block storage per machine for the
> /build, /home/jenkins and /archives folder? That should easily sort out our
> disk space troubles.

No, but first check whether the current image has some spare space beyond the / partition.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
On Fri, Jul 15, 2016 at 03:04:39PM +0530, Nigel Babu wrote:
> Would it be okay to write a cron to clean up anything older than 15 days in
> /build/install and /archives?

You have to clean up after some time. How is it handled on Linux boxen?

-- Emmanuel Dreyfus m...@netbsd.org
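A hedged sketch of the cron job proposed above (retention and paths follow the proposal; the schedule is an assumption):

    # run daily at 03:00, removing files untouched for more than 15 days
    0 3 * * * find /build/install /archives -type f -mtime +15 -exec rm -f {} +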
Re: [Gluster-infra] Netbsd folders filling up
On Fri, Jul 15, 2016 at 07:19:17AM +0000, Emmanuel Dreyfus wrote:
> On Fri, Jul 15, 2016 at 10:59:04AM +0530, Nigel Babu wrote:
> > nbslave77.cloud.gluster.org
>
> That one has 1.6 GB of logs in /build/install/var/log/glusterfs

And if you look for free space, you can wipe /usr/pkgsrc (2.6 GB), which is not used.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
On Fri, Jul 15, 2016 at 10:59:04AM +0530, Nigel Babu wrote:
> nbslave77.cloud.gluster.org

That one has 1.6 GB of logs in /build/install/var/log/glusterfs

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Netbsd folders filling up
Nigel Babu wrote:
> Does anyone know which folders are usually filled up quickly on netbsd
> machines? A lot of the machines are offline merely because they're out of
> diskspace. I'm working through them. I've cleared out old files in /build
> and /archives. Is there anywhere else I should be looking?

What are the offending machines? Core files are configured to go in /var/crash, but /var/* should be a good place to look at.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
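A quick way to spot the space hogs on a slave (a sketch; adjust the paths to taste):

    du -skx /var/* /build /archives 2>/dev/null | sort -rn | head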
Re: [Gluster-infra] Putting the netbsd builders in the ansible pool ?
Michael Scherer wrote:
> The only issue I face is that you flagged most of /usr as unchangeable,
> and I do not know how cleanly it would be to remove the flags before
> applying changes and apply that again with the current layout of our
> ansible roles. But I will figure something out.

I did this because of a glusterfs bug that overwrote random files with logs. I tend to overwrite a file that way:

    cat hosts | ssh root@host "chflags nouchg /etc/hosts; cat > /etc/hosts; chflags uchg /etc/hosts"

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Putting the netbsd builders in the ansible pool ?
Michael Scherer wrote:
> I connected to it from rackspace and stopped rpcbind in a hurry after
> being paged, but I would like to make sure that the netbsd builders are
> a bit more hardened (even if they are already well hardened from what I
> did see, even if there is no firewall), as it seems most of them are
> also running rpcbind (and sockstat show they are not listening only on
> localhost).

I created minimal filtering rules in /etc/ipf.conf and restarted rpcbind. I did the same for the other NetBSD VMs.

> Emmanuel, would you be ok if we start to manage them with ansible like
> we do for the Centos ones ?

I have no problem with it, but I must confess a complete lack of experience with this tool.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
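The actual rules are not shown above; a hypothetical minimal /etc/ipf.conf in that spirit could look like this (xennet0 is the usual NetBSD Xen domU interface, an assumption here):

    pass in quick on lo0 all
    block in quick on xennet0 proto tcp from any to any port = 111
    block in quick on xennet0 proto udp from any to any port = 111
    pass in all

Reload with ipf -Fa -f /etc/ipf.conf after editing.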
Re: [Gluster-infra] [Gluster-devel] Requesting for a NetBSD setup
On Mon, May 02, 2016 at 01:55:43PM +0530, Manikandan Selvaganesh wrote:
> Could you please provide us a NetBSD machine as the test cases are failing
> and we need to have a look on it?

nbslave72.cloud.gluster.org was put offline for some jenkins breakage that does not seem to be slave-related: I gave it a quick try, and it is able to build and run tests.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] hang in netbsd regression while building
Raghavendra Gowdappa wrote:
> While trying to get netbsd regressions passed on [1], I am seeing hangs in
> building glusterfs. I had seen similar behavior for other patches
> earlier too, but Kaushal had fixed it. Any help is appreciated. Also, if
> you let me know a procedure to fix this issue if I encounter the same in
> future, I can do it myself.

The machine is stuck in a bad corner case from a previous run, and cannot clean up for the new run. ps -axl shows many umount processes in the tstile wchan. reboot -n is advised in such a situation.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Requesting for NetBSD setup
On Fri, Apr 29, 2016 at 12:40:19PM +0530, Kaushal M wrote:
> I often disconnect machines that aren't in a working state, and reboot them.
> If I've left something in the disconnected state, most likely those
> machines didn't get back to a working state after the reboot.
> Or it could be that I just forgot.

I just checked out master on nbslave74, built, and ran tests; it seems fine.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Requesting for NetBSD setup
On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote:
> I would like to ask for a NetBSD setup

nbslave7[4gh] are disabled in Jenkins right now. They are labeled "Disconnected by kaushal", but I don't know why. Once it is confirmed that they are not already used for testing, you could pick one. I still do not know who the password guardian at Red Hat is, though.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] regression machines reporting slowly ? here is the reason ...
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote:
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/

Nothing new here. I tracked this kind of trouble with NetBSD slaves and only got frustration as the result.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Different version of run-tests.sh in jenkins slaves?
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

I crafted the patch below, but it is probably much simpler to just set kern.defcorename to /%n-%p.core on all VM slaves. I will do it.

diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index 272d08f..2fd2d7d 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -29,6 +29,10 @@
 #include <fcntl.h>
 #endif /* HAVE_LINKAT */
 
+#ifdef __NetBSD__
+#include <sys/sysctl.h>
+#endif /* __NetBSD__ */
+
 #include "glusterfs.h"
 #include "checksum.h"
 #include "dict.h"
@@ -6631,6 +6635,8 @@ init (xlator_t *this)
         _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
         if (_private->path_max != -1 &&
             _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
+                char corename[] = "/%n-%p.core";
+
                 ret = chdir(_private->base_path);
                 if (ret) {
                         gf_msg (this->name, GF_LOG_ERROR, 0,
@@ -6639,7 +6645,15 @@ init (xlator_t *this)
                                 _private->base_path);
                         goto out;
                 }
 
+#ifdef __NetBSD__
+                /*
+                 * Make sure cores go to the root and not in current
+                 * directory
+                 */
+                (void)sysctlbyname("proc.curproc.corename", NULL, NULL,
+                                   corename, strlen(corename) + 1);
+
                 /*
                  * At least on NetBSD, the chdir() above uncovers a
                  * race condition which cause file lookup to fail

-- Emmanuel Dreyfus m...@netbsd.org
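The simpler alternative mentioned above boils down to one sysctl per slave (a sketch; the /etc/sysctl.conf line persists the setting across reboots):

    sysctl -w kern.defcorename=/%n-%p.core
    echo kern.defcorename=/%n-%p.core >> /etc/sysctl.conf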
Re: [Gluster-infra] [Gluster-devel] Different version of run-tests.sh in jenkins slaves?
On Thu, Jan 28, 2016 at 12:10:49PM +0530, Atin Mukherjee wrote:
> So does that mean we never analyzed any core reported by NetBSD
> regression failure? That's strange.

We got the cores from / but not from d/backends/*/ as I understand. I am glad someone figured out the mystery.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Different version of run-tests.sh in jenkins slaves?
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

sysctl kern.defcorename gives the default location and name. It can be overridden per process using sysctl proc.$$.corename.

> Any particular reason you added /d/backends/*/*.core to list of path to
> search for core?

Yes, this is required for standard compliance of the exposed glusterfs filesystem in the case of a low system PATH_MAX. See in posix.c:

        /*
         * _XOPEN_PATH_MAX is the longest file path len we MUST
         * support according to POSIX standard. When prepended
         * by the brick base path it may exceed backed filesystem
         * capacity (which MAY be bigger than _XOPEN_PATH_MAX). If
         * this is the case, chdir() to the brick base path and
         * use relative paths when they are too long. See also
         * MAKE_REAL_PATH in posix-handle.h
         */
        _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
        if (_private->path_max != -1 &&
            _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
                ret = chdir(_private->base_path);
                if (ret) {
                        gf_msg (this->name, GF_LOG_ERROR, 0,
                                P_MSG_BASEPATH_CHDIR_FAILED,
                                "chdir() to \"%s\" failed",
                                _private->base_path);
                        goto out;
                }

And the core goes in the current directory by default. We could use sysctl(3) to change that if we need.

-- Emmanuel Dreyfus m...@netbsd.org
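An illustration of the per-process override (a sketch; %n and %p expand to the process name and PID):

    sysctl proc.$$.corename                      # show the setting for the current shell
    sysctl -w proc.$$.corename=/tmp/%n-%p.core   # children inherit the new template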
Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?
Michael Scherer wrote:
> Depend, if they exhausted FD or something ?

I am not a java specialist. It is not the same errno, AFAIK.

> Could also just be too long to answer due to the load, but it was not
> loaded :/

High loads give timeouts. I may be wrong, but I believe connection refused really means the client got a TCP RST.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?
On Thu, Jan 21, 2016 at 04:49:28PM +0100, Michael Scherer wrote:
> > review.gluster.org[0: 184.107.76.10]: errno=Connection refused
>
> So I found nothing in gerrit nor netbsd. And not the DNS, since it
> managed to resolve stuff fine.
>
> I suspect the problem was on gerrit, not on netbsd. Did it happen
> again?

I could imagine problems with exhausted system resources, but that would not produce a "Connection refused".

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?
Vijay Bellur wrote:
> Does not look like a DNS problem. It is happening to me outside of
> rackspace too.

I mean I have already seen rackspace VMs failing to initiate connections because the rackspace DNS failed to answer DNS requests. This was the cause of failed regressions at some point.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Netbsd regressions are failing because of connection problems?
Vijay Bellur wrote:
> There is some problem with review.gluster.org now. git clone/pull fails
> for me consistently.

First check that DNS is working. I recall seeing the rackspace DNS failing to answer.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regression fixes
Hi all

I have the following changes awaiting code review/merge:

http://review.gluster.org/13204
http://review.gluster.org/13205
http://review.gluster.org/13245
http://review.gluster.org/13247

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] NetBSD regression fixes
Emmanuel Dreyfus wrote:
> But I just realized the change is wrong, since running tests the "new way"
> stops on the first failed test. My change just retries the failed test and
> considers the regression run to be good on success, without running the next
> tests.
>
> I will post an update shortly.

Done:

http://review.gluster.org/13245
http://review.gluster.org/13247

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] NetBSD regression fixes
Niels de Vos wrote:
> > 2) Spurious failures
> > I added a retry-failed-test-once feature so that we get fewer regression
> > failures because of spurious failures. It is not used right now because
> > it does not play nicely with the bad tests blacklist.
> >
> > This will be fixed by these changes:
> > http://review.gluster.org/13245
> > http://review.gluster.org/13247
> >
> > I have been looping failure-free regression for a while with that trick.
>
> Nice, thanks for these improvements!

But I just realized the change is wrong, since running tests the "new way" stops on the first failed test. My change just retries the failed test and considers the regression run to be good on success, without running the next tests.

I will post an update shortly.

> Could you send a pull request for the regression.sh script on
> https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
> you dont use GitHub, send the patch by email and we'll take care of
> pushing it for you.

Sure, but let me settle on something that works first.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
[Gluster-infra] NetBSD regression fixes
Hello all

Here are the problems identified in NetBSD regression so far:

1) Before starting regression, the slave complains about "vnconfig: VNDIOCGET: Bad file descriptor" and fails the run.

This will be fixed by these changes:
http://review.gluster.org/13204
http://review.gluster.org/13205

2) Spurious failures

I added a retry-failed-test-once feature so that we get fewer regression failures because of spurious failures. It is not used right now because it does not play nicely with the bad tests blacklist.

This will be fixed by these changes:
http://review.gluster.org/13245
http://review.gluster.org/13247

I have been looping failure-free regression for a while with that trick.

3) Stale state from previous regression

We sometimes have processes stuck from a previous regression, awaiting vnode locks for destroyed NFS filesystems. This causes the cleanup scripts to hang before starting regression, and we get a timeout. I modified the slave's /opt/qa/regression.sh to check for stuck processes and reboot the system if we find them. That will fail the current regression run, but at least the next ones coming after reboot will be safe.

This fix is not deployed yet; I await the fixes from point 2 to be merged.

4) Jenkins casts concurrent runs on the same slave

We observed Jenkins sometimes runs two jobs on the same slave at once, which of course can only lead to horrible failure. I modified the slave's /opt/qa/regression.sh to add a lock file so that this situation is detected early and reported (a sketch follows this message). The second regression will fail, but the idea is to get a better understanding of how that can occur.

This fix is not deployed yet; I await the fixes from point 2 to be merged.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
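A minimal sketch of the lock described in point 4; the lock path is hypothetical, and mkdir is used because it is atomic:

    lockdir=/var/run/regression.lock
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another regression run is already active on this slave" >&2
        exit 1
    fi
    trap 'rmdir "$lockdir"' EXIT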
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Emmanuel Dreyfus wrote:
> While trying to reproduce the problem in
> ./tests/basic/afr/arbiter-statfs.t, I came to many failures here:
>
> [03:53:07] ./tests/basic/afr/split-brain-resolution.t

I was running the tests from the wrong directory :-/ This one is fine with HEAD.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri wrote:
> I tried to look into 3 instances of this failure: (...)
> same issue as above, two tests are running in parallel.

How is that possible? A stray & that sends a job to the background? Are we sure it is the same regression test run? Or is it two regression test runs that were scheduled simultaneously?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri wrote:
> tests/basic/afr/arbiter-statfs.t

I posted patches to fix this one (but it seems Jenkins is down? No regression is running).

> tests/basic/afr/self-heal.t
> tests/basic/afr/entry-self-heal.t

Those two are still to be investigated, and it seems tests/basic/afr/split-brain-resolution.t is now reliably broken as well.

> tests/basic/quota-nfs.t

That one is marked as a bad test and should not cause harm on spurious failure, as its result is ignored.

I am trying to reproduce a spurious VM reboot during tests by looping on the whole test suite on nbslave70, with reboot on panic disabled (it will drop into the kernel debugger instead). No result so far.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri wrote:
> With your support I think we can make things better. To avoid
> duplication of work, did you take any tests that you are already
> investigating? If not that is the first thing I will try to find out.

While trying to reproduce the problem in ./tests/basic/afr/arbiter-statfs.t, I came to many failures here:

    [03:53:07] ./tests/basic/afr/split-brain-resolution.t .. 20/43
    getfattr: Removing leading '/' from absolute path names
    cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
    not ok 25 Got "" instead of "brick0_alive"
    cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
    not ok 27 Got "" instead of "brick1_alive"
    getfattr: Removing leading '/' from absolute path names
    not ok 30 Got "" instead of "brick0"
    not ok 32 Got "" instead of "brick1"

It is not in the lists posted here. Is it only at my end?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Emmanuel Dreyfus wrote:
> > With your support I think we can make things better. To avoid duplication of
> > work, did you take any tests that you are already investigating? If not that
> > is the first thing I will try to find out.
>
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
> the loopback device.

I tracked it down: vnconfig -l complains about "vnconfig: VNDIOCGET: Bad file descriptor" when we had a configured loopback device whose backing store is on a filesystem we unmounted.

    # dd if=/dev/zero of=/scratch/backend bs=1024k count=100
    100+0 records in
    100+0 records out
    104857600 bytes transferred in 3.034 secs (34560843 bytes/sec)
    # vnconfig vnd0 /scratch/backend
    # vnconfig -l
    vnd0: /scratch (/dev/xbd1a) inode 6
    vnd1: not in use
    vnd2: not in use
    vnd3: not in use
    # umount -f /scratch/
    # vnconfig -l
    vnconfig: VNDIOCGET: Bad file descriptor

But it seems the workaround is easy:

    # vnconfig -u vnd0
    # vnconfig -l
    vnd0: not in use
    vnd1: not in use
    vnd2: not in use
    vnd3: not in use

Here are my fixes:

http://review.gluster.org/13204 (master)
http://review.gluster.org/13205 (release-3.7)

And while there, a portability fix in rfc.sh:

http://review.gluster.org/13206 (master)

That bug is not present in release-3.7.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 09:57:01PM +0530, Pranith Kumar Karampuri wrote:
> > Next step is to look for loopback devices whose backing store is in $B0
> > and unconfigure them.
> Oops, wrong code reading. Is it possible to have loopback devices not in
> use, that we miss out on destroying? Could be a stupid question but still
> asking.

Well, the kernel tells us it is not in use. I am not sure what you mean.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 08:37:16PM +0530, Pranith Kumar Karampuri wrote:
> NetBSD)
>     vnd=`vnconfig -l | \
>         awk '!/not in use/{printf("%s%s:%d ", $1, $2, $5);}'`
>
> Can there be Loopback devices that are in use when this piece of the code is
> executed, which can lead to the problems we ran into? I may be completely
> wrong. It is a wild guess about something I don't completely understand.

This lists loopback devices in use. For instance:

    vnd0:/d:180225 vnd1:/d:180226 vnd2:/d:180227

Next step is to look for loopback devices whose backing store is in $B0 and unconfigure them.

-- Emmanuel Dreyfus m...@netbsd.org
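A sketch of that next step, matching the vnconfig -l output format shown above ($B0 is the backend directory exported by the test framework):

    vnconfig -l | awk -v b="$B0" 'index($0, b) { sub(":", "", $1); print $1 }' | \
    while read vnd; do
        vnconfig -u "$vnd"
    done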
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 10:56:22AM +0000, Emmanuel Dreyfus wrote:
> On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> > With your support I think we can make things better. To avoid duplication of
> > work, did you take any tests that you are already investigating? If not that
> > is the first thing I will try to find out.
>
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
> the loopback device.

800 runs so far without a hitch. I suspect the problem is caused by leftovers from another test.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> With your support I think we can make things better. To avoid duplication of
> work, did you take any tests that you are already investigating? If not that
> is the first thing I will try to find out.

I will look at the ./tests/basic/afr/arbiter-statfs.t problem with the loopback device.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 05:11:22AM -0500, Jeff Darcy wrote:
> [08:45:57] ./tests/basic/afr/arbiter-statfs.t ..
> [08:43:03] ./tests/basic/afr/arbiter-statfs.t ..
> [08:40:06] ./tests/basic/afr/arbiter-statfs.t ..
> [08:08:51] ./tests/basic/afr/arbiter-statfs.t ..
> [08:06:44] ./tests/basic/afr/arbiter-statfs.t ..
> [08:00:54] ./tests/basic/afr/self-heal.t ..
> [07:59:56] ./tests/basic/afr/entry-self-heal.t ..
> [18:05:23] ./tests/basic/quota-anon-fd-nfs.t ..
> [18:06:37] ./tests/basic/quota-nfs.t ..
> [18:49:32] ./tests/basic/quota-anon-fd-nfs.t ..
> [18:51:46] ./tests/basic/quota-nfs.t ..
> [14:25:37] ./tests/basic/quota-anon-fd-nfs.t ..
> [14:26:44] ./tests/basic/quota-nfs.t ..
> [14:45:13] ./tests/basic/tier/record-metadata-heat.t ..

That is 6 tests; they could be disabled or ignored.

> So some of us *have* done that work, in a repeatable way. Note that the
> list doesn't include tests which *hang* instead of failing cleanly,
> which has recently been causing the entire NetBSD queue to get stuck
> until someone manually stops those jobs. What I find disturbing is the
> idea that a feature with no consistently-available owner or identifiable
> users can be allowed to slow or block every release unless every
> developer devotes extra time to its maintenance. Even if NetBSD itself
> is worth it, I think that's an unhealthy precedent to set for the
> project as a whole.

For that point, we could start the regression script with:

    ( sleep 7200 && /sbin/reboot -n ) &

And end it with:

    kill %1

Does it seem reasonable? That way nothing can hang for more than 2 hours.

-- Emmanuel Dreyfus m...@netbsd.org
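Spelled out as a sketch (run_regression is a stand-in for the actual test invocation; $! is used to track the watchdog subshell):

    ( sleep 7200 && /sbin/reboot -n ) &    # hard reboot if we overrun 2 hours
    watchdog=$!
    run_regression
    kill $watchdog                         # finished in time: disarm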
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> Should the cleanup script need to be manually executed on the NetBSD
> machine?

You can run the script manually, but if the goal is to restore a misbehaving machine, rebooting is probably the fastest way to sort out the issue. While thinking about it, I suspect there may be some benefit in rebooting the machine if the regression does not finish within a sane amount of time.

> > First step could be to parse jenkins logs and find which tests fail or hang
> > most often in NetBSD regression
>
> This work is under way. I will have to change some of the scripts I wrote to
> get this information.

Great.

> To avoid duplication of work, did you take any tests that you are
> already investigating? If not that is the first thing I will try to find out.

No, I have not started investigating yet because I have no idea where I should look. Your input will be very valuable.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 12:42:36PM +0530, Sachidananda URS wrote:
> I have a NetBSD 7.0 installation which I can share with you, to get
> started.
> Once manu@ gets back on a specific version, I can set that up too.

NetBSD 7.0 is fine and has everything required in the GENERIC kernel.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 11:45:20AM +0530, Pranith Kumar Karampuri wrote:
> 1) How to set up NetBSD VMs on my laptop which is of exact version as the
> ones that are run on build systems.

Well, the easiest way is to pick the VM image we run at rackspace, which relies on Xen. If you use a hardware virtualization system, we just need to change the kernel and use the NetBSD-7.0 GENERIC one. What hypervisor do you use?

Alternatively, it is easy to make a fresh NetBSD install. The only trap is that the glusterfs backing store filesystem must be formatted in FFSv1 format to get extended attribute support (this is obtained by newfs -O1; a sketch follows this message).

> 2) How to prevent NetBSD machines hang when things crash (At least I used to
> see that the machines hang when fuse crashes before, not sure if this is
> still the case)? (This failure needs manual intervention at the moment on
> NetBSD regressions, if we make it report failures and pick next job that
> would be the best way forward)

It depends what we are talking about. If this is a mount point that does not want to unmount, killing the perfused daemon (which is the bridge between FUSE and native PUFFS) will help. The cleanup script does it. Do you have a hang example?

> 3) We should come up with a list of known problems and how to troubleshoot
> those problems, when things are not going smooth in NetBSD. Again, we really
> need to make things automatic, this should be last resort. Our top goal
> should be to make NetBSD machines report failures and go to execute next
> job.

This is the frustrating point for me: we have complaints that things go bad, but we do not have data about what tests caused trouble. Fixing the problems underlying unbacked complaints means we will have to gather data on our own. First step could be to parse jenkins logs and find which tests fail or hang most often in NetBSD regression.

> 4) How can we make debugging better in NetBSD? In the worst case we can make
> all tests execute in trace/debug mode on NetBSD.
>
> I really want to appreciate the fine job you have done so far in making sure
> glusterfs is stable on NetBSD.

Thanks! I must confess the idea of having the NetBSD port demoted is a bit depressing given the amount of work I invested in it.

-- Emmanuel Dreyfus m...@netbsd.org
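For reference, a sketch of the FFSv1 formatting step mentioned in point 1 (the device name and mount point are examples):

    newfs -O1 /dev/rxbd1a
    mount /dev/xbd1a /d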
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Ravishankar N wrote:
> > I am a bit disturbed by the fact that people raise the
> > "NetBSD regression ruins my life" issue without doing the work of
> > listing the actual issues encountered.
> I already did earlier- the lack of infrastructure to even find out what
> caused the issue in the first place.

I meant: what test exhibited a spurious failure or hang? You can see that from the regression test run. Previous experience makes me suspect we will narrow the problem down to a few tests that can be disabled.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Jeff Darcy wrote:
> > Now what is the policy on post-merge regression failure? What happens
> > if the original submitter is not willing to investigate?
>
> Then regressions will continue to fail on NetBSD, as they do now, but
> without impacting work on other platforms.

Well, from previous experience of maintaining NetBSD support without mandatory regression, I am almost certain that it will quickly break. The only relief in the post-merge regression scheme is that we will have a precise idea of the change that caused the regression.

In my opinion, the best way forward would be to identify what tests cause frequent NetBSD spurious failures and disable them for NetBSD regression. I am a bit disturbed by the fact that people raise the "NetBSD regression ruins my life" issue without doing the work of listing the actual issues encountered.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
Avra Sengupta wrote:
> Agree with your point. If we are ready to make exceptions, then we might
> as well not block all the patches. As Jeff suggested, triaging the
> nightly/weekly results manually and making any serious issues a blocker
> should suffice.

How are you going to make a serious issue a blocker? The serious issue will be related to multiple patches; it will be impossible to tell which one is the offender. If we go that way, we need to run a regression for each merged patch, which will be much less load than today.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.
On Thu, Jan 07, 2016 at 03:01:41PM +0530, Avra Sengupta wrote:
> Why is this a bad idea?

Because each week you will have multiple regressions introduced. Few people will be willing to investigate whether they pushed a patch that caused a regression, and those few people will have to deal with the other regressions they did not cause. Being in a position to fix the regression test will be a rare event, and the whole thing will quickly rot.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'
On Thu, Dec 31, 2015 at 03:57:15PM +0530, Raghavendra Talur wrote:
> You can log in. I think the HUP signal did not cause any
> change in process state. I still see it in I state.
> pid is 10967.

That one is perl running prove on quota.t. I believe 15221 is the stuck one.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'
On Thu, Dec 31, 2015 at 03:22:54PM +0530, Raghavendra Talur wrote:
> Manu, this seems to be a bug in libperfuse and not in Gluster.
> The machine is nbslave75.cloud.gluster.org. You will have to rerun
> quota.t a couple of times to hit the bug. The test would hang in line 62 (TEST 24).

There is a jenkins job running on that machine. May I proceed? Where is the relevant test suite?

A nice way of handing the bug over to someone else could be to run it inside the screen utility.

-- Emmanuel Dreyfus m...@netbsd.org
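A hedged example of that hand-over (the session name and test command are illustrative):

    screen -d -m -S quota-debug prove -vf tests/basic/quota.t
    # later, from another login:
    screen -r quota-debug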
Re: [Gluster-infra] [Gluster-devel] Lot of Netbsd regressions 'Waiting for the next available executor'
On Thu, Dec 24, 2015 at 01:44:21AM -0500, Raghavendra Gowdappa wrote:
> > Seems to be hung. May be a hung syscall? I've tried to kill it, but seems
> > like its not dead. May be patch #12594 is causing some issues on netbsd. It
> > has passed gluster regression.

ps -axl shows PID 1394 (umount) waiting on tstile, which is used for spinlocks. No process should sit there for long, unless there is a kernel locking problem (which may be a userland locking problem, thanks to FUSE).

Using crash(8) I can see umount is waiting for a vnode lock. There is certainly something to investigate, but I lack time for now. I issued a reboot. Please tell me if you can reproduce it.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures
On Fri, Oct 23, 2015 at 01:52:59PM +0530, Ravishankar N wrote:
> All arbiter-statfs.t tests that are failing are on
> nbslave74.cloud.gluster.org.
> Loopback mounts are not happening on that slave. Perhaps it needs to be
> rebooted.

Indeed: the test passes on nbslave70. Can someone reboot nbslave74?

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] NetBSD tests stuck at the same place
On Thu, Sep 24, 2015 at 05:57:45AM -0400, Krutika Dhananjay wrote:
> No matter how many times I (re)trigger the regression tests in NetBSD for
> http://review.gluster.org/#/c/12213/ , they seem to get stuck at the same
> point every time.
> See
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/10473/console
> for instance.

I will be able to look at this in a few hours. In the meantime, check that the filesystems of the test node are not full.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Recent changes in regression.sh
On Mon, Aug 31, 2015 at 07:53:49PM +0530, Raghavendra Talur wrote:
> We started seeing errors like "/opt/qa/regression.sh: 120: Syntax error:
> end of file unexpected (expecting ")")" in jenkins runs today.

What run?

> I suspect it is one of the recent changes in regression.sh which might have
> caused it.

Yes, I am surely the culprit.

> Comparing the version at github and one at nbslave77.cloud.gluster.org I
> found quite a few differences. If someone is aware of recent changes need
> help in fixing it.

What differences do you see?

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] An attempt to thwart G_LOG corruption
Niels de Vos wrote:
> Great idea! I was thinking of something like SElinux, but that is
> obviously not available for NetBSD.

There is some similar stuff in NetBSD I never learnt about. The immutable flag is simple and will work. I still have to decide whether I include it in the image or prepare an install script to "freeze" the setup once the VM is created.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] An attempt to thwart G_LOG corruption
Emmanuel Dreyfus wrote:
> Let me know if it is too wide and causes trouble.

It was; I had to remove the immutable flag on:

/usr/pkg/lib/python2.7/site-packages/gluster/ => we install glupy.py there
/etc/openssl => ssl-authz.t creates a key and cert there

And that lets a job pass regression while we have some guarantee that the tests cannot easily corrupt the system.

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9580/

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
[Gluster-infra] An attempt to thwart G_LOG corruption
Hello

We have a rogue test that appends log data to incorrect open file descriptors, clobbering various system and library files with logs. That quickly renders regression slaves unusable.

I tried an experiment to thwart that threat: the NetBSD FFS filesystem features an immutable flag, which means even root cannot modify the file. I applied it on nbslave7[1-j] to the following files and directories (and their children):

/.cshrc /.profile /altroot /bin /boot /boot.cfg /etc /grub /lib /libdata /libexec /netbsd /netbsd7-XEN3PAE_DOMU /opt /rescue /root /sbin /stand /usr

Let me know if it is too wide and causes trouble. If anyone wants to experiment:

Recursively (-R) install the flag in /usr:

    chflags -R uchg /usr

Recursively remove it:

    chflags -R nouchg /usr

We also have schg/noschg, which can be set at any time but can only be removed by root in a single-user shell. I ruled this out because I am not sure rackspace console access lets us use single-user mode.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Fresh NetBSD regression failures
Avra Sengupta wrote:
> All NetBSD regressions are again failing (more like refusing to
> build), with the following error.

Random files clobbered by G_LOG?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures
On Wed, Aug 19, 2015 at 07:35:30AM -0400, Kotresh Hiremath Ravishankar wrote:
> 1. geo-rep does lazy umount of gluster volume which needs to be modified
>    to use 'gf_umount_lazy' provided by libglusterfs, correct?

Yes. That spawns umountd, which does its best to emulate lazy unmount by trying to unmount at regular intervals.

> 2. geo-rep uses lgetxattr, it is throwing 'undefined error', I tried searching
>    for man page for lgetxattr in netBSD but couldn't find. Is there a known
>    portability issue with it?

No, but if you can provide me a simple test in C showing the problem, I will be glad to fix the implementation.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regression failures
Kotresh Hiremath Ravishankar wrote:
> Since the geo-rep regression tests are failing only in NetBSD, Is there
> a way we can mask it's run only in NetBSD and let it run in linux?
> I am working on geo-rep issues with NetBSD. Once these are fixed we can
> enable on NetBSD as well.

Yes, I can wipe them from regression.sh before running the tests, like we do for tests/bugs (never ported), tests/basic/tier/tier.t, and tests/basic/ec (the latter two used to pass but started exhibiting too many spurious failures).

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] nbslave77 disabled
On Mon, Aug 17, 2015 at 07:37:56AM +0000, Emmanuel Dreyfus wrote:
> > Michael/Manu, could you have a look at that?
> nbslave71 and nbslave79 seem very sick too.

I restored nbslave7[179].

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] nbslave77 disabled
On Mon, Aug 17, 2015 at 09:23:41AM +0200, Niels de Vos wrote:
> nbslave77 does not respond at all anymore (some errors related to
> pam_start, see screenshot). I have disabled it again, it probably needs
> a complete rebuild.
>
> Michael/Manu, could you have a look at that?

Interesting: almost all files in /root and /etc were corrupted with glusterfs regression log messages appended to them.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] nbslave77 disabled
On Mon, Aug 17, 2015 at 09:23:41AM +0200, Niels de Vos wrote:
> nbslave77 does not respond at all anymore (some errors related to
> pam_start, see screenshot). I have disabled it again, it probably needs
> a complete rebuild.
>
> Michael/Manu, could you have a look at that?

nbslave71 and nbslave79 seem very sick too.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] nbslave7h.cloud.gluster.org fails to post voting to Gerrit
Vijay Bellur wrote:
> This required a gerrit db update and I have done that. Can you please
> check now?

Yes, it works. The test is to run, as the jenkins user:

    ssh nb7bu...@review.gluster.org 'gerrit --help'

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] nbslave7h.cloud.gluster.org fails to post voting to Gerrit
Niels de Vos wrote:
> If you explain the steps that are needed to be done, and the location of
> the ssh keys, others should be able to fix this in the future too.

I confirm I have accidentally overwritten the keys in the new image.

    $ ssh r...@nbslave7h.cloud.gluster.org
    # su -l jenkins
    $ ssh nb7bu...@review.gluster.org gerrit --help
    Permission denied (publickey).
    $ ls -l .ssh
    total 40
    -rw-------  1 jenkins  wheel   1675 Apr 14 13:47 id_rsa
    -rw-r--r--  1 jenkins  wheel    422 Apr 14 13:47 id_rsa.pub
    -rw-------  1 jenkins  wheel   1675 Dec 19  2014 id_rsa2048
    -rw-r--r--  1 jenkins  wheel    417 Dec 19  2014 id_rsa2048.pub
    -rw-r--r--  1 jenkins  wheel  10508 Apr 14 13:47 known_hosts

The simplest fix is to copy nbslave7h:/home/jenkins/.ssh/id_rsa.pub into review.gluster.org:~nb7build/.ssh/authorized_keys, but I cannot do that.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
[Gluster-infra] build.gluster.org self signed cert
Hi

build.gluster.org presented me with a self-signed certificate. I accepted it, but could someone please confirm it is intentional.

While we are at it, StartSSL offers free certs...

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Public key problem on new vms for NetBSD
On Fri, Jun 19, 2015 at 11:45:04AM +0530, Pranith Kumar Karampuri wrote:
> I see that NetBSD regressions are passing but not able to give +1
> because of following problem:
> + ssh 'nb7bu...@review.gluster.org' gerrit review --message
> ''\''http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/consoleFull
> : SUCCESS'\''' --project=glusterfs --code-review=0 '--verified=+1'
> 276ba2dbd076a2c4b86e8afd0eaf2db7376ea2a8
> Permission denied (publickey).

Someone regenerated ~jenkins/.ssh/id_rsa on a few nodes. I removed the new id_rsa and id_rsa.pub and replaced id_rsa with the right one, copied from a machine where it worked.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Vijay Bellur wrote:
> I did dare just now and have rebooted Jenkins :). Let us see how this
> iteration works out.

Excellent! That fixed the Jenkins resolution problem, and we now have 10 NetBSD slave VMs online.

So we have two problems, and their fixes are available for adding new VMs:
- Weak upstream DNS service: worked around by /etc/hosts (a secondary DNS would be more automatic, but at least it works)
- Jenkins has a DNS cache and needs a restart

How did ongoing jobs behave on the Jenkins restart? Did you have to restart them all, or did Jenkins take care of it?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Justin Clift wrote:
> If the DNS problem does turn out to be the dodgy iWeb hardware firewall,
> then this fixes the DNS issue. (if not... well damn!)

The DNS problem was worked around by installing an /etc/hosts file, but jenkins does not realize it is there. It should probably be restarted, but nobody dares to try.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Niels de Vos wrote:
> I'm not sure what limitation you mean. Did we reach the limit of slaves
> that Jenkins can reasonably address?

No, I mean its inability to catch a new DNS record.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 18, 2015 at 10:19:27AM +0200, Niels de Vos wrote:
> Good to know, but it would be much more helpful if someone could install
> VMs there and add them to the Jenkins instance... Who can do that, or
> who can guide someone else to get it done?

How will that help, since we are having problems with Jenkins' ability to get more hosts?

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Niels de Vos wrote:
> Maybe, but I hope those issues stay masked when resolving the hostnames
> is more stable. When we have the other servers up and running, we would
> have a better understanding and options to investigate issues like this.

But Jenkins is still unable to launch an agent on e.g. nbslave75. Perhaps it needs to be restarted?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 03:00:29PM +0000, Emmanuel Dreyfus wrote:
> Oh no, it did, but nuked them all almost instantly (see below). I
> disabled it again. Basically we have broken jenkins setups, and DNS
> trouble prevents us from adding new VMs. What a mess.

I retriggered most of the jobs, but at some point the web UI refreshed and I lost track of which jobs I had already retriggered. I left it as is.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 08:34:06PM +0530, Kaushal M wrote: > Would restarting jenkins once help? It might help it pick up the newly > added entries to the hosts file. Won't it break all running jobs? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
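If preserving running jobs is the concern, the Jenkins CLI offers a restart variant that first stops scheduling new builds, waits for the running ones to finish, and only then restarts; new triggers just sit in the queue meanwhile. A sketch, assuming CLI access to the master is configured:

# wait for in-flight builds to complete before restarting
java -jar jenkins-cli.jar -s http://build.gluster.org/ safe-restart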
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 02:57:28PM +, Emmanuel Dreyfus wrote: > I re-enabled it and it went online, but it does not seem to pick up a job. Oh no, it did, but nuked them all almost instantly (see below). I disabled it again. Basically we have broken Jenkins setups, and DNS trouble prevents us from adding new VMs. What a mess.

Triggered by Gerrit: http://review.gluster.org/11264 in silent mode.
Building remotely on nbslave71.cloud.gluster.org (netbsd7_regression) in workspace /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
java.io.IOException: remote file operation failed: /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered at hudson.remoting.Channel@1f76c8cf:nbslave71.cloud.gluster.org: hudson.remoting.ChannelClosedException: channel is already closed
        at hudson.FilePath.act(FilePath.java:987)
        at hudson.FilePath.act(FilePath.java:969)
        at hudson.FilePath.mkdirs(FilePath.java:1152)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
        at hudson.model.Run.execute(Run.java:1744)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:98)
        at hudson.model.Executor.run(Executor.java:374)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
        at hudson.remoting.Channel.send(Channel.java:550)
        at hudson.remoting.Request.call(Request.java:129)
        at hudson.remoting.Channel.call(Channel.java:752)
        at hudson.FilePath.act(FilePath.java:980)
        ... 10 more
Caused by: java.io.IOException
        at hudson.remoting.Channel.close(Channel.java:1110)
        at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
        at hudson.remoting.PingThread.ping(PingThread.java:126)
        at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1433860950328 hasn't completed by 1433861190328
        ... 2 more
Finished: FAILURE

-- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
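For the record, the two timestamps in the TimeoutException are 240000 ms apart: the slave failed to answer the master's ping for 4 minutes before the channel was closed. If the VMs are merely too loaded to answer in time, relaxing the pinger is one knob to try; the property below is my reading of the ChannelPinger source of that Jenkins era, so treat it as an assumption and verify it against the installed version:

# in the master's JVM options: ping the slaves every 10 minutes instead of 5
JENKINS_JAVA_OPTIONS="$JENKINS_JAVA_OPTIONS -Dhudson.slaves.ChannelPinger.pingInterval=10"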
Re: [Gluster-infra] Status of nbslave7x
On Wed, Jun 17, 2015 at 07:39:06PM +0530, Kaushal M wrote: > nbslave7{d..f} were the entries created by Vijay last week, which were > resolving to nbslave71; there were no actual vms on rackspace. I had > disabled nbslave71 at that point in time to reboot it, but I think I > forgot to re-enable it. I re-enabled it and it went online, but it does not seem to pick up a job. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
[Gluster-infra] Status of nbslave7x
Status of the NetBSD slave VMs:

1 booked: nbslave71. It is noted as disconnected by amarts. Is usage over?
3 removed from rackspace but still in jenkins: nbslave7d, nbslave7e, nbslave7f
6 active: nbslave72, nbslave77, nbslave7c, nbslave7g, nbslave7i, nbslave7j
3 offline: nbslave74, nbslave75, nbslave79

The 3 DNS records for the offline slaves do not resolve (timeout) from build.gluster.org, while they do from my machine. Adding them to /etc/hosts helps a lot on the command line, and it becomes possible to connect to port 22. But Jenkins is still unable to connect and launch the agent; tcpdump on build.gluster.org shows it does not even try. Perhaps there is a name cache in Jenkins and it needs to be restarted? I am leaving the /etc/hosts file loaded with nbslave74, nbslave75 and nbslave79. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
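For reference, pinning the three offline slaves looks like the fragment below; the addresses are placeholders from the documentation range, not the real Rackspace ones:

# appended to /etc/hosts on build.gluster.org
192.0.2.74   nbslave74.cloud.gluster.org nbslave74
192.0.2.75   nbslave75.cloud.gluster.org nbslave75
192.0.2.79   nbslave79.cloud.gluster.org nbslave79

Anything going through the libc resolver (ssh, ping, getent) then answers instantly; only software with a private name cache, like the Jenkins JVM, can still miss the entries.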
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 07:44:14AM -0400, Vijay Bellur wrote: > Do we still have the NFS crash that was causing tests to hang? Do we still have it on rebased patchsets? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:48:46AM +0200, Michael Scherer wrote: > And I think the DNS issues are just a symptom of a bigger network issue, > having local DNS might just mask the problem and which would then be non > DNS related ( like tcp connexion not working ). Well, if the cause is lost packets, TCP is more resistant than UDP, and if it is an overloaded DNS server, the problem is confined to DNS. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
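dig can force the transport, which makes that distinction easy to test from build.gluster.org: if the default UDP query times out while the same query over TCP succeeds, packet loss is the likely culprit; if both fail, the server itself is at fault. A sketch:

# DNS queries go over UDP by default; +tcp repeats the query over TCP
dig +time=2 +tries=1 nbslave75.cloud.gluster.org
dig +tcp +time=2 +tries=1 nbslave75.cloud.gluster.org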
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:05:38AM +0200, Niels de Vos wrote: > I've already scripted the reboot-vm job to use Rackspace API, the DNS > requesting and formatting the results into some file can't be that > difficult. Let me know if a /etc/hosts format would do, or if you expect > something else. Perhaps an /etc/hosts file would do it: Jenkins launches the ssh command, and ssh should use /etc/hosts before the DNS. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
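Whether ssh really prefers the hosts file is governed by the lookup order in nsswitch.conf, which is worth a quick check; as long as "files" is listed before "dns", the libc resolver, and therefore ssh, consults /etc/hosts first:

$ grep '^hosts:' /etc/nsswitch.conf
hosts:      files dns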
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Wed, Jun 17, 2015 at 11:59:22AM +0530, Kaushal M wrote: > cloud.gluster.org is served by Rackspace Cloud DNS Perhaps we can change that and set up a DNS server for the zone? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Venky Shankar wrote: > If that's the case, then I'll vote for this even if it takes some time > to get things in workable state. See my other mail about this: you enter a new slave VM in the DNS and it does not resolve, or sometimes you get 20s delays. I am convinced this is the reason why Jenkins misbehaves. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
Atin Mukherjee wrote: > > That *might* result in lots of NetBSD regression failures later on and > > we may end up with another round of fixups. > Agreed, that's the known risk but we don't have any other alternatives atm. I strongly disagree: we have a good alternative, which is to configure a secondary DNS on build.gluster.org for the cloud.gluster.org zone. I could do the local configuration, but someone with administrative access will have to touch the primary configuration to allow zone transfer (and enable notifications). The current situation is that we have 14 NetBSD VMs online and only 5 are capable of running jobs because of various infrastructure configuration problems, broken DNS being the first offender. Another issue is the hanging NFS mounts (ps -axl shows dd stuck in wchan tstile), for which I had a change merged that should fix the problem, but only for rebased changes. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
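For the record, the secondary zone being proposed is only a few lines of named.conf on each side. A sketch, with a placeholder primary address; the statements on the primary side would of course have to match its real configuration:

# on build.gluster.org (the secondary):
zone "cloud.gluster.org" {
    type slave;
    masters { 198.51.100.1; };          # the zone's primary server (placeholder)
    file "slaves/cloud.gluster.org.db"; # local copy, served even when upstream is unreachable
};

# on the primary, to permit the transfer and push updates immediately:
#   allow-transfer { <build.gluster.org address>; };
#   notify yes;
#   also-notify { <build.gluster.org address>; };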
Re: [Gluster-infra] [Gluster-devel] Reduce regression runs wait time - New gerrit/review work flow
On Mon, Jun 15, 2015 at 10:09:33AM -0400, Jeff Darcy wrote: > As long as there's some visible marking on the summary pages to > distinguish patches that have passed smoke vs. those that haven't, I > think we're good. The Gerrit manual says you can add more columns, like Review and Verified. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 11, 2015 at 04:04:44PM +0200, Niels de Vos wrote: > Michael installed and configured dnsmasq on build.gluster.org yesterday. > If that does not help today, we need other ideas... Just to confirm the problem:

[manu@build ~]$ time nslookup nbslave7i.cloud.gluster.org
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

real    0m20.013s
user    0m0.002s
sys     0m0.012s

Having a local cache does not help because the upstream DNS service is weak. Without the local cache, individual processes wait in vain for a reply; with the local cache, the local server itself waits in vain for a reply. And here the upstream DNS is really at fault: from my machine I get a reply in 0.29s. We need to configure a local authoritative secondary DNS for the zone, so that the answer is always available locally without having to rely on outside infrastructure. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 11, 2015 at 12:51:52PM +0200, Niels de Vos wrote: > I've just checked the online NetBSD slaves again, but they seem to have > been configured correctly... Maybe we are hitting a Jenkins bug, or > there was a (temporary?) issue with DNS resolution? DNS resolution is wrecked on build.gluster.org: I tried a tcpdump to diagnose the problem and got: tcpdump: unknown host 'nbslave71.cloud.gluster.org' Another attempt gives me the correct answer after more than 5 seconds. I am almost convinced that a local named on build.gluster.org would help a lot. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
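A caching resolver of the kind discussed here (named, or the dnsmasq mentioned above) takes very little configuration. A minimal illustrative sketch; the thread does not show what was actually deployed:

# /etc/dnsmasq.conf: answer local queries only and keep a reasonable cache
listen-address=127.0.0.1
cache-size=1000

# /etc/resolv.conf: route all lookups through the local cache first
nameserver 127.0.0.1

The catch, as noted in the previous message, is that a cache is only as good as its upstream: a query that never gets an answer leaves nothing to cache.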
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 11, 2015 at 07:26:00AM +, Emmanuel Dreyfus wrote: > In my opinion the fix to this problem is to start new VMs. I was busy > on other fronts, hence I did not watch the situation, but it is still > grim, with most NetBSD slaves being in a screwed state. We need to spin > up more. Launching the slave on the new VM fails, but for once we have a meaningful error: either DNS names are duplicated, or Jenkins has a bug.

<===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Already connected
        at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:466)
        at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:371)
        at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:945)
        at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:133)
        at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
        at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:696)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
[06/11/15 02:37:46] Launch failed - cleaning up connection
[06/11/15 02:37:46] [SSH] Connection closed.

-- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 11, 2015 at 12:57:58PM +0530, Kaushal M wrote: > The problem was nbslave71. It used to be picked first for all changes > and would fail instantly. I've disabled it now. The other slaves are > working correctly. Sadly the Jenkins upgrade did not help here. Last time I investigated, the failure was caused by the master breaking the connection, but I was not able to understand why. I was once able to recover a VM by fiddling with the Jenkins configuration in the web UI, but experimenting is not easy, as a miss will drain the whole queue into complete failures. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] NetBSD regressions not being triggered for patches
On Thu, Jun 11, 2015 at 12:39:43PM +0530, Atin Mukherjee wrote: > Can we start merging patches with out NetBSD's vote? Currently we have > so many patches waiting for NetBSD's vote and it seems like no vms are > apparently running as well. This is blocking us to move forward. In my opinion the fix to this problem is to start new VMs. I was busy on other fronts, hence I did not watch the situation, but it is still grim, with most NetBSD slaves being in a screwed state. We need to spin up more. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] NetBSD slaves
Vijay Bellur wrote: > This certainly does explain the baffling behavior I just had a look: nbslave7[1cde] are stuck. nbslave71 does not accept SSH connections. I rebooted nbslave7c. nbslave7[de] are not in the DNS. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] NetBSD slaves
On Tue, Jun 09, 2015 at 01:01:40PM +0530, Kaushal M wrote: > I cannot find the new VMs nbslave7{d..f} in rackspace. But jenkins can > still sees them. Anyone have any idea what's happening here? Botched DNS records? Unfortunately, creating a VM needs many clicks in the Rackspace web UI before the DNS is set up. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] NetBSD slaves
Vijay Bellur wrote: > Rebooting is not working effectively for slaves like slave74. I have > created 3 new VMs nbslave7{d..f} for load balancing NetBSD regression > queue. If necessary we can spin a few ephemeral VMs over the weekend to > drain the queue. FWIW I planned nbslave7g after nbslave7f :-) When it does not reboot, a peek at the console may help. I guess this is a fsck problem. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] DNS issues in Jenkins infrastructure and/or on build.gluster.org
On Tue, May 19, 2015 at 11:09:03AM +0200, Niels de Vos wrote: > Yes, as long as it caches the correct DNS answers. For the occasion that > the answer from the upstream DNS is incorrect, it will get cached as > well. This could result in a longer time for problems, I think. I never saw incorrect responses, just a lack of response. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] DNS issues in Jenkins infrastructure and/or on build.gluster.org
On Tue, May 19, 2015 at 10:00:45AM +0200, Niels de Vos wrote: > I think there is a plan to decommission the current build.gluster.org > and move its services to a different server/datacenter. Hopefully DNS > will be more reliable soon. A local DNS on build.gluster.org will cache GlusterFS stuff and will speed up all DNS requests and make them more reliable at the same time. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] Downtime for Jenkins
Another connectivity failure: http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5420 The slave VM uptime suggests it did not reboot during the build. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] Downtime for Jenkins
Vijay Bellur wrote: > Around 9:25 UTC. There is this one that looks like the old bug: http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5410/console But the same machine (nbslave71) was at least able to run other jobs after this. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] Downtime for Jenkins
Vijay Bellur wrote: > Manu - can you please verify and report back if the NetBSD slaves work > better with the upgraded Jenkins master? At what time did the new Jenkins start up? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra
[Gluster-infra] What is wrong with Jenkins
Jenkins decided to throw one more VM into the always-failing case. Usual business, but I found a way to recover: I changed the way Jenkins was supposed to connect to the VM, through a command instead of SSH (I created ~/.ssh/id_rsa_nbslave for that):

/usr/bin/ssh -oLogLevel=ERROR -oBatchMode=yes -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -i /var/lib/jenkins/.ssh/id_rsa_nbslave jenk...@nbslave71.cloud.gluster.org "/usr/pkg/java/openjdk7/bin/java -jar /home/jenkins/root/slave.jar"

It did not work, but reverting to SSH fixed the problem, and the VM is now able to run jobs again. But there are other frustrating failures: for instance this one was disconnected during a run, and I still wonder why: http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5380 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-infra mailing list Gluster-infra@gluster.org http://www.gluster.org/mailman/listinfo/gluster-infra