[Gluster-infra] [Bug 1695484] smoke fails with "Build root is locked by another process"
https://bugzilla.redhat.com/show_bug.cgi?id=1695484

--- Comment #3 from M. Scherer ---
So indeed, https://build.gluster.org/job/devrpm-fedora/15404/ aborted the patch test, then https://build.gluster.org/job/devrpm-fedora/15405/ failed, but the next run worked. Maybe the problem is that it takes more than 30 seconds to clean the build root, or something similar. Maybe we need to allow more time, but I can't seem to find a log to evaluate how long cleanup takes when a build is cancelled. Let's keep this open; if the issue arises again we can collect the log and see if there is a pattern.

--
You are receiving this mail because:
You are on the CC list for the bug.
___
Gluster-infra mailing list
Gluster-infra@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-infra
Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
On Wednesday, 3 April 2019 at 16:30 +0530, Atin Mukherjee wrote:
> On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan wrote:
> >
> > Hi,
> >
> > is_nfs_export_available is just a wrapper around the "showmount"
> > command, AFAIR. I saw the following messages in the console output:
> >
> > mount.nfs: rpc.statd is not running but is required for remote locking.
> > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> > 05:06:55 mount.nfs: an incorrect mount option was specified
> >
> > To me it looks like rpcbind may not be running on the machine.
> > Usually rpcbind starts automatically on machines; I don't know
> > whether this can happen or not.
>
> That's precisely the question: why are we suddenly seeing this happen
> so frequently? Today I saw at least 4 or 5 such failures already.
>
> Deepshika - Can you please help in inspecting this?

So we think (we are not sure) that the issue is a bit complex.

What we were investigating was a nightly run failure on AWS. When the build crashes, the builder is restarted, since that's the easiest way to clean everything (even with a perfect test suite that cleaned up after itself, we could always end up with the system in a corrupt state WRT mounts, filesystems, etc).

In turn, this seems to cause trouble on AWS: cloud-init or something similar renames the eth0 interface to ens5 without cleaning up the network configuration. So the network init script fails (because the image says "start eth0" and that interface is not present), but it fails in a weird way: the network is initialised and working (we can connect), yet the dhclient process is not in the right cgroup, and network.service is in a failed state. Restarting the network didn't help.

As a consequence, rpc-statd refuses to start (due to systemd dependencies), which seems to impact various NFS tests.

We have also seen that on some builders rpcbind picks up some IPv6 autoconfiguration, but we can't reproduce that, and there is no IPv6 set up anywhere.
I suspect the network.service failure is somehow involved, but I fail to see how. In turn, rpcbind.socket not starting could cause the NFS test troubles.

Our current stop-gap fix was to repair all the builders one by one: remove the stale config, kill the rogue dhclient, and restart the network service. However, we can't be sure this fixes the problem long term, since it only manifests after a crash of the test suite, and that doesn't happen often. (Plus, it was working at some point in the past, then something made it fail, and I do not know whether that was a system upgrade, a test change, or both.)

So we are still looking at it to get a complete understanding of the issue, but so far we have hacked our way to make it work (or so I think). Deepshika is working on a long-term fix for the eth0/ens5 issue with a new base image.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
On Wednesday, 3 April 2019 at 15:12 +0300, Yaniv Kaul wrote:
> On Wed, Apr 3, 2019 at 2:53 PM Michael Scherer wrote:
> >
> > [quoting the earlier is_nfs_export_available / rpc.statd exchange
> > between Jiffin Thottan and Atin Mukherjee, as above]
> >
> > So in the past, this kind of stuff did happen with IPv6, so this
> > could be a change on AWS and/or an upgrade.
>
> We need to enable IPv6, for two reasons:
> 1. IPv6 is common these days; even if we don't test with it, it should
> be there.
> 2. We should test with IPv6...
>
> I'm not sure, but I suspect we do disable IPv6 here and there.
> Example[1].
> Y.
> [1]
> https://github.com/gluster/centosci/blob/master/jobs/scripts/glusto/setup-glusto.yml

We do disable IPv6, for sure; Nigel spent 3 days just on that for the AWS migration, and we have a dedicated playbook, applied on all builders, that tries to disable it in every possible way:
https://github.com/gluster/gluster.org_ansible_configuration/blob/master/roles/jenkins_builder/tasks/disable_ipv6_linux.yml

According to the comments, that dates from 2016, and I am sure it goes further back; it just wasn't documented before.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
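For reference, the usual sysctl knobs such a playbook sets look like the fragment below. This is an illustration of the technique, not the contents of the linked playbook, and the file path is an assumption.

```
# /etc/sysctl.d/40-disable-ipv6.conf (illustrative path)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```

Note that daemons such as rpcbind can still pick up IPv6 sockets if they start before these settings are applied, which is one way the "IPv6 autoconfiguration" symptom mentioned above can appear on a supposedly IPv6-free builder.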
Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
On Wed, Apr 3, 2019 at 2:53 PM Michael Scherer wrote:
> [quoting the earlier is_nfs_export_available / rpc.statd exchange
> between Jiffin Thottan and Atin Mukherjee, as above]
>
> So in the past, this kind of stuff did happen with IPv6, so this could
> be a change on AWS and/or an upgrade.

We need to enable IPv6, for two reasons:
1. IPv6 is common these days; even if we don't test with it, it should be there.
2. We should test with IPv6...

I'm not sure, but I suspect we do disable IPv6 here and there. Example[1].
Y.

[1] https://github.com/gluster/centosci/blob/master/jobs/scripts/glusto/setup-glusto.yml

> We are currently investigating a set of failures that happen after
> reboot (resulting in partial network bring-up, causing all kinds of
> weird issues), but it takes some time to verify, and since we lost 33%
> of the team with Nigel's departure, things do not move as fast as
> before.
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
On Wednesday, 3 April 2019 at 16:30 +0530, Atin Mukherjee wrote:
> [quoting Jiffin Thottan's analysis of is_nfs_export_available and
> rpc.statd, as above]
>
> That's precisely the question: why are we suddenly seeing this happen
> so frequently? Today I saw at least 4 or 5 such failures already.
>
> Deepshika - Can you please help in inspecting this?

So in the past, this kind of stuff did happen with IPv6, so this could be a change on AWS and/or an upgrade.

We are currently investigating a set of failures that happen after reboot (resulting in partial network bring-up, causing all kinds of weird issues), but it takes some time to verify, and since we lost 33% of the team with Nigel's departure, things do not move as fast as before.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan wrote:
> Hi,
>
> is_nfs_export_available is just a wrapper around the "showmount"
> command, AFAIR. I saw the following messages in the console output:
>
> mount.nfs: rpc.statd is not running but is required for remote locking.
> 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> 05:06:55 mount.nfs: an incorrect mount option was specified
>
> To me it looks like rpcbind may not be running on the machine.
> Usually rpcbind starts automatically on machines; I don't know whether
> this can happen or not.

That's precisely the question: why are we suddenly seeing this happen so frequently? Today I saw at least 4 or 5 such failures already.

Deepshika - Can you please help in inspecting this?

> Regards,
> Jiffin
>
> ----- Original Message -----
> From: "Atin Mukherjee"
> To: "gluster-infra" , "Gluster Devel" <gluster-de...@gluster.org>
> Sent: Wednesday, April 3, 2019 10:46:51 AM
> Subject: [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?
>
> I'm observing the above test function failing too often, because of
> which arbiter-mount.t fails in many regression jobs. Such a frequency
> of failures wasn't there earlier. Does anyone know what has changed
> recently to cause these failures in regression? I also hear that when
> such a failure happens a reboot is required - is that true, and if so,
> why?
>
> One of the references:
> https://build.gluster.org/job/centos7-regression/5340/consoleFull
[Gluster-infra] [Bug 1695484] smoke fails with "Build root is locked by another process"
https://bugzilla.redhat.com/show_bug.cgi?id=1695484

M. Scherer changed:

 What    |Removed |Added
---------+--------+--------------------
 CC      |        |msche...@redhat.com

--- Comment #2 from M. Scherer ---
Mhh, then shouldn't we clean up when something stops the build?
[Gluster-infra] [Bug 1695484] smoke fails with "Build root is locked by another process"
https://bugzilla.redhat.com/show_bug.cgi?id=1695484

Deepshikha khandelwal changed:

 What    |Removed |Added
---------+--------+--------------------
 CC      |        |dkhan...@redhat.com

--- Comment #1 from Deepshikha khandelwal ---
It happens mainly because the previously running build was aborted by a new patchset, and hence there was no cleanup. Re-triggering might help.
[Gluster-infra] [Bug 1695484] New: smoke fails with "Build root is locked by another process"
https://bugzilla.redhat.com/show_bug.cgi?id=1695484

            Bug ID: 1695484
           Summary: smoke fails with "Build root is locked by another
                    process"
           Product: GlusterFS
           Version: mainline
            Status: NEW
         Component: project-infrastructure
          Assignee: b...@gluster.org
          Reporter: pkara...@redhat.com
                CC: b...@gluster.org, gluster-infra@gluster.org
  Target Milestone: ---
    Classification: Community

Description of problem:
Please check https://build.gluster.org/job/devrpm-fedora/15405/console for more details. Smoke is failing with the reason mentioned in the subject.