Re: [Gluster-devel] [Gluster-infra] rebal-all-nodes-migrate.t always fails now

Yaniv Kaul Thu, 04 Apr 2019 09:11:35 -0700

I'm not convinced this is solved. Just had what I believe is a similar
failure:


00:12:02.532 A dependency job for rpc-statd.service failed. See 'journalctl -xe' for details.
00:12:02.532 mount.nfs: rpc.statd is not running but is required for remote locking.
00:12:02.532 mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
00:12:02.532 mount.nfs: an incorrect mount option was specified

(of course, it can always be my patch!)

https://build.gluster.org/job/centos7-regression/5384/console


On Thu, Apr 4, 2019 at 6:56 PM Atin Mukherjee <amukh...@redhat.com> wrote:

> Thanks misc. I have always seen a pattern that on a reattempt (recheck
> centos) the same builder is picked up many times, even though it is
> supposed to pick up the builders in a round-robin manner.
>
> On Thu, Apr 4, 2019 at 7:24 PM Michael Scherer <msche...@redhat.com>
> wrote:
>
>> On Thursday, 04 April 2019 at 15:19 +0200, Michael Scherer wrote:
>> > On Thursday, 04 April 2019 at 13:53 +0200, Michael Scherer wrote:
>> > > On Thursday, 04 April 2019 at 16:13 +0530, Atin Mukherjee wrote:
>> > > > Based on what I have seen, any multi-node test case will fail, and
>> > > > the above one is simply the first picked from that group. If I am
>> > > > correct, none of the code fixes will get through regression until
>> > > > this is fixed. I suspect it is an infra issue again. If we look at
>> > > > https://review.gluster.org/#/c/glusterfs/+/22501/ &
>> > > > https://build.gluster.org/job/centos7-regression/5382/ peer
>> > > > handshaking is stuck as 127.1.1.1 is unable to receive a response
>> > > > back. Did we end up having firewall and other n/w settings screwed
>> > > > up? The test never fails locally.
>> > >
>> > > The firewall didn't change, and from the start it has had the line
>> > > "-A INPUT -i lo -j ACCEPT", so all traffic on the localhost
>> > > interface works. (I am not even sure that netfilter does anything
>> > > meaningful on the loopback interface, but maybe I am wrong, and I am
>> > > not keen on reading kernel code to find out.)
>> > >
>> > >
>> > > Ping seems to work fine as well, so we can exclude a routing issue.
>> > >
>> > > Maybe we should look at the socket: does it listen on a specific
>> > > address or not?
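
One way to check the question above from the builder itself is a plain TCP connect probe. This is only a minimal sketch: the helper name `can_connect` is made up for illustration, and the address and port you pass in are whatever the stuck peer handshake is using.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of the Gluster test suite.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable addresses.
        return False
```

For example, if `can_connect("127.0.0.1", port)` succeeds while `can_connect("127.1.1.1", port)` fails, the service is probably bound to one specific loopback address rather than to the wildcard address.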
>> >
>> > So, I looked at the first 20 failures, removed all those not related to
>> > rebal-all-nodes-migrate.t, and saw that all of them ran on builder203,
>> > which was freshly reinstalled. As Deepshika noticed today, this one had
>> > an issue with ipv6, the 2nd issue we were tracking.
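
The triage described above (filter the failures down to one test, then group them by builder) can be sketched as follows. The record layout and the sample records are assumptions made for illustration; they are not real Jenkins data.

```python
from collections import Counter


def failures_by_builder(records, test_name):
    """Count failures of `test_name` per builder.

    `records` is an iterable of (builder, failed_test) pairs; this
    layout is an assumption, not a Jenkins export format.
    """
    return Counter(builder for builder, test in records if test == test_name)


# Hypothetical records, for illustration only.
records = [
    ("builder203", "rebal-all-nodes-migrate.t"),
    ("builder201", "some-other-test.t"),
    ("builder203", "rebal-all-nodes-migrate.t"),
]
print(failures_by_builder(records, "rebal-all-nodes-migrate.t"))
```

If every remaining failure lands on a single builder, the builder is a more likely culprit than the patches under test.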
>> >
>> > In summary, the rpcbind.socket systemd unit listens on ipv6 despite
>> > ipv6 being disabled, and the fix is to reload systemd. So far we have
>> > no idea why it happens, but we suspect it might be related to the
>> > network issue we identified, as it happens only after a reboot, which
>> > in turn happens only if a build is cancelled/crashed/aborted.
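
A quick way to see whether anything is still listening on ipv6 is to read the kernel's socket table. The sketch below parses text in the `/proc/net/tcp6` format and returns the ports in LISTEN state; rpcbind's well-known port is 111. The function name and the sample rows are made up for illustration and were not captured from builder203.

```python
LISTEN = "0A"  # TCP_LISTEN state code in /proc/net/tcp and /proc/net/tcp6


def listening_ports(proc_net_text):
    """Return the set of local ports in LISTEN state.

    Expects text in the /proc/net/tcp or /proc/net/tcp6 layout:
    slot, local_address (hex addr:port), remote_address, state, ...
    """
    ports = set()
    for line in proc_net_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":
            continue  # skip blank lines and the header row
        local_address, state = fields[1], fields[3]
        if state == LISTEN:
            # The port is the hex number after the last colon.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


# Made-up sample rows: an ipv6 listener on port 0x006F (111) and an
# established connection on port 0x0016 (22).
SAMPLE = """\
  sl  local_address                         remote_address                        st
   0: 00000000000000000000000000000000:006F 00000000000000000000000000000000:0000 0A
   1: 00000000000000000000000000000001:0016 00000000000000000000000000000001:C350 01
"""
print(111 in listening_ports(SAMPLE))
```

On a real host the input would come from reading `/proc/net/tcp6`, which may not exist at all if ipv6 is disabled at the kernel level.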
>> >
>> > I applied the workaround on builder203, so if the culprit is that
>> > specific issue, I guess that's fixed.
>> >
>> > I started a test to see how it goes:
>> > https://build.gluster.org/job/centos7-regression/5383/
>>
>> The test just passed, so I would assume the problem was local to
>> builder203. I'm not sure why it was always selected, except that it
>> was the only one failing, so it was always free to take new jobs.
>>
>> Maybe we should increase the number of builders so this doesn't happen,
>> as I guess the other builders were busy at that time?
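
The "fails fast, so always free" hypothesis can be illustrated with a toy scheduler. This is a sketch under stated assumptions: the policy (round-robin over idle builders only), the durations and the arrival times are all invented, and it is not a model of how Jenkins actually assigns nodes.

```python
def assign_jobs(durations, arrivals):
    """Assign each arriving job to the next idle builder in round-robin order.

    durations: dict of builder name -> minutes a job occupies that builder
    arrivals:  job arrival times in minutes, in ascending order
    Toy model only; the scheduling policy is an assumption.
    """
    names = list(durations)
    busy_until = {name: 0 for name in names}
    pointer = 0
    assigned = []
    for t in arrivals:
        for offset in range(len(names)):
            idx = (pointer + offset) % len(names)
            name = names[idx]
            if busy_until[name] <= t:
                break  # first idle builder in round-robin order
        else:
            # Nobody is idle: wait for the builder that frees up first.
            name = min(names, key=lambda n: busy_until[n])
            idx = names.index(name)
            t = busy_until[name]
        busy_until[name] = t + durations[name]
        assigned.append(name)
        pointer = (idx + 1) % len(names)
    return assigned


# Invented numbers: healthy runs take 60 minutes, the broken builder
# fails after 5, and a new job arrives every 15 minutes.
durations = {"builder201": 60, "builder202": 60, "builder203": 5}
jobs = assign_jobs(durations, range(0, 120, 15))
print({name: jobs.count(name) for name in durations})
```

With these numbers the fast-failing builder ends up with half of the jobs even though the selection is nominally round-robin, which matches the pattern of rechecks landing on the same builder.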
>>
>> --
>> Michael Scherer
>> Sysadmin, Community Infrastructure and Platform, OSAS
>>
>>
>> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

