Thanks misc. I have always seen a pattern where, on a reattempt (recheck centos), the same builder is picked up many times, even though builders are supposed to be picked in a round-robin manner.
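To illustrate one way this pattern could arise, here is a minimal sketch, not a description of how the scheduler on build.gluster.org actually works. It assumes jobs are queued back to back and each job goes to whichever builder becomes idle first, which is the explanation suggested in the quoted thread. The names builder201 and builder202 and the durations are hypothetical; only builder203 comes from this thread.

```python
import heapq

def simulate(durations, n_jobs):
    """Assign n_jobs back to back; each job goes to the builder that
    becomes idle first.

    durations maps a builder name to how long one job occupies it
    (a broken builder fails fast, so its duration is small).
    Returns the builder picked for each job, in order.
    """
    # Heap of (time the builder becomes free, name); all start idle at t=0.
    free_at = [(0, name) for name in sorted(durations)]
    heapq.heapify(free_at)
    picks = []
    for _ in range(n_jobs):
        t, name = heapq.heappop(free_at)
        picks.append(name)
        # The builder is busy again until this job finishes (or fails).
        heapq.heappush(free_at, (t + durations[name], name))
    return picks

# Hypothetical numbers: a healthy run takes 120 minutes, a fast failure 5.
durations = {"builder201": 120, "builder202": 120, "builder203": 5}
picks = simulate(durations, 12)
print(picks)
print(picks.count("builder203"), "of", len(picks), "jobs landed on builder203")
```

Under these assumptions the fast-failing builder takes most of the jobs, because it is back in the idle pool long before the healthy builders finish. A strict rotation would not behave this way, so this only applies if idle builders are preferred.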
On Thu, Apr 4, 2019 at 7:24 PM Michael Scherer <msche...@redhat.com> wrote:

> On Thursday, 4 April 2019 at 15:19 +0200, Michael Scherer wrote:
> > On Thursday, 4 April 2019 at 13:53 +0200, Michael Scherer wrote:
> > > On Thursday, 4 April 2019 at 16:13 +0530, Atin Mukherjee wrote:
> > > > Based on what I have seen, any multi-node test case will fail,
> > > > and the above one is picked first from that group. If I am
> > > > correct, none of the code fixes will go through the regression
> > > > until this is fixed. I suspect it to be an infra issue again.
> > > > If we look at
> > > > https://review.gluster.org/#/c/glusterfs/+/22501/ and
> > > > https://build.gluster.org/job/centos7-regression/5382/ peer
> > > > handshaking is stuck as 127.1.1.1 is unable to receive a
> > > > response back. Did we end up having firewall and other n/w
> > > > settings screwed up? The test never fails locally.
> > >
> > > The firewall didn't change, and since the start it has had the line
> > > "-A INPUT -i lo -j ACCEPT", so all traffic on the localhost
> > > interface works. (I am not even sure that netfilter does anything
> > > meaningful on the loopback interface, but maybe I am wrong, and I
> > > am not keen on reading kernel code for that.)
> > >
> > > Ping seems to work fine as well, so we can exclude a routing issue.
> > >
> > > Maybe we should look at the socket: does it listen on a specific
> > > address or not?
> >
> > So, I looked at the first 20 failures, removed all those not related
> > to rebal-all-nodes-migrate.t, and saw that all were run on
> > builder203, which was freshly reinstalled. As Deepshika noticed
> > today, this one had an issue with ipv6, the 2nd issue we were
> > tracking.
> >
> > Summary: the rpcbind.socket systemd unit listens on ipv6 despite
> > ipv6 being disabled, and the fix is to reload systemd. We have so
> > far no idea why it happens, but suspect this might be related to
> > the network issue we did identify, as that happens only after a
> > reboot, which happens only if a build is cancelled/crashed/aborted.
> >
> > I applied the workaround on builder203, so if the culprit is that
> > specific issue, I guess that's fixed.
> >
> > I started a test to see how it goes:
> > https://build.gluster.org/job/centos7-regression/5383/
>
> The test did just pass, so I would assume the problem was local to
> builder203. Not sure why it was always selected, except because this
> was the only one that failed, so it was always up for getting new jobs.
>
> Maybe we should increase the number of builders so this doesn't happen,
> as I guess the other builders were busy at that time?
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
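On the question in the quoted thread of whether the socket listens on a specific address, a quick probe along these lines might help when checking a builder. This is only a sketch under assumptions: it tests TCP reachability on the IPv4 and IPv6 loopback addresses, it uses port 111 because that is the standard rpcbind port, and the function names can_connect and probe_loopback are made up for illustration. It does not inspect the systemd unit itself.

```python
import socket

def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unusable address families.
        return False

def probe_loopback(port, timeout=1.0):
    """Check the same port on the IPv4 and IPv6 loopback addresses."""
    return {host: can_connect(host, port, timeout)
            for host in ("127.0.0.1", "::1")}

if __name__ == "__main__":
    # 111 is the standard rpcbind port.
    for host, ok in probe_loopback(111).items():
        state = "accepting" if ok else "not accepting"
        print(f"{host}: {state} connections")
```

If the two addresses give different answers on a builder and not on a healthy one, that would at least point at an address-family difference like the one described above. Other loopback addresses used by the tests, such as 127.1.1.1, could be passed to can_connect in the same way.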
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel