Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-07-18 Thread Sanju Rakonde
Deepshikha,

I see the failure here [1]; that job ran on builder206. So, we are good.

[1] https://build.gluster.org/job/centos7-regression/5901/consoleFull

On Wed, May 8, 2019 at 12:23 AM Deepshikha Khandelwal 
wrote:

> Sanju, can you please give us more info about the failures.
>
> I see the failures occurring on just one of the builder (builder206). I'm
> taking it back offline for now.
>
> On Tue, May 7, 2019 at 9:42 PM Michael Scherer 
> wrote:
>
>> Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
>> > Looks like is_nfs_export_available started failing again in recent
>> > centos-regressions.
>> >
>> > Michael, can you please check?
>>
>> I will try but I am leaving for vacation tonight, so if I find nothing,
>> until I leave, I guess Deepshika will have to look.
>>
>> > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
>> >
>> > >
>> > >
>> > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
>> > > msche...@redhat.com>
>> > > wrote:
>> > >
>> > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
>> > > > > Is this back again? The recent patches are failing regression
>> > > > > :-\ .
>> > > >
>> > > > So, on builder206, it took me a while to find that the issue is
>> > > > that
>> > > > nfs (the service) was running.
>> > > >
>> > > > ./tests/basic/afr/tarissue.t failed, because the nfs
>> > > > initialisation
>> > > > failed with a rather cryptic message:
>> > > >
>> > > > [2019-04-23 13:17:05.371733] I
>> > > > [socket.c:991:__socket_server_bind] 0-
>> > > > socket.nfs-server: process started listening on port (38465)
>> > > > [2019-04-23 13:17:05.385819] E
>> > > > [socket.c:972:__socket_server_bind] 0-
>> > > > socket.nfs-server: binding to  failed: Address already in use
>> > > > [2019-04-23 13:17:05.385843] E
>> > > > [socket.c:974:__socket_server_bind] 0-
>> > > > socket.nfs-server: Port is already in use
>> > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
>> > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
>> > > >
>> > > > I found where this came from, but a few stuff did surprised me:
>> > > >
>> > > > - the order of print is different that the order in the code
>> > > >
>> > >
>> > > Indeed strange...
>> > >
>> > > > - the message on "started listening" didn't take in account the
>> > > > fact
>> > > > that bind failed on:
>> > > >
>> > >
>> > > Shouldn't it bail out if it failed to bind?
>> > > Some missing 'goto out' around line 975/976?
>> > > Y.
>> > >
>> > > >
>> > > >
>> > > >
>> > > >
>>
>> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
>> > > >
>> > > > The message about port 38465 also threw me off the track. The
>> > > > real
>> > > > issue is that the service nfs was already running, and I couldn't
>> > > > find
>> > > > anything listening on port 38465
>> > > >
>> > > > once I do service nfs stop, it no longer failed.
>> > > >
>> > > > So far, I do know why nfs.service was activated.
>> > > >
>> > > > But at least, 206 should be fixed, and we know a bit more on what
>> > > > would
>> > > > be causing some failure.
>> > > >
>> > > >
>> > > >
>> > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
>> > > > > msche...@redhat.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
>> > > > > > écrit :
>> > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
>> > > > > > > jthot...@redhat.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > is_nfs_export_available is just a wrapper around
>> > > > > > > > "showmount"
>> > > > > > > > command AFAIR.
>> > > > > > > > I saw following messages in console output.
>> > > > > > > >  mount.nfs: rpc.statd is not running but is required for
>> > > > > > > > remote
>> > > > > > > > locking.
>> > > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
>> > > > > > > > local,
>> > > > > > > > or
>> > > > > > > > start
>> > > > > > > > statd.
>> > > > > > > > 05:06:55 mount.nfs: an incorrect mount option was
>> > > > > > > > specified
>> > > > > > > >
>> > > > > > > > For me it looks rpcbind may not be running on the
>> > > > > > > > machine.
>> > > > > > > > Usually rpcbind starts automatically on machines, don't
>> > > > > > > > know
>> > > > > > > > whether it
>> > > > > > > > can happen or not.
>> > > > > > > >
>> > > > > > >
>> > > > > > > That's precisely what the question is. Why suddenly we're
>> > > > > > > seeing
>> > > > > > > this
>> > > > > > > happening too frequently. Today I saw atleast 4 to 5 such
>> > > > > > > failures
>> > > > > > > already.
>> > > > > > >
>> > > > > > > Deepshika - Can you please help in inspecting this?
>> > > > > >
>> > > > > > So we think (we are not sure) that the issue is a bit
>> > > > > > complex.
>> > > > > >
>> > > > > > What we were investigating was nightly run fail on aws. When
>> > > > > > the
>> > > > > > build
>> > > > > > crash, the builder is restarted, since that's 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-07-18 Thread Sanju Rakonde
Looks like is_nfs_export_available started failing again in recent
centos-regressions.

Michael, can you please check?

On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:

>
>
> On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer 
> wrote:
>
>> Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
>> > Is this back again? The recent patches are failing regression :-\ .
>>
>> So, on builder206, it took me a while to find that the issue is that
>> nfs (the service) was running.
>>
>> ./tests/basic/afr/tarissue.t failed, because the nfs initialisation
>> failed with a rather cryptic message:
>>
>> [2019-04-23 13:17:05.371733] I [socket.c:991:__socket_server_bind] 0-
>> socket.nfs-server: process started listening on port (38465)
>> [2019-04-23 13:17:05.385819] E [socket.c:972:__socket_server_bind] 0-
>> socket.nfs-server: binding to  failed: Address already in use
>> [2019-04-23 13:17:05.385843] E [socket.c:974:__socket_server_bind] 0-
>> socket.nfs-server: Port is already in use
>> [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
>> socket.nfs-server: __socket_server_bind failed;closing socket 14
>>
>> I found where this came from, but a few stuff did surprised me:
>>
>> - the order of print is different that the order in the code
>>
>
> Indeed strange...
>
>> - the message on "started listening" didn't take in account the fact
>> that bind failed on:
>>
>
> Shouldn't it bail out if it failed to bind?
> Some missing 'goto out' around line 975/976?
> Y.
>
>>
>>
>>
>> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
>>
>> The message about port 38465 also threw me off the track. The real
>> issue is that the service nfs was already running, and I couldn't find
>> anything listening on port 38465
>>
>> once I do service nfs stop, it no longer failed.
>>
>> So far, I do know why nfs.service was activated.
>>
>> But at least, 206 should be fixed, and we know a bit more on what would
>> be causing some failure.
>>
>>
>>
>> > On Wed, 3 Apr 2019 at 19:26, Michael Scherer 
>> > wrote:
>> >
>> > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
>> > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
>> > > > jthot...@redhat.com>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > is_nfs_export_available is just a wrapper around "showmount"
>> > > > > command AFAIR.
>> > > > > I saw following messages in console output.
>> > > > >  mount.nfs: rpc.statd is not running but is required for remote
>> > > > > locking.
>> > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local,
>> > > > > or
>> > > > > start
>> > > > > statd.
>> > > > > 05:06:55 mount.nfs: an incorrect mount option was specified
>> > > > >
>> > > > > For me it looks rpcbind may not be running on the machine.
>> > > > > Usually rpcbind starts automatically on machines, don't know
>> > > > > whether it
>> > > > > can happen or not.
>> > > > >
>> > > >
>> > > > That's precisely what the question is. Why suddenly we're seeing
>> > > > this
>> > > > happening too frequently. Today I saw atleast 4 to 5 such
>> > > > failures
>> > > > already.
>> > > >
>> > > > Deepshika - Can you please help in inspecting this?
>> > >
>> > > So we think (we are not sure) that the issue is a bit complex.
>> > >
>> > > What we were investigating was nightly run fail on aws. When the
>> > > build
>> > > crash, the builder is restarted, since that's the easiest way to
>> > > clean
>> > > everything (since even with a perfect test suite that would clean
>> > > itself, we could always end in a corrupt state on the system, WRT
>> > > mount, fs, etc).
>> > >
>> > > In turn, this seems to cause trouble on aws, since cloud-init or
>> > > something rename eth0 interface to ens5, without cleaning to the
>> > > network configuration.
>> > >
>> > > So the network init script fail (because the image say "start eth0"
>> > > and
>> > > that's not present), but fail in a weird way. Network is
>> > > initialised
>> > > and working (we can connect), but the dhclient process is not in
>> > > the
>> > > right cgroup, and network.service is in failed state. Restarting
>> > > network didn't work. In turn, this mean that rpc-statd refuse to
>> > > start
>> > > (due to systemd dependencies), which seems to impact various NFS
>> > > tests.
>> > >
>> > > We have also seen that on some builders, rpcbind pick some IP v6
>> > > autoconfiguration, but we can't reproduce that, and there is no ip
>> > > v6
>> > > set up anywhere. I suspect the network.service failure is somehow
>> > > involved, but fail to see how. In turn, rpcbind.socket not starting
>> > > could cause NFS test troubles.
>> > >
>> > > Our current stop gap fix was to fix all the builders one by one.
>> > > Remove
>> > > the config, kill the rogue dhclient, restart network service.
>> > >
>> > > However, we can't be sure this is going to fix the problem long
>> > > term
>> > > since this only manifest after a crash of the test suite, and it

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-08 Thread Deepshikha Khandelwal
I took a quick look at the builders and noticed that both show the same
'Cannot allocate memory' error, which comes up every time the builder is
rebooted after a build abort. It happens in the same pattern each time, even
though there is no corresponding memory consumption on the builders.

I'm investigating this further.
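
For reference, a minimal sketch of the kind of memory checks that can be run
on such a builder after the reboot (assuming shell access to a CentOS 7 host;
these commands are my own illustration, not something already scripted in the
CI):

#!/bin/bash
# Hedged sketch: inspect memory state on a builder that reported
# "Cannot allocate memory" after a reboot following an aborted build.
free -m                                        # current memory and swap usage
vmstat 1 5                                     # short sample of memory pressure
dmesg | grep -iE 'out of memory|oom|cannot allocate' | tail -n 20
journalctl -b -p err --no-pager | tail -n 50   # errors logged since this boot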

On Thu, May 9, 2019 at 10:02 AM Atin Mukherjee  wrote:

>
>
> On Wed, May 8, 2019 at 7:38 PM Atin Mukherjee  wrote:
>
>> builder204 needs to be fixed, too many failures, mostly none of the
>> patches are passing regression.
>>
>
> And with that builder201 joins the pool,
> https://build.gluster.org/job/centos7-regression/5943/consoleFull
>
>
>> On Wed, May 8, 2019 at 9:53 AM Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Wed, May 8, 2019 at 7:16 AM Sanju Rakonde 
>>> wrote:
>>>
 Deepshikha,

 I see the failure here[1] which ran on builder206. So, we are good.

>>>
>>> Not really,
>>> https://build.gluster.org/job/centos7-regression/5909/consoleFull
>>> failed on builder204 for similar reasons I believe?
>>>
>>> I am bit more worried on this issue being resurfacing more often these
>>> days. What can we do to fix this permanently?
>>>
>>>
 [1] https://build.gluster.org/job/centos7-regression/5901/consoleFull

 On Wed, May 8, 2019 at 12:23 AM Deepshikha Khandelwal <
 dkhan...@redhat.com> wrote:

> Sanju, can you please give us more info about the failures.
>
> I see the failures occurring on just one of the builder (builder206).
> I'm taking it back offline for now.
>
> On Tue, May 7, 2019 at 9:42 PM Michael Scherer 
> wrote:
>
>> Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
>> > Looks like is_nfs_export_available started failing again in recent
>> > centos-regressions.
>> >
>> > Michael, can you please check?
>>
>> I will try but I am leaving for vacation tonight, so if I find
>> nothing,
>> until I leave, I guess Deepshika will have to look.
>>
>> > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul 
>> wrote:
>> >
>> > >
>> > >
>> > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
>> > > msche...@redhat.com>
>> > > wrote:
>> > >
>> > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
>> > > > > Is this back again? The recent patches are failing regression
>> > > > > :-\ .
>> > > >
>> > > > So, on builder206, it took me a while to find that the issue is
>> > > > that
>> > > > nfs (the service) was running.
>> > > >
>> > > > ./tests/basic/afr/tarissue.t failed, because the nfs
>> > > > initialisation
>> > > > failed with a rather cryptic message:
>> > > >
>> > > > [2019-04-23 13:17:05.371733] I
>> > > > [socket.c:991:__socket_server_bind] 0-
>> > > > socket.nfs-server: process started listening on port (38465)
>> > > > [2019-04-23 13:17:05.385819] E
>> > > > [socket.c:972:__socket_server_bind] 0-
>> > > > socket.nfs-server: binding to  failed: Address already in use
>> > > > [2019-04-23 13:17:05.385843] E
>> > > > [socket.c:974:__socket_server_bind] 0-
>> > > > socket.nfs-server: Port is already in use
>> > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
>> > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
>> > > >
>> > > > I found where this came from, but a few stuff did surprised me:
>> > > >
>> > > > - the order of print is different that the order in the code
>> > > >
>> > >
>> > > Indeed strange...
>> > >
>> > > > - the message on "started listening" didn't take in account the
>> > > > fact
>> > > > that bind failed on:
>> > > >
>> > >
>> > > Shouldn't it bail out if it failed to bind?
>> > > Some missing 'goto out' around line 975/976?
>> > > Y.
>> > >
>> > > >
>> > > >
>> > > >
>> > > >
>>
>> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
>> > > >
>> > > > The message about port 38465 also threw me off the track. The
>> > > > real
>> > > > issue is that the service nfs was already running, and I
>> couldn't
>> > > > find
>> > > > anything listening on port 38465
>> > > >
>> > > > once I do service nfs stop, it no longer failed.
>> > > >
>> > > > So far, I do know why nfs.service was activated.
>> > > >
>> > > > But at least, 206 should be fixed, and we know a bit more on
>> what
>> > > > would
>> > > > be causing some failure.
>> > > >
>> > > >
>> > > >
>> > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
>> > > > > msche...@redhat.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
>> > > > > > écrit :
>> > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
>> > > > > > > jthot...@redhat.com>

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-08 Thread Atin Mukherjee
On Wed, May 8, 2019 at 7:38 PM Atin Mukherjee  wrote:

> builder204 needs to be fixed, too many failures, mostly none of the
> patches are passing regression.
>

And with that builder201 joins the pool,
https://build.gluster.org/job/centos7-regression/5943/consoleFull


> On Wed, May 8, 2019 at 9:53 AM Atin Mukherjee  wrote:
>
>>
>>
>> On Wed, May 8, 2019 at 7:16 AM Sanju Rakonde  wrote:
>>
>>> Deepshikha,
>>>
>>> I see the failure here[1] which ran on builder206. So, we are good.
>>>
>>
>> Not really,
>> https://build.gluster.org/job/centos7-regression/5909/consoleFull failed
>> on builder204 for similar reasons I believe?
>>
>> I am bit more worried on this issue being resurfacing more often these
>> days. What can we do to fix this permanently?
>>
>>
>>> [1] https://build.gluster.org/job/centos7-regression/5901/consoleFull
>>>
>>> On Wed, May 8, 2019 at 12:23 AM Deepshikha Khandelwal <
>>> dkhan...@redhat.com> wrote:
>>>
 Sanju, can you please give us more info about the failures.

 I see the failures occurring on just one of the builder (builder206).
 I'm taking it back offline for now.

 On Tue, May 7, 2019 at 9:42 PM Michael Scherer 
 wrote:

> Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
> > Looks like is_nfs_export_available started failing again in recent
> > centos-regressions.
> >
> > Michael, can you please check?
>
> I will try but I am leaving for vacation tonight, so if I find nothing,
> until I leave, I guess Deepshika will have to look.
>
> > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
> >
> > >
> > >
> > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
> > > msche...@redhat.com>
> > > wrote:
> > >
> > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> > > > > Is this back again? The recent patches are failing regression
> > > > > :-\ .
> > > >
> > > > So, on builder206, it took me a while to find that the issue is
> > > > that
> > > > nfs (the service) was running.
> > > >
> > > > ./tests/basic/afr/tarissue.t failed, because the nfs
> > > > initialisation
> > > > failed with a rather cryptic message:
> > > >
> > > > [2019-04-23 13:17:05.371733] I
> > > > [socket.c:991:__socket_server_bind] 0-
> > > > socket.nfs-server: process started listening on port (38465)
> > > > [2019-04-23 13:17:05.385819] E
> > > > [socket.c:972:__socket_server_bind] 0-
> > > > socket.nfs-server: binding to  failed: Address already in use
> > > > [2019-04-23 13:17:05.385843] E
> > > > [socket.c:974:__socket_server_bind] 0-
> > > > socket.nfs-server: Port is already in use
> > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
> > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
> > > >
> > > > I found where this came from, but a few stuff did surprised me:
> > > >
> > > > - the order of print is different that the order in the code
> > > >
> > >
> > > Indeed strange...
> > >
> > > > - the message on "started listening" didn't take in account the
> > > > fact
> > > > that bind failed on:
> > > >
> > >
> > > Shouldn't it bail out if it failed to bind?
> > > Some missing 'goto out' around line 975/976?
> > > Y.
> > >
> > > >
> > > >
> > > >
> > > >
>
> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
> > > >
> > > > The message about port 38465 also threw me off the track. The
> > > > real
> > > > issue is that the service nfs was already running, and I couldn't
> > > > find
> > > > anything listening on port 38465
> > > >
> > > > once I do service nfs stop, it no longer failed.
> > > >
> > > > So far, I do know why nfs.service was activated.
> > > >
> > > > But at least, 206 should be fixed, and we know a bit more on what
> > > > would
> > > > be causing some failure.
> > > >
> > > >
> > > >
> > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
> > > > > msche...@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
> > > > > > écrit :
> > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > > > > > jthot...@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > is_nfs_export_available is just a wrapper around
> > > > > > > > "showmount"
> > > > > > > > command AFAIR.
> > > > > > > > I saw following messages in console output.
> > > > > > > >  mount.nfs: rpc.statd is not running but is required for
> > > > > > > > remote
> > > > > > > > locking.
> > > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
> > > > > > > > local,
> > > > > > 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-08 Thread Atin Mukherjee
builder204 needs to be fixed: there are too many failures, and almost none of
the patches are passing regression.

On Wed, May 8, 2019 at 9:53 AM Atin Mukherjee  wrote:

>
>
> On Wed, May 8, 2019 at 7:16 AM Sanju Rakonde  wrote:
>
>> Deepshikha,
>>
>> I see the failure here[1] which ran on builder206. So, we are good.
>>
>
> Not really,
> https://build.gluster.org/job/centos7-regression/5909/consoleFull failed
> on builder204 for similar reasons I believe?
>
> I am bit more worried on this issue being resurfacing more often these
> days. What can we do to fix this permanently?
>
>
>> [1] https://build.gluster.org/job/centos7-regression/5901/consoleFull
>>
>> On Wed, May 8, 2019 at 12:23 AM Deepshikha Khandelwal <
>> dkhan...@redhat.com> wrote:
>>
>>> Sanju, can you please give us more info about the failures.
>>>
>>> I see the failures occurring on just one of the builder (builder206).
>>> I'm taking it back offline for now.
>>>
>>> On Tue, May 7, 2019 at 9:42 PM Michael Scherer 
>>> wrote:
>>>
 Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
 > Looks like is_nfs_export_available started failing again in recent
 > centos-regressions.
 >
 > Michael, can you please check?

 I will try but I am leaving for vacation tonight, so if I find nothing,
 until I leave, I guess Deepshika will have to look.

 > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
 >
 > >
 > >
 > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
 > > msche...@redhat.com>
 > > wrote:
 > >
 > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
 > > > > Is this back again? The recent patches are failing regression
 > > > > :-\ .
 > > >
 > > > So, on builder206, it took me a while to find that the issue is
 > > > that
 > > > nfs (the service) was running.
 > > >
 > > > ./tests/basic/afr/tarissue.t failed, because the nfs
 > > > initialisation
 > > > failed with a rather cryptic message:
 > > >
 > > > [2019-04-23 13:17:05.371733] I
 > > > [socket.c:991:__socket_server_bind] 0-
 > > > socket.nfs-server: process started listening on port (38465)
 > > > [2019-04-23 13:17:05.385819] E
 > > > [socket.c:972:__socket_server_bind] 0-
 > > > socket.nfs-server: binding to  failed: Address already in use
 > > > [2019-04-23 13:17:05.385843] E
 > > > [socket.c:974:__socket_server_bind] 0-
 > > > socket.nfs-server: Port is already in use
 > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
 > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
 > > >
 > > > I found where this came from, but a few stuff did surprised me:
 > > >
 > > > - the order of print is different that the order in the code
 > > >
 > >
 > > Indeed strange...
 > >
 > > > - the message on "started listening" didn't take in account the
 > > > fact
 > > > that bind failed on:
 > > >
 > >
 > > Shouldn't it bail out if it failed to bind?
 > > Some missing 'goto out' around line 975/976?
 > > Y.
 > >
 > > >
 > > >
 > > >
 > > >

 https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
 > > >
 > > > The message about port 38465 also threw me off the track. The
 > > > real
 > > > issue is that the service nfs was already running, and I couldn't
 > > > find
 > > > anything listening on port 38465
 > > >
 > > > once I do service nfs stop, it no longer failed.
 > > >
 > > > So far, I do know why nfs.service was activated.
 > > >
 > > > But at least, 206 should be fixed, and we know a bit more on what
 > > > would
 > > > be causing some failure.
 > > >
 > > >
 > > >
 > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
 > > > > msche...@redhat.com>
 > > > > wrote:
 > > > >
 > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
 > > > > > écrit :
 > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
 > > > > > > jthot...@redhat.com>
 > > > > > > wrote:
 > > > > > >
 > > > > > > > Hi,
 > > > > > > >
 > > > > > > > is_nfs_export_available is just a wrapper around
 > > > > > > > "showmount"
 > > > > > > > command AFAIR.
 > > > > > > > I saw following messages in console output.
 > > > > > > >  mount.nfs: rpc.statd is not running but is required for
 > > > > > > > remote
 > > > > > > > locking.
 > > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
 > > > > > > > local,
 > > > > > > > or
 > > > > > > > start
 > > > > > > > statd.
 > > > > > > > 05:06:55 mount.nfs: an incorrect mount option was
 > > > > > > > specified
 > > > > > > >
 > > > > > > > For me it looks rpcbind may not be running on the
 > > > > > > > machine.
 > > > > > > > Usually 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-07 Thread Atin Mukherjee
On Wed, May 8, 2019 at 7:16 AM Sanju Rakonde  wrote:

> Deepshikha,
>
> I see the failure here[1] which ran on builder206. So, we are good.
>

Not really,
https://build.gluster.org/job/centos7-regression/5909/consoleFull failed on
builder204 for similar reasons, I believe?

I am a bit more worried about this issue resurfacing more often these
days. What can we do to fix this permanently?


> [1] https://build.gluster.org/job/centos7-regression/5901/consoleFull
>
> On Wed, May 8, 2019 at 12:23 AM Deepshikha Khandelwal 
> wrote:
>
>> Sanju, can you please give us more info about the failures.
>>
>> I see the failures occurring on just one of the builder (builder206). I'm
>> taking it back offline for now.
>>
>> On Tue, May 7, 2019 at 9:42 PM Michael Scherer 
>> wrote:
>>
>>> Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
>>> > Looks like is_nfs_export_available started failing again in recent
>>> > centos-regressions.
>>> >
>>> > Michael, can you please check?
>>>
>>> I will try but I am leaving for vacation tonight, so if I find nothing,
>>> until I leave, I guess Deepshika will have to look.
>>>
>>> > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
>>> >
>>> > >
>>> > >
>>> > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
>>> > > msche...@redhat.com>
>>> > > wrote:
>>> > >
>>> > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
>>> > > > > Is this back again? The recent patches are failing regression
>>> > > > > :-\ .
>>> > > >
>>> > > > So, on builder206, it took me a while to find that the issue is
>>> > > > that
>>> > > > nfs (the service) was running.
>>> > > >
>>> > > > ./tests/basic/afr/tarissue.t failed, because the nfs
>>> > > > initialisation
>>> > > > failed with a rather cryptic message:
>>> > > >
>>> > > > [2019-04-23 13:17:05.371733] I
>>> > > > [socket.c:991:__socket_server_bind] 0-
>>> > > > socket.nfs-server: process started listening on port (38465)
>>> > > > [2019-04-23 13:17:05.385819] E
>>> > > > [socket.c:972:__socket_server_bind] 0-
>>> > > > socket.nfs-server: binding to  failed: Address already in use
>>> > > > [2019-04-23 13:17:05.385843] E
>>> > > > [socket.c:974:__socket_server_bind] 0-
>>> > > > socket.nfs-server: Port is already in use
>>> > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
>>> > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
>>> > > >
>>> > > > I found where this came from, but a few stuff did surprised me:
>>> > > >
>>> > > > - the order of print is different that the order in the code
>>> > > >
>>> > >
>>> > > Indeed strange...
>>> > >
>>> > > > - the message on "started listening" didn't take in account the
>>> > > > fact
>>> > > > that bind failed on:
>>> > > >
>>> > >
>>> > > Shouldn't it bail out if it failed to bind?
>>> > > Some missing 'goto out' around line 975/976?
>>> > > Y.
>>> > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>>
>>> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
>>> > > >
>>> > > > The message about port 38465 also threw me off the track. The
>>> > > > real
>>> > > > issue is that the service nfs was already running, and I couldn't
>>> > > > find
>>> > > > anything listening on port 38465
>>> > > >
>>> > > > once I do service nfs stop, it no longer failed.
>>> > > >
>>> > > > So far, I do know why nfs.service was activated.
>>> > > >
>>> > > > But at least, 206 should be fixed, and we know a bit more on what
>>> > > > would
>>> > > > be causing some failure.
>>> > > >
>>> > > >
>>> > > >
>>> > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
>>> > > > > msche...@redhat.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
>>> > > > > > écrit :
>>> > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
>>> > > > > > > jthot...@redhat.com>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Hi,
>>> > > > > > > >
>>> > > > > > > > is_nfs_export_available is just a wrapper around
>>> > > > > > > > "showmount"
>>> > > > > > > > command AFAIR.
>>> > > > > > > > I saw following messages in console output.
>>> > > > > > > >  mount.nfs: rpc.statd is not running but is required for
>>> > > > > > > > remote
>>> > > > > > > > locking.
>>> > > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
>>> > > > > > > > local,
>>> > > > > > > > or
>>> > > > > > > > start
>>> > > > > > > > statd.
>>> > > > > > > > 05:06:55 mount.nfs: an incorrect mount option was
>>> > > > > > > > specified
>>> > > > > > > >
>>> > > > > > > > For me it looks rpcbind may not be running on the
>>> > > > > > > > machine.
>>> > > > > > > > Usually rpcbind starts automatically on machines, don't
>>> > > > > > > > know
>>> > > > > > > > whether it
>>> > > > > > > > can happen or not.
>>> > > > > > > >
>>> > > > > > >
>>> > > > > > > That's precisely what the question is. Why suddenly we're
>>> > > > > > > seeing
>>> > > > > > > this
>>> > > > > > > happening too frequently. 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-07 Thread Deepshikha Khandelwal
Sanju, can you please give us more info about the failures?

I see the failures occurring on just one of the builders (builder206). I'm
taking it back offline for now.

On Tue, May 7, 2019 at 9:42 PM Michael Scherer  wrote:

> Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
> > Looks like is_nfs_export_available started failing again in recent
> > centos-regressions.
> >
> > Michael, can you please check?
>
> I will try but I am leaving for vacation tonight, so if I find nothing,
> until I leave, I guess Deepshika will have to look.
>
> > On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
> >
> > >
> > >
> > > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
> > > msche...@redhat.com>
> > > wrote:
> > >
> > > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> > > > > Is this back again? The recent patches are failing regression
> > > > > :-\ .
> > > >
> > > > So, on builder206, it took me a while to find that the issue is
> > > > that
> > > > nfs (the service) was running.
> > > >
> > > > ./tests/basic/afr/tarissue.t failed, because the nfs
> > > > initialisation
> > > > failed with a rather cryptic message:
> > > >
> > > > [2019-04-23 13:17:05.371733] I
> > > > [socket.c:991:__socket_server_bind] 0-
> > > > socket.nfs-server: process started listening on port (38465)
> > > > [2019-04-23 13:17:05.385819] E
> > > > [socket.c:972:__socket_server_bind] 0-
> > > > socket.nfs-server: binding to  failed: Address already in use
> > > > [2019-04-23 13:17:05.385843] E
> > > > [socket.c:974:__socket_server_bind] 0-
> > > > socket.nfs-server: Port is already in use
> > > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
> > > > socket.nfs-server: __socket_server_bind failed;closing socket 14
> > > >
> > > > I found where this came from, but a few stuff did surprised me:
> > > >
> > > > - the order of print is different that the order in the code
> > > >
> > >
> > > Indeed strange...
> > >
> > > > - the message on "started listening" didn't take in account the
> > > > fact
> > > > that bind failed on:
> > > >
> > >
> > > Shouldn't it bail out if it failed to bind?
> > > Some missing 'goto out' around line 975/976?
> > > Y.
> > >
> > > >
> > > >
> > > >
> > > >
>
> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
> > > >
> > > > The message about port 38465 also threw me off the track. The
> > > > real
> > > > issue is that the service nfs was already running, and I couldn't
> > > > find
> > > > anything listening on port 38465
> > > >
> > > > once I do service nfs stop, it no longer failed.
> > > >
> > > > So far, I do know why nfs.service was activated.
> > > >
> > > > But at least, 206 should be fixed, and we know a bit more on what
> > > > would
> > > > be causing some failure.
> > > >
> > > >
> > > >
> > > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
> > > > > msche...@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
> > > > > > écrit :
> > > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > > > > > jthot...@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > is_nfs_export_available is just a wrapper around
> > > > > > > > "showmount"
> > > > > > > > command AFAIR.
> > > > > > > > I saw following messages in console output.
> > > > > > > >  mount.nfs: rpc.statd is not running but is required for
> > > > > > > > remote
> > > > > > > > locking.
> > > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
> > > > > > > > local,
> > > > > > > > or
> > > > > > > > start
> > > > > > > > statd.
> > > > > > > > 05:06:55 mount.nfs: an incorrect mount option was
> > > > > > > > specified
> > > > > > > >
> > > > > > > > For me it looks rpcbind may not be running on the
> > > > > > > > machine.
> > > > > > > > Usually rpcbind starts automatically on machines, don't
> > > > > > > > know
> > > > > > > > whether it
> > > > > > > > can happen or not.
> > > > > > > >
> > > > > > >
> > > > > > > That's precisely what the question is. Why suddenly we're
> > > > > > > seeing
> > > > > > > this
> > > > > > > happening too frequently. Today I saw atleast 4 to 5 such
> > > > > > > failures
> > > > > > > already.
> > > > > > >
> > > > > > > Deepshika - Can you please help in inspecting this?
> > > > > >
> > > > > > So we think (we are not sure) that the issue is a bit
> > > > > > complex.
> > > > > >
> > > > > > What we were investigating was nightly run fail on aws. When
> > > > > > the
> > > > > > build
> > > > > > crash, the builder is restarted, since that's the easiest way
> > > > > > to
> > > > > > clean
> > > > > > everything (since even with a perfect test suite that would
> > > > > > clean
> > > > > > itself, we could always end in a corrupt state on the system,
> > > > > > WRT
> > > > > > mount, fs, etc).
> > > > > >
> > > > > > In turn, this seems to cause trouble on aws, since cloud-init
> > > > > > or
> 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-05-07 Thread Michael Scherer
Le mardi 07 mai 2019 à 20:04 +0530, Sanju Rakonde a écrit :
> Looks like is_nfs_export_available started failing again in recent
> centos-regressions.
> 
> Michael, can you please check?

I will try, but I am leaving for vacation tonight, so if I find nothing
before I leave, I guess Deepshika will have to look.

> On Wed, Apr 24, 2019 at 5:30 PM Yaniv Kaul  wrote:
> 
> > 
> > 
> > On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer <
> > msche...@redhat.com>
> > wrote:
> > 
> > > Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> > > > Is this back again? The recent patches are failing regression
> > > > :-\ .
> > > 
> > > So, on builder206, it took me a while to find that the issue is
> > > that
> > > nfs (the service) was running.
> > > 
> > > ./tests/basic/afr/tarissue.t failed, because the nfs
> > > initialisation
> > > failed with a rather cryptic message:
> > > 
> > > [2019-04-23 13:17:05.371733] I
> > > [socket.c:991:__socket_server_bind] 0-
> > > socket.nfs-server: process started listening on port (38465)
> > > [2019-04-23 13:17:05.385819] E
> > > [socket.c:972:__socket_server_bind] 0-
> > > socket.nfs-server: binding to  failed: Address already in use
> > > [2019-04-23 13:17:05.385843] E
> > > [socket.c:974:__socket_server_bind] 0-
> > > socket.nfs-server: Port is already in use
> > > [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
> > > socket.nfs-server: __socket_server_bind failed;closing socket 14
> > > 
> > > I found where this came from, but a few stuff did surprised me:
> > > 
> > > - the order of print is different that the order in the code
> > > 
> > 
> > Indeed strange...
> > 
> > > - the message on "started listening" didn't take in account the
> > > fact
> > > that bind failed on:
> > > 
> > 
> > Shouldn't it bail out if it failed to bind?
> > Some missing 'goto out' around line 975/976?
> > Y.
> > 
> > > 
> > > 
> > > 
> > > 
https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
> > > 
> > > The message about port 38465 also threw me off the track. The
> > > real
> > > issue is that the service nfs was already running, and I couldn't
> > > find
> > > anything listening on port 38465
> > > 
> > > once I do service nfs stop, it no longer failed.
> > > 
> > > So far, I do know why nfs.service was activated.
> > > 
> > > But at least, 206 should be fixed, and we know a bit more on what
> > > would
> > > be causing some failure.
> > > 
> > > 
> > > 
> > > > On Wed, 3 Apr 2019 at 19:26, Michael Scherer <
> > > > msche...@redhat.com>
> > > > wrote:
> > > > 
> > > > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a
> > > > > écrit :
> > > > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > > > > jthot...@redhat.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > is_nfs_export_available is just a wrapper around
> > > > > > > "showmount"
> > > > > > > command AFAIR.
> > > > > > > I saw following messages in console output.
> > > > > > >  mount.nfs: rpc.statd is not running but is required for
> > > > > > > remote
> > > > > > > locking.
> > > > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks
> > > > > > > local,
> > > > > > > or
> > > > > > > start
> > > > > > > statd.
> > > > > > > 05:06:55 mount.nfs: an incorrect mount option was
> > > > > > > specified
> > > > > > > 
> > > > > > > For me it looks rpcbind may not be running on the
> > > > > > > machine.
> > > > > > > Usually rpcbind starts automatically on machines, don't
> > > > > > > know
> > > > > > > whether it
> > > > > > > can happen or not.
> > > > > > > 
> > > > > > 
> > > > > > That's precisely what the question is. Why suddenly we're
> > > > > > seeing
> > > > > > this
> > > > > > happening too frequently. Today I saw atleast 4 to 5 such
> > > > > > failures
> > > > > > already.
> > > > > > 
> > > > > > Deepshika - Can you please help in inspecting this?
> > > > > 
> > > > > So we think (we are not sure) that the issue is a bit
> > > > > complex.
> > > > > 
> > > > > What we were investigating was nightly run fail on aws. When
> > > > > the
> > > > > build
> > > > > crash, the builder is restarted, since that's the easiest way
> > > > > to
> > > > > clean
> > > > > everything (since even with a perfect test suite that would
> > > > > clean
> > > > > itself, we could always end in a corrupt state on the system,
> > > > > WRT
> > > > > mount, fs, etc).
> > > > > 
> > > > > In turn, this seems to cause trouble on aws, since cloud-init 
> > > > > or
> > > > > something rename eth0 interface to ens5, without cleaning to
> > > > > the
> > > > > network configuration.
> > > > > 
> > > > > So the network init script fail (because the image say "start
> > > > > eth0"
> > > > > and
> > > > > that's not present), but fail in a weird way. Network is
> > > > > initialised
> > > > > and working (we can connect), but the dhclient process is not
> > > > > in
> > > > > the
> > > > > right cgroup, and network.service is 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-24 Thread Yaniv Kaul
On Tue, Apr 23, 2019 at 5:15 PM Michael Scherer  wrote:

> Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> > Is this back again? The recent patches are failing regression :-\ .
>
> So, on builder206, it took me a while to find that the issue is that
> nfs (the service) was running.
>
> ./tests/basic/afr/tarissue.t failed, because the nfs initialisation
> failed with a rather cryptic message:
>
> [2019-04-23 13:17:05.371733] I [socket.c:991:__socket_server_bind] 0-
> socket.nfs-server: process started listening on port (38465)
> [2019-04-23 13:17:05.385819] E [socket.c:972:__socket_server_bind] 0-
> socket.nfs-server: binding to  failed: Address already in use
> [2019-04-23 13:17:05.385843] E [socket.c:974:__socket_server_bind] 0-
> socket.nfs-server: Port is already in use
> [2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
> socket.nfs-server: __socket_server_bind failed;closing socket 14
>
> I found where this came from, but a few stuff did surprised me:
>
> - the order of print is different that the order in the code
>

Indeed strange...

> - the message on "started listening" didn't take in account the fact
> that bind failed on:
>

Shouldn't it bail out if it failed to bind?
Some missing 'goto out' around line 975/976?
Y.

>
>
>
> https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967
>
> The message about port 38465 also threw me off the track. The real
> issue is that the service nfs was already running, and I couldn't find
> anything listening on port 38465
>
> once I do service nfs stop, it no longer failed.
>
> So far, I do know why nfs.service was activated.
>
> But at least, 206 should be fixed, and we know a bit more on what would
> be causing some failure.
>
>
>
> > On Wed, 3 Apr 2019 at 19:26, Michael Scherer 
> > wrote:
> >
> > > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > > jthot...@redhat.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > is_nfs_export_available is just a wrapper around "showmount"
> > > > > command AFAIR.
> > > > > I saw following messages in console output.
> > > > >  mount.nfs: rpc.statd is not running but is required for remote
> > > > > locking.
> > > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local,
> > > > > or
> > > > > start
> > > > > statd.
> > > > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > > > >
> > > > > For me it looks rpcbind may not be running on the machine.
> > > > > Usually rpcbind starts automatically on machines, don't know
> > > > > whether it
> > > > > can happen or not.
> > > > >
> > > >
> > > > That's precisely what the question is. Why suddenly we're seeing
> > > > this
> > > > happening too frequently. Today I saw atleast 4 to 5 such
> > > > failures
> > > > already.
> > > >
> > > > Deepshika - Can you please help in inspecting this?
> > >
> > > So we think (we are not sure) that the issue is a bit complex.
> > >
> > > What we were investigating was nightly run fail on aws. When the
> > > build
> > > crash, the builder is restarted, since that's the easiest way to
> > > clean
> > > everything (since even with a perfect test suite that would clean
> > > itself, we could always end in a corrupt state on the system, WRT
> > > mount, fs, etc).
> > >
> > > In turn, this seems to cause trouble on aws, since cloud-init or
> > > something rename eth0 interface to ens5, without cleaning to the
> > > network configuration.
> > >
> > > So the network init script fail (because the image say "start eth0"
> > > and
> > > that's not present), but fail in a weird way. Network is
> > > initialised
> > > and working (we can connect), but the dhclient process is not in
> > > the
> > > right cgroup, and network.service is in failed state. Restarting
> > > network didn't work. In turn, this mean that rpc-statd refuse to
> > > start
> > > (due to systemd dependencies), which seems to impact various NFS
> > > tests.
> > >
> > > We have also seen that on some builders, rpcbind pick some IP v6
> > > autoconfiguration, but we can't reproduce that, and there is no ip
> > > v6
> > > set up anywhere. I suspect the network.service failure is somehow
> > > involved, but fail to see how. In turn, rpcbind.socket not starting
> > > could cause NFS test troubles.
> > >
> > > Our current stop gap fix was to fix all the builders one by one.
> > > Remove
> > > the config, kill the rogue dhclient, restart network service.
> > >
> > > However, we can't be sure this is going to fix the problem long
> > > term
> > > since this only manifest after a crash of the test suite, and it
> > > doesn't happen so often. (plus, it was working before some day in
> > > the
> > > past, when something did make this fail, and I do not know if
> > > that's a
> > > system upgrade, or a test change, or both).
> > >
> > > So we are still looking at it to have a complete understanding of
> > > the
> > > 

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-24 Thread Jiffin Thottan
The output below suggests that kernel NFS was started (it may be enabled on
the machine).

Did you start rpcbind manually on that machine? If yes, can you please check
the kernel NFS status before and after starting that service?
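
For reference, a minimal sketch of that before/after check (assuming a
systemd-based CentOS 7 builder; the unit names here are my own assumption,
not a quote of what was actually run):

#!/bin/bash
# Hedged sketch: record kernel NFS and rpcbind state before and after
# starting rpcbind manually.
echo "=== before ==="
systemctl is-active nfs-server rpcbind rpcbind.socket
rpcinfo -p localhost 2>/dev/null | head   # registered RPC services, if any

systemctl start rpcbind

echo "=== after ==="
systemctl is-active nfs-server rpcbind rpcbind.socket
rpcinfo -p localhost 2>/dev/null | head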

--

Jiffin

- Original Message -
From: "Michael Scherer" 
To: "Atin Mukherjee" 
Cc: "Deepshikha Khandelwal" , "Gluster Devel" 
, "Jiffin Thottan" , 
"gluster-infra" 
Sent: Tuesday, April 23, 2019 7:44:49 PM
Subject: Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from 
nfs.rc failing too often?

Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> Is this back again? The recent patches are failing regression :-\ .

So, on builder206, it took me a while to find that the issue is that
nfs (the service) was running.

./tests/basic/afr/tarissue.t failed, because the nfs initialisation
failed with a rather cryptic message:

[2019-04-23 13:17:05.371733] I [socket.c:991:__socket_server_bind] 0-
socket.nfs-server: process started listening on port (38465)
[2019-04-23 13:17:05.385819] E [socket.c:972:__socket_server_bind] 0-
socket.nfs-server: binding to  failed: Address already in use
[2019-04-23 13:17:05.385843] E [socket.c:974:__socket_server_bind] 0-
socket.nfs-server: Port is already in use
[2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
socket.nfs-server: __socket_server_bind failed;closing socket 14

I found where this came from, but a few stuff did surprised me:

- the order of print is different that the order in the code
- the message on "started listening" didn't take in account the fact
that bind failed on:


https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967

The message about port 38465 also threw me off the track. The real
issue is that the service nfs was already running, and I couldn't find
anything listening on port 38465

once I do service nfs stop, it no longer failed.

So far, I do know why nfs.service was activated.

But at least, 206 should be fixed, and we know a bit more on what would
be causing some failure.

 

> On Wed, 3 Apr 2019 at 19:26, Michael Scherer 
> wrote:
> 
> > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > jthot...@redhat.com>
> > > wrote:
> > > 
> > > > Hi,
> > > > 
> > > > is_nfs_export_available is just a wrapper around "showmount"
> > > > command AFAIR.
> > > > I saw following messages in console output.
> > > >  mount.nfs: rpc.statd is not running but is required for remote
> > > > locking.
> > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local,
> > > > or
> > > > start
> > > > statd.
> > > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > > > 
> > > > For me it looks rpcbind may not be running on the machine.
> > > > Usually rpcbind starts automatically on machines, don't know
> > > > whether it
> > > > can happen or not.
> > > > 
> > > 
> > > That's precisely what the question is. Why suddenly we're seeing
> > > this
> > > happening too frequently. Today I saw atleast 4 to 5 such
> > > failures
> > > already.
> > > 
> > > Deepshika - Can you please help in inspecting this?
> > 
> > So we think (we are not sure) that the issue is a bit complex.
> > 
> > What we were investigating was nightly run fail on aws. When the
> > build
> > crash, the builder is restarted, since that's the easiest way to
> > clean
> > everything (since even with a perfect test suite that would clean
> > itself, we could always end in a corrupt state on the system, WRT
> > mount, fs, etc).
> > 
> > In turn, this seems to cause trouble on aws, since cloud-init or
> > something rename eth0 interface to ens5, without cleaning to the
> > network configuration.
> > 
> > So the network init script fail (because the image say "start eth0"
> > and
> > that's not present), but fail in a weird way. Network is
> > initialised
> > and working (we can connect), but the dhclient process is not in
> > the
> > right cgroup, and network.service is in failed state. Restarting
> > network didn't work. In turn, this mean that rpc-statd refuse to
> > start
> > (due to systemd dependencies), which seems to impact various NFS
> > tests.
> > 
> > We have also seen that on some builders, rpcbind pick some IP v6
> > autoconfiguration, but we can't reproduce that, and there is no ip
> > v6
> > set up anywhere. I suspect

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-23 Thread Michael Scherer
Le lundi 22 avril 2019 à 22:57 +0530, Atin Mukherjee a écrit :
> Is this back again? The recent patches are failing regression :-\ .

So, on builder206, it took me a while to find that the issue is that
nfs (the service) was running.

./tests/basic/afr/tarissue.t failed, because the nfs initialisation
failed with a rather cryptic message:

[2019-04-23 13:17:05.371733] I [socket.c:991:__socket_server_bind] 0-
socket.nfs-server: process started listening on port (38465)
[2019-04-23 13:17:05.385819] E [socket.c:972:__socket_server_bind] 0-
socket.nfs-server: binding to  failed: Address already in use
[2019-04-23 13:17:05.385843] E [socket.c:974:__socket_server_bind] 0-
socket.nfs-server: Port is already in use
[2019-04-23 13:17:05.385852] E [socket.c:3788:socket_listen] 0-
socket.nfs-server: __socket_server_bind failed;closing socket 14

I found where this came from, but a few things surprised me:

- the order of the printed messages is different from the order in the code
- the message about "started listening" didn't take into account the fact
that the bind had failed on:


https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/socket/src/socket.c#L967

The message about port 38465 also threw me off the track. The real
issue is that the nfs service was already running, and I couldn't find
anything listening on port 38465.

Once I ran "service nfs stop", it no longer failed.

So far, I do not know why nfs.service was activated.

But at least 206 should be fixed, and we know a bit more about what could
be causing some of the failures.
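
For reference, a minimal sketch of the checks described above (assuming a
CentOS 7 builder; the exact unit names and the use of ss to inspect the ports
are my own assumptions, not a transcript from builder206):

#!/bin/bash
# Hedged sketch of the debugging steps described above.
# Anything holding the Gluster NFS port (38465) or the standard NFS port (2049)?
ss -tlnp | grep -E ':38465|:2049' || echo "nothing listening on 38465/2049"

# Is the kernel NFS service running? It should not be on a regression builder.
systemctl status nfs-server --no-pager

# Stop it, as was done manually on builder206.
service nfs stop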

 

> On Wed, 3 Apr 2019 at 19:26, Michael Scherer 
> wrote:
> 
> > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > jthot...@redhat.com>
> > > wrote:
> > > 
> > > > Hi,
> > > > 
> > > > is_nfs_export_available is just a wrapper around "showmount"
> > > > command AFAIR.
> > > > I saw following messages in console output.
> > > >  mount.nfs: rpc.statd is not running but is required for remote
> > > > locking.
> > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local,
> > > > or
> > > > start
> > > > statd.
> > > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > > > 
> > > > For me it looks rpcbind may not be running on the machine.
> > > > Usually rpcbind starts automatically on machines, don't know
> > > > whether it
> > > > can happen or not.
> > > > 
> > > 
> > > That's precisely what the question is. Why suddenly we're seeing
> > > this
> > > happening too frequently. Today I saw atleast 4 to 5 such
> > > failures
> > > already.
> > > 
> > > Deepshika - Can you please help in inspecting this?
> > 
> > So we think (we are not sure) that the issue is a bit complex.
> > 
> > What we were investigating was nightly run fail on aws. When the
> > build
> > crash, the builder is restarted, since that's the easiest way to
> > clean
> > everything (since even with a perfect test suite that would clean
> > itself, we could always end in a corrupt state on the system, WRT
> > mount, fs, etc).
> > 
> > In turn, this seems to cause trouble on aws, since cloud-init or
> > something rename eth0 interface to ens5, without cleaning to the
> > network configuration.
> > 
> > So the network init script fail (because the image say "start eth0"
> > and
> > that's not present), but fail in a weird way. Network is
> > initialised
> > and working (we can connect), but the dhclient process is not in
> > the
> > right cgroup, and network.service is in failed state. Restarting
> > network didn't work. In turn, this mean that rpc-statd refuse to
> > start
> > (due to systemd dependencies), which seems to impact various NFS
> > tests.
> > 
> > We have also seen that on some builders, rpcbind pick some IP v6
> > autoconfiguration, but we can't reproduce that, and there is no ip
> > v6
> > set up anywhere. I suspect the network.service failure is somehow
> > involved, but fail to see how. In turn, rpcbind.socket not starting
> > could cause NFS test troubles.
> > 
> > Our current stop gap fix was to fix all the builders one by one.
> > Remove
> > the config, kill the rogue dhclient, restart network service.
> > 
> > However, we can't be sure this is going to fix the problem long
> > term
> > since this only manifest after a crash of the test suite, and it
> > doesn't happen so often. (plus, it was working before some day in
> > the
> > past, when something did make this fail, and I do not know if
> > that's a
> > system upgrade, or a test change, or both).
> > 
> > So we are still looking at it to have a complete understanding of
> > the
> > issue, but so far, we hacked our way to make it work (or so do I
> > think).
> > 
> > Deepshika is working to fix it long term, by fixing the issue
> > regarding
> > eth0/ens5 with a new base image.
> > --
> > Michael Scherer
> > Sysadmin, Community Infrastructure and Platform, OSAS
> > 
> > 
> > --
> 
> - Atin (atinm)
-- 
Michael Scherer
Sysadmin, Community Infrastructure






Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-22 Thread Atin Mukherjee
Is this back again? The recent patches are failing regression :-\ .

On Wed, 3 Apr 2019 at 19:26, Michael Scherer  wrote:

> Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan 
> > wrote:
> >
> > > Hi,
> > >
> > > is_nfs_export_available is just a wrapper around "showmount"
> > > command AFAIR.
> > > I saw following messages in console output.
> > >  mount.nfs: rpc.statd is not running but is required for remote
> > > locking.
> > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or
> > > start
> > > statd.
> > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > >
> > > For me it looks rpcbind may not be running on the machine.
> > > Usually rpcbind starts automatically on machines, don't know
> > > whether it
> > > can happen or not.
> > >
> >
> > That's precisely what the question is. Why suddenly we're seeing this
> > happening too frequently. Today I saw atleast 4 to 5 such failures
> > already.
> >
> > Deepshika - Can you please help in inspecting this?
>
> So we think (we are not sure) that the issue is a bit complex.
>
> What we were investigating was nightly run fail on aws. When the build
> crash, the builder is restarted, since that's the easiest way to clean
> everything (since even with a perfect test suite that would clean
> itself, we could always end in a corrupt state on the system, WRT
> mount, fs, etc).
>
> In turn, this seems to cause trouble on aws, since cloud-init or
> something rename eth0 interface to ens5, without cleaning to the
> network configuration.
>
> So the network init script fail (because the image say "start eth0" and
> that's not present), but fail in a weird way. Network is initialised
> and working (we can connect), but the dhclient process is not in the
> right cgroup, and network.service is in failed state. Restarting
> network didn't work. In turn, this mean that rpc-statd refuse to start
> (due to systemd dependencies), which seems to impact various NFS tests.
>
> We have also seen that on some builders, rpcbind pick some IP v6
> autoconfiguration, but we can't reproduce that, and there is no ip v6
> set up anywhere. I suspect the network.service failure is somehow
> involved, but fail to see how. In turn, rpcbind.socket not starting
> could cause NFS test troubles.
>
> Our current stop gap fix was to fix all the builders one by one. Remove
> the config, kill the rogue dhclient, restart network service.
>
> However, we can't be sure this is going to fix the problem long term
> since this only manifest after a crash of the test suite, and it
> doesn't happen so often. (plus, it was working before some day in the
> past, when something did make this fail, and I do not know if that's a
> system upgrade, or a test change, or both).
>
> So we are still looking at it to have a complete understanding of the
> issue, but so far, we hacked our way to make it work (or so do I
> think).
>
> Deepshika is working to fix it long term, by fixing the issue regarding
> eth0/ens5 with a new base image.
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
> --
- Atin (atinm)

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Michael Scherer
Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan 
> wrote:
> 
> > Hi,
> > 
> > is_nfs_export_available is just a wrapper around "showmount"
> > command AFAIR.
> > I saw following messages in console output.
> >  mount.nfs: rpc.statd is not running but is required for remote
> > locking.
> > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or
> > start
> > statd.
> > 05:06:55 mount.nfs: an incorrect mount option was specified
> > 
> > For me it looks rpcbind may not be running on the machine.
> > Usually rpcbind starts automatically on machines, don't know
> > whether it
> > can happen or not.
> > 
> 
> That's precisely what the question is. Why suddenly we're seeing this
> happening too frequently. Today I saw atleast 4 to 5 such failures
> already.
> 
> Deepshika - Can you please help in inspecting this?

So we think (we are not sure) that the issue is a bit complex.

What we were investigating was the nightly run failures on AWS. When a
build crashes, the builder is restarted, since that's the easiest way to
clean everything (even with a perfect test suite that cleaned up after
itself, we could still end up with a corrupt state on the system, WRT
mounts, filesystems, etc.).

In turn, this seems to cause trouble on AWS, since cloud-init or
something renames the eth0 interface to ens5 without cleaning up the
network configuration.

So the network init script fails (because the image says "start eth0"
and that interface is not present), but it fails in a weird way. The
network is initialised and working (we can connect), but the dhclient
process is not in the right cgroup, and network.service is in a failed
state. Restarting the network didn't help. In turn, this means that
rpc-statd refuses to start (due to systemd dependencies), which seems
to impact various NFS tests.
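
For reference, this is roughly what we check on a builder to confirm
that chain (plain shell; the unit names are the ones on our CentOS 7
builders):

  # state of the units involved
  systemctl status network.service rpcbind.socket rpc-statd

  # what depends on what, to see which dependency blocks rpc-statd
  systemctl list-dependencies rpc-statd
  systemctl list-dependencies --reverse rpcbind.socket

  # is the portmapper answering, and do the NFS services register
  rpcinfo -p localhost
  showmount -e localhost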

We have also seen that on some builders rpcbind picks up some IPv6
autoconfiguration, but we can't reproduce that, and there is no IPv6
set up anywhere. I suspect the network.service failure is somehow
involved, but I fail to see how. In turn, rpcbind.socket not starting
could cause NFS test troubles.
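
(To see what rpcbind actually bound to, assuming the usual port 111,
something like this is enough:)

  ss -ltnup | grep -w 111   # a [::]:111 listener means it grabbed IPv6
  ip -6 addr show           # should print nothing if IPv6 is really off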

Our current stopgap fix was to repair all the builders one by one:
remove the stale config, kill the rogue dhclient, and restart the
network service.
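
Concretely, the per-builder cleanup was along these lines (a sketch
from memory; the ifcfg path is whatever stale eth0 config the image
left behind, so treat the file name below as an example):

  # drop the stale eth0 config the image still references
  rm -f /etc/sysconfig/network-scripts/ifcfg-eth0

  # kill the dhclient that ended up outside its cgroup
  pkill dhclient

  # bring networking back under systemd's control
  systemctl restart network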

However, we can't be sure this is going to fix the problem long term,
since it only manifests after a crash of the test suite, and that
doesn't happen very often. (Plus, it was working at some point in the
past, then something made it start failing, and I do not know whether
that was a system upgrade, a test change, or both.)

So we are still looking at it to get a complete understanding of the
issue, but so far we have hacked our way into making it work (or so I
think).

Deepshika is working on the long-term fix, addressing the eth0/ens5
issue with a new base image.
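
(One way to do that, though not necessarily what the new image will end
up using, is to simply turn off predictable interface naming so the
device stays eth0 whatever the hypervisor presents:)

  # /etc/default/grub on the image: disable predictable names
  GRUB_CMDLINE_LINUX="... net.ifnames=0 biosdevname=0"

  # then regenerate the grub configuration
  grub2-mkconfig -o /boot/grub2/grub.cfg
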
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Michael Scherer
Le mercredi 03 avril 2019 à 15:12 +0300, Yaniv Kaul a écrit :
> On Wed, Apr 3, 2019 at 2:53 PM Michael Scherer 
> wrote:
> 
> > Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan <
> > > jthot...@redhat.com>
> > > wrote:
> > > 
> > > > Hi,
> > > > 
> > > > is_nfs_export_available is just a wrapper around "showmount"
> > > > command AFAIR.
> > > > I saw following messages in console output.
> > > >  mount.nfs: rpc.statd is not running but is required for remote
> > > > locking.
> > > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local,
> > > > or
> > > > start
> > > > statd.
> > > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > > > 
> > > > For me it looks rpcbind may not be running on the machine.
> > > > Usually rpcbind starts automatically on machines, don't know
> > > > whether it
> > > > can happen or not.
> > > > 
> > > 
> > > That's precisely what the question is. Why suddenly we're seeing
> > > this
> > > happening too frequently. Today I saw atleast 4 to 5 such
> > > failures
> > > already.
> > > 
> > > Deepshika - Can you please help in inspecting this?
> > 
> > So in the past, this kind of stuff did happen with ipv6, so this
> > could
> > be a change on AWS and/or a upgrade.
> > 
> 
> We need to enable IPv6, for two reasons:
> 1. IPv6 is common these days, even if we don't test with it, it
> should be
> there.
> 2. We should test with IPv6...
> 
> I'm not sure, but I suspect we do disable IPv6 here and there.
> Example[1].
> Y.
> 
> [1]
> 
https://github.com/gluster/centosci/blob/master/jobs/scripts/glusto/setup-glusto.yml

We do disable IPv6, for sure. Nigel spent 3 days on just that for the
AWS migration, and we have a dedicated playbook, applied to all
builders, that tries to disable it in every possible way:


https://github.com/gluster/gluster.org_ansible_configuration/blob/master/roles/jenkins_builder/tasks/disable_ipv6_linux.yml

According to the comments, that dates from 2016, and I am sure it goes
back further than that; it just wasn't documented before.
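
(For anyone who does not want to open the link: disabling IPv6 "in
every possible way" on these boxes typically boils down to something
like the following; the playbook is the authoritative list.)

  # runtime sysctls, effective immediately
  sysctl -w net.ipv6.conf.all.disable_ipv6=1
  sysctl -w net.ipv6.conf.default.disable_ipv6=1

  # plus the same keys persisted under /etc/sysctl.d/, and optionally
  # ipv6.disable=1 on the kernel command line for the next boot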


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Yaniv Kaul
On Wed, Apr 3, 2019 at 2:53 PM Michael Scherer  wrote:

> Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> > On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan 
> > wrote:
> >
> > > Hi,
> > >
> > > is_nfs_export_available is just a wrapper around "showmount"
> > > command AFAIR.
> > > I saw following messages in console output.
> > >  mount.nfs: rpc.statd is not running but is required for remote
> > > locking.
> > > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or
> > > start
> > > statd.
> > > 05:06:55 mount.nfs: an incorrect mount option was specified
> > >
> > > For me it looks rpcbind may not be running on the machine.
> > > Usually rpcbind starts automatically on machines, don't know
> > > whether it
> > > can happen or not.
> > >
> >
> > That's precisely what the question is. Why suddenly we're seeing this
> > happening too frequently. Today I saw atleast 4 to 5 such failures
> > already.
> >
> > Deepshika - Can you please help in inspecting this?
>
> So in the past, this kind of stuff did happen with ipv6, so this could
> be a change on AWS and/or a upgrade.
>

We need to enable IPv6, for two reasons:
1. IPv6 is common these days; even if we don't test with it, it should
be there.
2. We should test with IPv6...

I'm not sure, but I suspect we do disable IPv6 here and there. Example[1].
Y.

[1]
https://github.com/gluster/centosci/blob/master/jobs/scripts/glusto/setup-glusto.yml

>
> We are currently investigating a set of failure that happen after
> reboot (resulting in partial network bring up, causing all kind of
> weird issue), but it take some time to verify it, and since we lost 33%
> of the team with Nigel departure, stuff do not move as fast as before.
>
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Michael Scherer
Le mercredi 03 avril 2019 à 16:30 +0530, Atin Mukherjee a écrit :
> On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan 
> wrote:
> 
> > Hi,
> > 
> > is_nfs_export_available is just a wrapper around "showmount"
> > command AFAIR.
> > I saw following messages in console output.
> >  mount.nfs: rpc.statd is not running but is required for remote
> > locking.
> > 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or
> > start
> > statd.
> > 05:06:55 mount.nfs: an incorrect mount option was specified
> > 
> > For me it looks rpcbind may not be running on the machine.
> > Usually rpcbind starts automatically on machines, don't know
> > whether it
> > can happen or not.
> > 
> 
> That's precisely what the question is. Why suddenly we're seeing this
> happening too frequently. Today I saw atleast 4 to 5 such failures
> already.
> 
> Deepshika - Can you please help in inspecting this?

So in the past this kind of thing did happen with IPv6, so this could
be a change on AWS and/or an upgrade.

We are currently investigating a set of failures that happen after a
reboot (resulting in a partial network bring-up, causing all kinds of
weird issues), but it takes some time to verify, and since we lost 33%
of the team with Nigel's departure, things do not move as fast as
before.


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Atin Mukherjee
On Wed, Apr 3, 2019 at 11:56 AM Jiffin Thottan  wrote:

> Hi,
>
> is_nfs_export_available is just a wrapper around "showmount" command AFAIR.
> I saw following messages in console output.
>  mount.nfs: rpc.statd is not running but is required for remote locking.
> 05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or start
> statd.
> 05:06:55 mount.nfs: an incorrect mount option was specified
>
> For me it looks rpcbind may not be running on the machine.
> Usually rpcbind starts automatically on machines, don't know whether it
> can happen or not.
>

That's precisely the question: why are we suddenly seeing this happen
so frequently? Today I saw at least 4 to 5 such failures already.

Deepshika - Can you please help in inspecting this?


> Regards,
> Jiffin
>
>
> - Original Message -
> From: "Atin Mukherjee" 
> To: "gluster-infra" , "Gluster Devel" <
> gluster-de...@gluster.org>
> Sent: Wednesday, April 3, 2019 10:46:51 AM
> Subject: [Gluster-devel] is_nfs_export_available from nfs.rc failing too
>   often?
>
> I'm observing the above test function failing too often because of which
> arbiter-mount.t test fails in many regression jobs. Such frequency of
> failures wasn't there earlier. Does anyone know what has changed recently
> to cause these failures in regression? I also hear when such failure
> happens a reboot is required, is that true and if so why?
>
> One of the reference :
> https://build.gluster.org/job/centos7-regression/5340/consoleFull
>
>

Re: [Gluster-infra] [Gluster-devel] is_nfs_export_available from nfs.rc failing too often?

2019-04-03 Thread Jiffin Thottan
Hi,

is_nfs_export_available is just a wrapper around the "showmount"
command, AFAIR.
I saw the following messages in the console output:
 mount.nfs: rpc.statd is not running but is required for remote locking.
05:06:55 mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
05:06:55 mount.nfs: an incorrect mount option was specified

To me it looks like rpcbind may not be running on the machine.
Usually rpcbind starts automatically on these machines, so I don't know
whether that can actually happen or not.
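
(For context, the helper in nfs.rc is, from memory, roughly something
like the sketch below -- check the file in the tree for the real
version; $V0 is the test harness's default volume name.)

  function is_nfs_export_available ()
  {
          local vol=${1:-$V0}
          # print the number of matching export lines; the tests wait
          # for this to become non-zero
          showmount -e localhost 2>/dev/null | grep -w "$vol" | wc -l
  }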

Regards,
Jiffin


- Original Message -
From: "Atin Mukherjee" 
To: "gluster-infra" , "Gluster Devel" 

Sent: Wednesday, April 3, 2019 10:46:51 AM
Subject: [Gluster-devel] is_nfs_export_available from nfs.rc failing too
often?

I'm observing the above test function failing too often because of which 
arbiter-mount.t test fails in many regression jobs. Such frequency of failures 
wasn't there earlier. Does anyone know what has changed recently to cause these 
failures in regression? I also hear when such failure happens a reboot is 
required, is that true and if so why? 

One of the reference : 
https://build.gluster.org/job/centos7-regression/5340/consoleFull 

