Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-02-09 Thread Dejan Muhamedagic
Hi,

On Thu, Jan 28, 2016 at 04:42:55PM +0900, yuta takeshita wrote:
> Hi,
> Sorry for replying late.

No problem.

> 2016-01-15 21:19 GMT+09:00 Dejan Muhamedagic :
> 
> > Hi,
> >
> > On Fri, Jan 15, 2016 at 04:54:37PM +0900, yuta takeshita wrote:
> > > Hi,
> > >
> > > Thanks for responding and making a patch.
> > >
> > > 2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :
> > >
> > > > On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > > > > > Hello.
> > > > > >
> > > > > > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > > > > > When the nfsd process is lost due to an unexpected failure,
> > > > > > nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
> > > > > >
> > > > > > I use the RA below (but the latest nfsserver RA may have the same
> > > > > > problem):
> > > > > > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> > > > > >
> > > > > > The cause is as follows.
> > > > > >
> > > > > > 1. After executing "pkill -9 nfsd", "systemctl status
> > > > > > nfs-server.service" returns 0.
> > > > >
> > > > > I think that it should be systemctl is-active. Already had a
> > > > > problem with systemctl status, well, not being what one would
> > > > > assume status would be. Can you please test that and then open
> > > > > either a pull request or issue at
> > > > > https://github.com/ClusterLabs/resource-agents
> > > >
> > > > I already made a pull request:
> > > >
> > > > https://github.com/ClusterLabs/resource-agents/pull/741
> > > >
> > > > Please test if you find time.
> > > >
> > > I tested the code, but the problem remains.
> > > systemctl is-active returns "active" with return code 0, the same as
> > > systemctl status.
> > > Perhaps it is inappropriate to use systemctl for monitoring a kernel
> > > process.
> >
> > OK. My patch was too naive and didn't take into account the
> > systemd/kernel intricacies.
> >
> > > Kay Sievers, a systemd developer, said in the following thread that
> > > systemd doesn't monitor kernel processes:
> > > http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367
> >
> > Thanks for the reference. One interesting thing could also be
> > reading /proc/fs/nfsd/threads instead of checking the process
> > existence. Furthermore, we could do some RPC based monitor, but
> > that would be, I guess, better suited for another monitor depth.
> >
> OK. I investigated and tested /proc/fs/nfsd/threads.
> It seems to work well on my cluster.
> I made a patch and opened a pull request:
> https://github.com/ClusterLabs/resource-agents/pull/746
> 
> Please check if you have time.

Some return codes of nfsserver_systemd_monitor() follow OCF and one
apparently LSB:

    nfs_exec is-active
    rc=$?
    ...
        if [ $threads_num -gt 0 ]; then
            return $OCF_SUCCESS
        else
            return 3
        fi
    else
        return $OCF_ERR_GENERIC
    ...
    return $rc

Given that nfs_exec() returns LSB codes, it should probably be
something like this:

        if [ $threads_num -gt 0 ]; then
            return 0
        else
            return 3
        fi
    else
        return 1
    ...
    return $rc

It won't make any actual difference, but the intent would be
cleaner (i.e. it's just by accident that the OCF codes are the
same in this case).
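
To make the coincidence concrete: LSB status codes 0 and 1 happen to
equal OCF_SUCCESS and OCF_ERR_GENERIC, while LSB 3 (not running) only
becomes OCF_NOT_RUNNING (7) once a caller translates it. A minimal
sketch of such a translation, assuming a hypothetical helper named
lsb_status_to_ocf that is not part of the nfsserver RA:

```shell
# Numeric values of the OCF return codes used in this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper (not taken from the RA): translate an LSB-style
# status code into the OCF code a monitor action would report.
lsb_status_to_ocf() {
    case "$1" in
        0) return $OCF_SUCCESS ;;      # LSB 0: service is running
        3) return $OCF_NOT_RUNNING ;;  # LSB 3: service is not running
        *) return $OCF_ERR_GENERIC ;;  # anything else: treat as an error
    esac
}
```

A monitor action could capture the LSB-style return code of its check
and pass it through a helper like this before returning.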

Cheers,

Dejan

> Regards,
> Yuta
> 
> > Cheers,
> >
> > Dejan
> >
> > > I reply to your pull request.
> > >
> > > Regards,
> > > Yuta Takeshita
> > >
> > > >
> > > > Thanks for reporting!
> > > >
> > > > Dejan
> > > >
> > > > > Thanks,
> > > > >
> > > > > Dejan
> > > > >
> > > > > > 2. nfsserver_monitor() decides based on the return value of
> > > > > > "systemctl status nfs-server.service".
> > > > > >
> > > > > > --
> > > > > > # ps ax | grep nfsd
> > > > > > 25193 ?        S<     0:00 [nfsd4]
> > > > > > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > > > > > 25197 ?        S      0:00 [nfsd]
> > > > > > 25198 ?        S      0:00 [nfsd]
> > > > > > 25199 ?        S      0:00 [nfsd]
> > > > > > 25200 ?        S      0:00 [nfsd]
> > > > > > 25201 ?        S      0:00 [nfsd]
> > > > > > 25202 ?        S      0:00 [nfsd]
> > > > > > 25203 ?        S      0:00 [nfsd]
> > > > > > 25204 ?        S      0:00 [nfsd]
> > > > > > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > > > > > #
> > > > > > # pkill -9 nfsd
> > > > > > #
> > > > > > # systemctl status nfs-server.service
> > > > > > ● nfs-server.service - NFS server and services
> > > > > >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > > > > > preset: disabled)
> > > > > >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1mi

Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-01-27 Thread yuta takeshita
Hi,
Sorry for replying late.

2016-01-15 21:19 GMT+09:00 Dejan Muhamedagic :

> Hi,
>
> On Fri, Jan 15, 2016 at 04:54:37PM +0900, yuta takeshita wrote:
> > Hi,
> >
> > Thanks for responding and making a patch.
> >
> > 2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :
> >
> > > On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> > > > Hi,
> > > >
> > > > On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > > > > Hello.
> > > > >
> > > > > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > > > > When the nfsd process is lost due to an unexpected failure,
> > > > > nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
> > > > >
> > > > > I use the RA below (but the latest nfsserver RA may have the same
> > > > > problem):
> > > > > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> > > > >
> > > > > The cause is as follows.
> > > > >
> > > > > 1. After executing "pkill -9 nfsd", "systemctl status
> > > > > nfs-server.service" returns 0.
> > > >
> > > > I think that it should be systemctl is-active. Already had a
> > > > problem with systemctl status, well, not being what one would
> > > > assume status would be. Can you please test that and then open
> > > > either a pull request or issue at
> > > > https://github.com/ClusterLabs/resource-agents
> > >
> > > I already made a pull request:
> > >
> > > https://github.com/ClusterLabs/resource-agents/pull/741
> > >
> > > Please test if you find time.
> > >
> > I tested the code, but the problem remains.
> > systemctl is-active returns "active" with return code 0, the same as
> > systemctl status.
> > Perhaps it is inappropriate to use systemctl for monitoring a kernel
> > process.
>
> OK. My patch was too naive and didn't take into account the
> systemd/kernel intricacies.
>
> > Kay Sievers, a systemd developer, said in the following thread that
> > systemd doesn't monitor kernel processes:
> > http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367
>
> Thanks for the reference. One interesting thing could also be
> reading /proc/fs/nfsd/threads instead of checking the process
> existence. Furthermore, we could do some RPC based monitor, but
> that would be, I guess, better suited for another monitor depth.
>
OK. I investigated and tested /proc/fs/nfsd/threads.
It seems to work well on my cluster.
I made a patch and opened a pull request:
https://github.com/ClusterLabs/resource-agents/pull/746

Please check if you have time.

Regards,
Yuta

> Cheers,
>
> Dejan
>
> > I reply to your pull request.
> >
> > Regards,
> > Yuta Takeshita
> >
> > >
> > > Thanks for reporting!
> > >
> > > Dejan
> > >
> > > > Thanks,
> > > >
> > > > Dejan
> > > >
> > > > > 2. nfsserver_monitor() decides based on the return value of
> > > > > "systemctl status nfs-server.service".
> > > > >
> > > > > --
> > > > > # ps ax | grep nfsd
> > > > > 25193 ?        S<     0:00 [nfsd4]
> > > > > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > > > > 25197 ?        S      0:00 [nfsd]
> > > > > 25198 ?        S      0:00 [nfsd]
> > > > > 25199 ?        S      0:00 [nfsd]
> > > > > 25200 ?        S      0:00 [nfsd]
> > > > > 25201 ?        S      0:00 [nfsd]
> > > > > 25202 ?        S      0:00 [nfsd]
> > > > > 25203 ?        S      0:00 [nfsd]
> > > > > 25204 ?        S      0:00 [nfsd]
> > > > > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > > > > #
> > > > > # pkill -9 nfsd
> > > > > #
> > > > > # systemctl status nfs-server.service
> > > > > ● nfs-server.service - NFS server and services
> > > > >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > > > > preset: disabled)
> > > > >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> > > > >   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> > > > > status=0/SUCCESS)
> > > > >   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> > > > > status=0/SUCCESS)
> > > > >  Main PID: 25184 (code=exited, status=0/SUCCESS)
> > > > >    CGroup: /system.slice/nfs-server.service
> > > > > (snip)
> > > > > #
> > > > > # echo $?
> > > > > 0
> > > > > #
> > > > > # ps ax | grep nfsd
> > > > > 25256 pts/0    S+     0:00 grep --color=auto nfsd
> > > > > --
> > > > >
> > > > > This is because nfsd is a kernel process, and systemd does not
> > > > > monitor whether a kernel process is still running.
> > > > >
> > > > > Is there a good way to handle this?
> > > > > (When I use "pidof" instead of "systemctl status", the failover
> > > > > succeeds.)
> > > > >
> > > > > Regards,
> > > > > Yuta Takeshita
> > > >
> > > > > ___
> > > > > Users mailing list: Users@clusterlabs.org
> > > > > http://clusterlabs.org/mailman/listinfo/users
> > > > >
> > > > > Project 

Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-01-15 Thread Dejan Muhamedagic
Hi,

On Fri, Jan 15, 2016 at 04:54:37PM +0900, yuta takeshita wrote:
> Hi,
> 
> Thanks for responding and making a patch.
> 
> 2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :
> 
> > On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > >
> > > On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > > > Hello.
> > > >
> > > > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > > > When the nfsd process is lost due to an unexpected failure,
> > > > nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
> > > >
> > > > I use the RA below (but the latest nfsserver RA may have the same
> > > > problem):
> > > > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> > > >
> > > > The cause is as follows.
> > > >
> > > > 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> > > > returns 0.
> > >
> > > I think that it should be systemctl is-active. Already had a
> > > problem with systemctl status, well, not being what one would
> > > assume status would be. Can you please test that and then open
> > > either a pull request or issue at
> > > https://github.com/ClusterLabs/resource-agents
> >
> > I already made a pull request:
> >
> > https://github.com/ClusterLabs/resource-agents/pull/741
> >
> > Please test if you find time.
> >
> I tested the code, but the problem remains.
> systemctl is-active returns "active" with return code 0, the same as
> systemctl status.
> Perhaps it is inappropriate to use systemctl for monitoring a kernel
> process.

OK. My patch was too naive and didn't take into account the
systemd/kernel intricacies.

> Kay Sievers, a systemd developer, said in the following thread that
> systemd doesn't monitor kernel processes:
> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367

Thanks for the reference. One interesting thing could also be
reading /proc/fs/nfsd/threads instead of checking the process
existence. Furthermore, we could do some RPC based monitor, but
that would be, I guess, better suited for another monitor depth.
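
As a rough illustration of the /proc/fs/nfsd/threads idea: the file
holds the number of nfsd threads, so a count of 0 (or an unreadable
file) can be treated as "not running". A minimal sketch; the function
names and the optional path argument are hypothetical and only there
to keep the example testable:

```shell
# Hypothetical probe (names are illustrative, not from the RA).
# Prints the nfsd thread count; fails if it cannot be determined.
nfsd_thread_count() {
    threads_file="${1:-/proc/fs/nfsd/threads}"

    # The file is missing when the nfsd filesystem is not mounted.
    [ -r "$threads_file" ] || return 1

    count=$(cat "$threads_file") || return 1

    # Reject empty or non-numeric content.
    case "$count" in
        ''|*[!0-9]*) return 1 ;;
    esac

    echo "$count"
}

# Succeeds only when at least one nfsd thread is reported.
nfsd_is_running() {
    count=$(nfsd_thread_count "$@") || return 1
    [ "$count" -gt 0 ]
}
```

Unlike systemctl, this asks the kernel directly, so it should notice
when the threads are gone even though the unit still shows as active.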

Cheers,

Dejan

> I reply to your pull request.
> 
> Regards,
> Yuta Takeshita
> 
> >
> > Thanks for reporting!
> >
> > Dejan
> >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > 2. nfsserver_monitor() decides based on the return value of
> > > > "systemctl status nfs-server.service".
> > > >
> > > > --
> > > > # ps ax | grep nfsd
> > > > 25193 ?        S<     0:00 [nfsd4]
> > > > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > > > 25197 ?        S      0:00 [nfsd]
> > > > 25198 ?        S      0:00 [nfsd]
> > > > 25199 ?        S      0:00 [nfsd]
> > > > 25200 ?        S      0:00 [nfsd]
> > > > 25201 ?        S      0:00 [nfsd]
> > > > 25202 ?        S      0:00 [nfsd]
> > > > 25203 ?        S      0:00 [nfsd]
> > > > 25204 ?        S      0:00 [nfsd]
> > > > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > > > #
> > > > # pkill -9 nfsd
> > > > #
> > > > # systemctl status nfs-server.service
> > > > ● nfs-server.service - NFS server and services
> > > >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > > > preset: disabled)
> > > >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> > > >   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> > > > status=0/SUCCESS)
> > > >   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> > > > status=0/SUCCESS)
> > > >  Main PID: 25184 (code=exited, status=0/SUCCESS)
> > > >    CGroup: /system.slice/nfs-server.service
> > > > (snip)
> > > > #
> > > > # echo $?
> > > > 0
> > > > #
> > > > # ps ax | grep nfsd
> > > > 25256 pts/0    S+     0:00 grep --color=auto nfsd
> > > > --
> > > >
> > > > This is because nfsd is a kernel process, and systemd does not
> > > > monitor whether a kernel process is still running.
> > > >
> > > > Is there a good way to handle this?
> > > > (When I use "pidof" instead of "systemctl status", the failover
> > > > succeeds.)
> > > >
> > > > Regards,
> > > > Yuta Takeshita
> > >

Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-01-14 Thread yuta takeshita
Hi,

Thanks for responding and making a patch.

2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :

> On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > > Hello.
> > >
> > > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > > When the nfsd process is lost due to an unexpected failure,
> > > nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
> > >
> > > I use the RA below (but the latest nfsserver RA may have the same
> > > problem):
> > > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> > >
> > > The cause is as follows.
> > >
> > > 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> > > returns 0.
> >
> > I think that it should be systemctl is-active. Already had a
> > problem with systemctl status, well, not being what one would
> > assume status would be. Can you please test that and then open
> > either a pull request or issue at
> > https://github.com/ClusterLabs/resource-agents
>
> I already made a pull request:
>
> https://github.com/ClusterLabs/resource-agents/pull/741
>
> Please test if you find time.
>
I tested the code, but the problem remains.
systemctl is-active returns "active" with return code 0, the same as
systemctl status.
Perhaps it is inappropriate to use systemctl for monitoring a kernel
process.
Kay Sievers, a systemd developer, said in the following thread that
systemd doesn't monitor kernel processes:
http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367

I reply to your pull request.
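
Because systemd does not track the nfsd kernel process, a check has to
look at the process itself, which is what "pidof" did in the original
report. A minimal sketch of that kind of check, assuming
/proc/<pid>/comm is available; the helper name and the proc_root
argument are hypothetical:

```shell
# Hypothetical helper (not from the RA): count processes, including
# kernel threads, whose command name equals $1 by scanning
# <proc_root>/<pid>/comm. proc_root defaults to /proc.
count_procs_by_name() {
    name="$1"
    proc_root="${2:-/proc}"
    n=0

    for comm_file in "$proc_root"/[0-9]*/comm; do
        # Skip an unmatched glob and processes that exited meanwhile.
        [ -r "$comm_file" ] || continue
        comm=$(cat "$comm_file" 2>/dev/null) || continue
        if [ "$comm" = "$name" ]; then
            n=$((n + 1))
        fi
    done

    echo "$n"
}
```

For example, a result of 0 from "count_procs_by_name nfsd" would
indicate that no nfsd threads are left, regardless of what systemctl
reports.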

Regards,
Yuta Takeshita

>
> Thanks for reporting!
>
> Dejan
>
> > Thanks,
> >
> > Dejan
> >
> > > 2. nfsserver_monitor() decides based on the return value of
> > > "systemctl status nfs-server.service".
> > >
> > > --
> > > # ps ax | grep nfsd
> > > 25193 ?        S<     0:00 [nfsd4]
> > > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > > 25197 ?        S      0:00 [nfsd]
> > > 25198 ?        S      0:00 [nfsd]
> > > 25199 ?        S      0:00 [nfsd]
> > > 25200 ?        S      0:00 [nfsd]
> > > 25201 ?        S      0:00 [nfsd]
> > > 25202 ?        S      0:00 [nfsd]
> > > 25203 ?        S      0:00 [nfsd]
> > > 25204 ?        S      0:00 [nfsd]
> > > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > > #
> > > # pkill -9 nfsd
> > > #
> > > # systemctl status nfs-server.service
> > > ● nfs-server.service - NFS server and services
> > >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > > preset: disabled)
> > >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> > >   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> > > status=0/SUCCESS)
> > >   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> > > status=0/SUCCESS)
> > >  Main PID: 25184 (code=exited, status=0/SUCCESS)
> > >    CGroup: /system.slice/nfs-server.service
> > > (snip)
> > > #
> > > # echo $?
> > > 0
> > > #
> > > # ps ax | grep nfsd
> > > 25256 pts/0    S+     0:00 grep --color=auto nfsd
> > > --
> > >
> > > This is because nfsd is a kernel process, and systemd does not
> > > monitor whether a kernel process is still running.
> > >
> > > Is there a good way to handle this?
> > > (When I use "pidof" instead of "systemctl status", the failover
> > > succeeds.)
> > >
> > > Regards,
> > > Yuta Takeshita
> >


Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-01-14 Thread Dejan Muhamedagic
On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> > Hello.
> > 
> > I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> > When the nfsd process is lost due to an unexpected failure,
> > nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
> >
> > I use the RA below (but the latest nfsserver RA may have the same
> > problem):
> > https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
> >
> > The cause is as follows.
> >
> > 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> > returns 0.
> 
> I think that it should be systemctl is-active. Already had a
> problem with systemctl status, well, not being what one would
> assume status would be. Can you please test that and then open
> either a pull request or issue at
> https://github.com/ClusterLabs/resource-agents

I already made a pull request:

https://github.com/ClusterLabs/resource-agents/pull/741

Please test if you find time.

Thanks for reporting!

Dejan

> Thanks,
> 
> Dejan
> 
> > 2. nfsserver_monitor() decides based on the return value of
> > "systemctl status nfs-server.service".
> >
> > --
> > # ps ax | grep nfsd
> > 25193 ?        S<     0:00 [nfsd4]
> > 25194 ?        S<     0:00 [nfsd4_callbacks]
> > 25197 ?        S      0:00 [nfsd]
> > 25198 ?        S      0:00 [nfsd]
> > 25199 ?        S      0:00 [nfsd]
> > 25200 ?        S      0:00 [nfsd]
> > 25201 ?        S      0:00 [nfsd]
> > 25202 ?        S      0:00 [nfsd]
> > 25203 ?        S      0:00 [nfsd]
> > 25204 ?        S      0:00 [nfsd]
> > 25238 pts/0    S+     0:00 grep --color=auto nfsd
> > #
> > # pkill -9 nfsd
> > #
> > # systemctl status nfs-server.service
> > ● nfs-server.service - NFS server and services
> >    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> > preset: disabled)
> >    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
> >   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> > status=0/SUCCESS)
> >   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> > status=0/SUCCESS)
> >  Main PID: 25184 (code=exited, status=0/SUCCESS)
> >    CGroup: /system.slice/nfs-server.service
> > (snip)
> > #
> > # echo $?
> > 0
> > #
> > # ps ax | grep nfsd
> > 25256 pts/0    S+     0:00 grep --color=auto nfsd
> > --
> >
> > This is because nfsd is a kernel process, and systemd does not
> > monitor whether a kernel process is still running.
> >
> > Is there a good way to handle this?
> > (When I use "pidof" instead of "systemctl status", the failover
> > succeeds.)
> > 
> > Regards,
> > Yuta Takeshita
> 


Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.

2016-01-14 Thread Dejan Muhamedagic
Hi,

On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> Hello.
> 
> I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> When the nfsd process is lost due to an unexpected failure,
> nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
>
> I use the RA below (but the latest nfsserver RA may have the same
> problem):
> https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
>
> The cause is as follows.
>
> 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> returns 0.

I think that it should be systemctl is-active. Already had a
problem with systemctl status, well, not being what one would
assume status would be. Can you please test that and then open
either a pull request or issue at
https://github.com/ClusterLabs/resource-agents

Thanks,

Dejan

> 2. nfsserver_monitor() decides based on the return value of
> "systemctl status nfs-server.service".
>
> --
> # ps ax | grep nfsd
> 25193 ?        S<     0:00 [nfsd4]
> 25194 ?        S<     0:00 [nfsd4_callbacks]
> 25197 ?        S      0:00 [nfsd]
> 25198 ?        S      0:00 [nfsd]
> 25199 ?        S      0:00 [nfsd]
> 25200 ?        S      0:00 [nfsd]
> 25201 ?        S      0:00 [nfsd]
> 25202 ?        S      0:00 [nfsd]
> 25203 ?        S      0:00 [nfsd]
> 25204 ?        S      0:00 [nfsd]
> 25238 pts/0    S+     0:00 grep --color=auto nfsd
> #
> # pkill -9 nfsd
> #
> # systemctl status nfs-server.service
> ● nfs-server.service - NFS server and services
>    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor
> preset: disabled)
>    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
>   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
> status=0/SUCCESS)
>   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
> status=0/SUCCESS)
>  Main PID: 25184 (code=exited, status=0/SUCCESS)
>    CGroup: /system.slice/nfs-server.service
> (snip)
> #
> # echo $?
> 0
> #
> # ps ax | grep nfsd
> 25256 pts/0    S+     0:00 grep --color=auto nfsd
> --
>
> This is because nfsd is a kernel process, and systemd does not
> monitor whether a kernel process is still running.
>
> Is there a good way to handle this?
> (When I use "pidof" instead of "systemctl status", the failover
> succeeds.)
> 
> Regards,
> Yuta Takeshita

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org