Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Hi,

On Thu, Jan 28, 2016 at 04:42:55PM +0900, yuta takeshita wrote:
> Hi,
> Sorry for replying late.

No problem.

> 2016-01-15 21:19 GMT+09:00 Dejan Muhamedagic :
> > Thanks for the reference. One interesting thing could also be
> > reading /proc/fs/nfsd/threads instead of checking the process
> > existence. Furthermore, we could do some RPC based monitor, but
> > that would be, I guess, better suited for another monitor depth.
>
> OK. I surveyed and tested /proc/fs/nfsd/threads.
> It seems to work well on my cluster.
> I made a patch and opened a pull request.
> https://github.com/ClusterLabs/resource-agents/pull/746
>
> Please check if you have time.

Some return codes of nfsserver_systemd_monitor() follow OCF and one
apparently LSB:

    nfs_exec is-active
    rc=$?
    ...
            if [ $threads_num -gt 0 ]; then
                return $OCF_SUCCESS
            else
                return 3
            fi
        else
            return $OCF_ERR_GENERIC
    ...
    return $rc

Given that nfs_exec() returns LSB codes, it should probably be
something like this:

            if [ $threads_num -gt 0 ]; then
                return 0
            else
                return 3
            fi
        else
            return 1
    ...
    return $rc

It won't make any actual difference, but the intent would be cleaner
(i.e. it's just by accident that the OCF codes are the same in this
case).

Cheers,

Dejan

> Regards,
> Yuta
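The split described above (LSB status codes inside the systemd helper, OCF codes only at the agent boundary) can be sketched roughly as follows. This is an illustration, not the code from the pull request: `lsb_status_to_ocf` is a hypothetical helper name, and the OCF values are hard-coded only to keep the snippet self-contained (a real agent gets them by sourcing ocf-shellfuncs).

```shell
# OCF return codes, hard-coded for this sketch only; a real resource
# agent obtains them from ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: translate an LSB "status" exit code into an OCF
# monitor code. LSB 0 = running, LSB 3 = not running, anything else is
# reported as a generic error.
lsb_status_to_ocf() {
    case "$1" in
        0) return $OCF_SUCCESS ;;
        3) return $OCF_NOT_RUNNING ;;
        *) return $OCF_ERR_GENERIC ;;
    esac
}
```

With a conversion like this in the caller, a helper such as nfsserver_systemd_monitor() could return plain 0, 3 or 1 throughout, which is the consistency being asked for.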
Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Hi,

Sorry for the late reply.

2016-01-15 21:19 GMT+09:00 Dejan Muhamedagic :
> > Mr Kay Sievers, who is a developer of systemd, said in the following
> > thread that systemd doesn't monitor kernel processes.
> > http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367
>
> Thanks for the reference. One interesting thing could also be
> reading /proc/fs/nfsd/threads instead of checking the process
> existence. Furthermore, we could do some RPC based monitor, but
> that would be, I guess, better suited for another monitor depth.

OK. I surveyed and tested /proc/fs/nfsd/threads.
It seems to work well on my cluster.
I made a patch and opened a pull request.
https://github.com/ClusterLabs/resource-agents/pull/746

Please check if you have time.

Regards,
Yuta
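A minimal sketch of the /proc/fs/nfsd/threads check discussed above, assuming the file holds a single integer giving the number of running nfsd threads. The function name `nfsd_threads_status` and its optional path argument are hypothetical (the argument exists only so the function can be tried against a test file), and this is not the code from the pull request. Treating an unreadable or malformed file as an error rather than "not running" is a choice made for this sketch.

```shell
# Hypothetical helper returning LSB-style status codes:
#   0 = nfsd threads are running, 3 = no threads, 1 = could not tell.
nfsd_threads_status() {
    threads_file="${1:-/proc/fs/nfsd/threads}"

    # The file is missing when the nfsd filesystem is not mounted.
    [ -r "$threads_file" ] || return 1

    # Accept a final line without a trailing newline as well.
    read -r threads_num < "$threads_file" || [ -n "$threads_num" ] || return 1

    # Reject anything that is not a plain non-negative integer.
    case "$threads_num" in
        ''|*[!0-9]*) return 1 ;;
    esac

    if [ "$threads_num" -gt 0 ]; then
        return 0
    else
        return 3
    fi
}
```

Calling `nfsd_threads_status` with no argument checks the real file; after a `pkill -9 nfsd` as in the original report, the expectation is that the count drops to 0 and the function returns 3.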
Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Hi,

On Fri, Jan 15, 2016 at 04:54:37PM +0900, yuta takeshita wrote:
> Hi,
>
> Thanks for responding and making a patch.
>
> 2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :
> > I already made a pull request:
> >
> > https://github.com/ClusterLabs/resource-agents/pull/741
> >
> > Please test if you find time.
>
> I tested the code, but problems still remain.
> systemctl is-active returns "active" and the return code is 0, the same
> as systemctl status.
> Perhaps it is inappropriate to use systemctl for monitoring a kernel
> process.

OK. My patch was too naive and didn't take into account the
systemd/kernel intricacies.

> Mr Kay Sievers, who is a developer of systemd, said in the following
> thread that systemd doesn't monitor kernel processes.
> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367

Thanks for the reference. One interesting thing could also be
reading /proc/fs/nfsd/threads instead of checking the process
existence. Furthermore, we could do some RPC based monitor, but
that would be, I guess, better suited for another monitor depth.

Cheers,

Dejan

> I replied to your pull request.
>
> Regards,
> Yuta Takeshita
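One possible reading of the "another monitor depth" idea above is to gate an RPC probe on OCF_CHECK_LEVEL, the variable through which the cluster passes the monitor depth to an agent. The sketch below is an assumption and was not proposed in the thread: the function name `nfs_rpc_monitor`, the threshold of 10, the `RPC_PROBE` override, and the choice of `rpcinfo -t localhost 100003` (a TCP null call to the NFS RPC program) as the probe are all illustrative.

```shell
# Hypothetical deeper monitor. Returns 0 when the probe is skipped or
# succeeds, 1 when the probe fails.
nfs_rpc_monitor() {
    level="${OCF_CHECK_LEVEL:-0}"

    # Shallow monitor: leave the RPC probe out entirely.
    if [ "$level" -lt 10 ]; then
        return 0
    fi

    # RPC_PROBE is an illustrative override so the probe command can be
    # swapped; 100003 is the RPC program number of NFS.
    if ${RPC_PROBE:-rpcinfo -t localhost 100003} >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

Keeping the probe behind a depth check means the cheap monitor stays cheap, while a second, less frequent monitor operation could exercise the service end to end.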
Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Hi,

Thanks for responding and making a patch.

2016-01-14 19:16 GMT+09:00 Dejan Muhamedagic :
> I already made a pull request:
>
> https://github.com/ClusterLabs/resource-agents/pull/741
>
> Please test if you find time.

I tested the code, but problems still remain.
systemctl is-active returns "active" and the return code is 0, the same
as systemctl status.
Perhaps it is inappropriate to use systemctl for monitoring a kernel
process.

Mr Kay Sievers, who is a developer of systemd, said in the following
thread that systemd doesn't monitor kernel processes.
http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/34367

I replied to your pull request.

Regards,
Yuta Takeshita

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
On Thu, Jan 14, 2016 at 11:04:09AM +0100, Dejan Muhamedagic wrote:
> I think that it should be systemctl is-active. Already had a
> problem with systemctl status, well, not being what one would
> assume status would be. Can you please test that and then open
> either a pull request or issue at
> https://github.com/ClusterLabs/resource-agents

I already made a pull request:

https://github.com/ClusterLabs/resource-agents/pull/741

Please test if you find time.

Thanks for reporting!

Dejan
Re: [ClusterLabs] nfsserver_monitor() doesn't detect nfsd process is lost.
Hi,

On Thu, Jan 14, 2016 at 04:20:19PM +0900, yuta takeshita wrote:
> Hello.
>
> I have a problem with the nfsserver RA on RHEL 7.1 with systemd.
> When the nfsd process is lost due to an unexpected failure,
> nfsserver_monitor() doesn't detect it and doesn't trigger a failover.
>
> I use the RA below (but this problem may occur with the latest
> nfsserver RA as well).
> https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/heartbeat/nfsserver
>
> The cause is as follows.
>
> 1. After executing "pkill -9 nfsd", "systemctl status nfs-server.service"
> returns 0.

I think that it should be systemctl is-active. Already had a
problem with systemctl status, well, not being what one would
assume status would be. Can you please test that and then open
either a pull request or issue at
https://github.com/ClusterLabs/resource-agents

Thanks,

Dejan

> 2. nfsserver_monitor() judges by the return value of "systemctl status
> nfs-server.service".
>
> --
> # ps ax | grep nfsd
> 25193 ?        S<     0:00 [nfsd4]
> 25194 ?        S<     0:00 [nfsd4_callbacks]
> 25197 ?        S      0:00 [nfsd]
> 25198 ?        S      0:00 [nfsd]
> 25199 ?        S      0:00 [nfsd]
> 25200 ?        S      0:00 [nfsd]
> 25201 ?        S      0:00 [nfsd]
> 25202 ?        S      0:00 [nfsd]
> 25203 ?        S      0:00 [nfsd]
> 25204 ?        S      0:00 [nfsd]
> 25238 pts/0    S+     0:00 grep --color=auto nfsd
> #
> # pkill -9 nfsd
> #
> # systemctl status nfs-server.service
> ● nfs-server.service - NFS server and services
>    Loaded: loaded (/etc/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
>    Active: active (exited) since 木 2016-01-14 11:35:39 JST; 1min 3s ago
>   Process: 25184 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
>   Process: 25182 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
>  Main PID: 25184 (code=exited, status=0/SUCCESS)
>    CGroup: /system.slice/nfs-server.service
> (snip)
> #
> # echo $?
> 0
> #
> # ps ax | grep nfsd
> 25256 pts/0    S+     0:00 grep --color=auto nfsd
> --
>
> This is because nfsd is a kernel process, and systemd does not
> monitor whether kernel processes are running.
>
> Is there a good way to handle this?
> (When I use "pidof" instead of "systemctl status", the failover is
> successful.)
>
> Regards,
> Yuta Takeshita
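The pidof workaround mentioned in the report works because the nfsd kernel threads are visible in the process table (they are the `[nfsd]` entries in the ps output) and disappear after the kill, even though systemd still reports the unit as active. A minimal sketch of the same idea without pidof, scanning the `comm` entries under /proc: `thread_named_exists` and its optional root-directory argument are hypothetical names, and the 0/3 return values follow the LSB status convention.

```shell
# Hypothetical helper: return 0 if any process or kernel thread has the
# given command name, 3 otherwise. The second argument lets the scan be
# pointed at a directory other than /proc for testing.
thread_named_exists() {
    name="$1"
    proc_root="${2:-/proc}"

    for comm_file in "$proc_root"/[0-9]*/comm; do
        # Entries can vanish between glob expansion and the read.
        [ -r "$comm_file" ] || continue
        read -r comm < "$comm_file" || continue
        if [ "$comm" = "$name" ]; then
            return 0
        fi
    done
    return 3
}
```

On a running NFS server `thread_named_exists nfsd` would be expected to return 0, and 3 once the threads are gone.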