On 12/27/2016 03:08 PM, Dimitri Maziuk wrote:
> I ran centos 7.3.1611 update over the holidays and my drbd + nfs + imap
> active-passive pair locked up again. This has now been consistent for at
> least 3 kernel updates. This time I had enough consoles open to run
> fuser & lsof though.
>
> The procedure:
>
> 1. pcs cluster standby <secondary>
> 2. yum up && reboot <secondary>
> 3. pcs cluster unstandby <secondary>
>
> Fine so far.
>
> 4. pcs cluster standby <primary>
> results in
>
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Running
>> stop for /dev/drbd0 on /raid
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: Trying to
>> unmount /raid
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 ERROR: Couldn't
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:41 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 ERROR: Couldn't
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:42 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 ERROR: Couldn't
>> unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:43 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 ERROR: Couldn't
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:44 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:45 ERROR: Couldn't
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:46 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 ERROR: Couldn't
>> unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:47 INFO: No
>> processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]: 2016/12/23_17:36:48 ERROR: Couldn't
>> unmount /raid, giving up!
>> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu lrmd: notice:
>> operation_finished: drbd_filesystem_stop_0:18277:stderr [ umount:
>> /raid: target is busy. ]
>
> ... until the system's powered down. Before power down I ran lsof, it
> hung, and fuser:
>
>> # fuser -vum /raid
>>          USER     PID ACCESS COMMAND
>> /raid:   root  kernel mount  (root)/raid
>
> After running yum up on the primary and rebooting it again,
>
> 5. pcs cluster unstandby <primary>
> causes the same fail to unmount loop on the secondary, that has to be
> powered down until the primary recovers.
>
> Hopefully I'm doing something wrong, please someone tell me what it is.
> Anyone? Bueller?

That is disconcerting. Since no one here seems to know, have you tried
asking on the drbd list? It sounds like an issue with the drbd kernel
module.

http://lists.linbit.com/listinfo/drbd-user