Hi,

In my case, the umount succeeded when the Fibre Channel was disconnected, so it seems that handling the status file caused the longer failover, as Dejan said. If the umount fails, the stop will run into a timeout and might trigger a STONITH action; that case also makes sense (though I couldn't observe it).
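
For reference, here is a minimal sketch of how a guarded "rm -f" cleanup of a status file behaves in plain shell. It is not the RA code itself: log_warn is a hypothetical stand-in for ocf_log, and STATUSFILE points at a scratch path rather than the real status file. A missing directory only simulates an unreachable file; a really hung device may block or return an I/O error instead. The point is that "[ -f ]" is false when the path cannot be resolved, and "rm -f" exits 0 for a nonexistent operand, so the warning branch is never reached in that situation.

```shell
# Sketch only: log_warn stands in for ocf_log (assumption), and STATUSFILE
# is a scratch path, not the file the Filesystem RA really uses.
WARNED=0
log_warn() { echo "WARN: $*" >&2; WARNED=1; }

remove_statusfile() {
    # Same shape as the guarded cleanup: test, remove, warn on failure.
    if [ -f "$STATUSFILE" ]; then
        if ! rm -f "$STATUSFILE"; then
            log_warn "Failed to remove status file ${STATUSFILE}."
        fi
    fi
}

SCRATCH=$(mktemp -d)

# Case A: the status file exists and can be removed.
STATUSFILE="$SCRATCH/status"
: > "$STATUSFILE"
remove_statusfile
if [ -e "$STATUSFILE" ]; then A_LEFT=1; else A_LEFT=0; fi

# Case B: the path cannot be resolved (simulated with a missing directory).
STATUSFILE="$SCRATCH/missing/status"
remove_statusfile
RC_B=$?          # the guard is false, so the function still returns 0

# Even when called directly, rm -f does not report a nonexistent operand.
rm -f "$STATUSFILE"
RM_RC=$?

rmdir "$SCRATCH"
echo "left=$A_LEFT rc_b=$RC_B rm_rc=$RM_RC warned=$WARNED"
```

So a silent rc=0 from this block does not prove that the file is gone from the device.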

I tried the following setups:

(1) timeout: multipath > RA
    multipath timeout = 120s
    Filesystem RA stop timeout = 60s

(2) timeout: multipath < RA
    multipath timeout = 60s
    Filesystem RA stop timeout = 120s

In case (1), Filesystem_stop() fails. The hanging FC causes the stop timeout.

In case (2), Filesystem_stop() succeeds. The filesystem is hanging, but lines 758 and 759 succeed (rc=0). The status file is no longer accessible, so in fact it remains on the filesystem.

> > 758 if [ -f "$STATUSFILE" ]; then
> > 759     rm -f ${STATUSFILE}
> > 760     if [ $? -ne 0 ]; then

So line 761 might not be called as expected.

> > 761         ocf_log warn "Failed to remove status file ${STATUSFILE}."

By the way, my concern is the unexpected stop timeout and the longer failover time. If OCF_CHECK_LEVEL is set to 20, it would be better to try to remove the status file just in case. That can handle case (2) if the user wants to recover from it with STONITH.

Thanks,
Junko

2012/5/8 Dejan Muhamedagic <de...@suse.de>:
> Hi Lars,
>
> On Tue, May 08, 2012 at 01:35:16PM +0200, Lars Marowsky-Bree wrote:
>> On 2012-05-08T12:08:27, Dejan Muhamedagic <de...@suse.de> wrote:
>>
>> > > In the default (without OCF_CHECK_LEVEL), it's enough to try unmount
>> > > the file system, isn't it?
>> > > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774
>> >
>> > I don't see a need to remove the STATUSFILE at all, as that may
>> > (and as you observed it) prevent the filesystem from stopping.
>> > Perhaps to skip it altogether? If nobody objects let's just
>> > remove this code:
>> >
>> > 758 if [ -f "$STATUSFILE" ]; then
>> > 759     rm -f ${STATUSFILE}
>> > 760     if [ $? -ne 0 ]; then
>> > 761         ocf_log warn "Failed to remove status file ${STATUSFILE}."
>> > 762     fi
>> > 763 fi
>>
>> That would mean you can no longer differentiate between a "crash" and a
>> clean unmount.
>
> One could take a look at the logs.
> I guess that a crash would otherwise be noticeable as well :)
>
>> A hanging FC/SAN is likely to be unable to flush any other dirty buffers
>> too, as well, so the umount may not necessarily succeed w/o errors. I
>> think it's unreasonable to expect that the node will survive such a
>> scenario w/o recovery.
>
> True. However, in case of network attached storage or other
> transient errors it may lead to an unnecessary timeout followed
> by fencing, i.e. the chance for a longer failover time is higher.
> Just leaving a file around may not justify the risk.
>
> Junko-san, what was your experience?
>
> Cheers,
>
> Dejan
>
>> Regards,
>> Lars
>>
>> --
>> Architect Storage/HA
>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
>> HRB 21284 (AG Nürnberg)
>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/