Andy Kurth created VCL-924:
------------------------------

             Summary: Commands may hang on management node if it has an unavailable NFS share
                 Key: VCL-924
                 URL: https://issues.apache.org/jira/browse/VCL-924
             Project: VCL
          Issue Type: Bug
          Components: vcld (backend)
    Affects Versions: 2.4.2
            Reporter: Andy Kurth
            Assignee: Andy Kurth
             Fix For: 2.5


We came across a situation on one of our management nodes related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=962755

The management node had an old NFS share mounted from a storage unit that had 
been removed from service. Attempts to unmount the share were not successful.

Under fairly rare circumstances, a vcld process will call lsof on the 
management node in order to determine which other vcld process is preventing it 
from obtaining a semaphore. In our case, the vcld process that called lsof hung 
indefinitely because of the unavailable NFS share and the issue described in 
the link above.

There is currently no timeout mechanism built into the code that executes 
commands locally on the management node. It would be beneficial to add one and 
to specify a timeout on commands that may hang, such as lsof.
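
To illustrate the kind of timeout mechanism proposed above, here is a minimal 
sketch in Python. It is not the vcld code and not the planned fix. The function 
name run_with_timeout and its return convention are hypothetical, chosen only 
for this example. It assumes that a command stuck on an unavailable NFS share 
might not exit even after being killed, so it waits only briefly after the kill 
instead of blocking again.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds, kill_grace_seconds=1.0):
    """Run a local command, giving up after timeout_seconds.

    Hypothetical helper for illustration only.
    Returns (exit_status, stdout, stderr); exit_status is None on timeout.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The command exceeded its time limit: ask the kernel to kill it.
        proc.kill()
        try:
            # Wait a short, bounded time for the process to go away.
            # Assumption: a process blocked on I/O may ignore the kill,
            # so we must not wait here without a limit.
            stdout, stderr = proc.communicate(timeout=kill_grace_seconds)
        except subprocess.TimeoutExpired:
            stdout, stderr = "", ""
        return None, stdout, stderr
    return proc.returncode, stdout, stderr


if __name__ == "__main__":
    # Stand-in for a command that hangs, such as lsof in the case above.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    status, _, _ = run_with_timeout(slow, timeout_seconds=0.5)
    if status is None:
        print("command timed out")
```

With something like this in place, the caller decides what a timeout means. In 
the semaphore case, it could log a warning and continue without knowing which 
process holds the semaphore, rather than hang.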



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
