[ https://issues.apache.org/jira/browse/VCL-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098889#comment-15098889 ]
ASF subversion and git services commented on VCL-924:
-----------------------------------------------------
Commit 1724690 from [email protected] in branch 'vcl/trunk'
[ https://svn.apache.org/r1724690 ]
VCL-924
Added an optional $timeout_seconds argument to utils.pm::run_command. If the
argument is provided, the command is run under an alarm. Updated
ManagementNode.pm::execute, which calls utils.pm::run_command, to pass the
argument through if provided. Added the timeout argument to the lsof command
in Semaphore.pm.
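
The alarm-based timeout described above can be illustrated with a minimal
sketch. This is not the committed Perl code: it is a Python stand-in, and the
function name, the (None, []) return value on timeout, and the process-group
kill are all assumptions made for illustration.

```python
import os
import signal
import subprocess


class CommandTimeout(Exception):
    """Raised by the SIGALRM handler when the command overruns."""


def _on_alarm(signum, frame):
    raise CommandTimeout()


def run_command(command, timeout_seconds=None):
    """Run a shell command locally and return (exit_status, output_lines).

    Hypothetical convention: returns (None, []) if timeout_seconds is set
    and the command does not finish in time.
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # own process group, so the group can be killed
    )
    previous_handler = None
    if timeout_seconds:
        # Arm the alarm only when a timeout was requested.
        previous_handler = signal.signal(signal.SIGALRM, _on_alarm)
        signal.alarm(timeout_seconds)
    try:
        output, _ = process.communicate()
    except CommandTimeout:
        # Kill the shell and its children. Do not block waiting for them:
        # a process stuck in uninterruptible I/O may not exit promptly.
        os.killpg(process.pid, signal.SIGKILL)
        process.stdout.close()
        return None, []
    finally:
        if timeout_seconds:
            signal.alarm(0)  # cancel any pending alarm
            if previous_handler is not None:
                signal.signal(signal.SIGALRM, previous_handler)
    return process.returncode, output.splitlines()
```

Under these assumptions, a caller that runs a command which may hang, such as
lsof, would pass a short timeout and treat a None exit status as "timed out"
rather than waiting indefinitely. Note that signal.alarm only works in the
main thread of a Unix process.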
> Commands may hang on management node if it has an unavailable NFS share
> -----------------------------------------------------------------------
>
> Key: VCL-924
> URL: https://issues.apache.org/jira/browse/VCL-924
> Project: VCL
> Issue Type: Bug
> Components: vcld (backend)
> Affects Versions: 2.4.2
> Reporter: Andy Kurth
> Assignee: Andy Kurth
> Fix For: 2.5
>
>
> We came across a situation on one of our management nodes related to this:
> https://bugzilla.redhat.com/show_bug.cgi?id=962755
> The management node had an old NFS share mounted from a storage unit which
> was removed from service. Attempts to unmount the share were not successful.
> Under fairly rare circumstances, a vcld process will call lsof on the
> management node in order to determine which other vcld process is preventing
> it from obtaining a semaphore. This vcld process hung indefinitely due to
> the unavailable NFS share and the issue described in the link above.
> There is currently no timeout mechanism built into the code which executes
> commands locally on the management node. It would be beneficial to add one
> and specify a timeout on commands which may hang such as lsof.