A pair of locking issues in GFS2 observed when running VM storage stress tests.

0001-GFS2-use-schedule-timeout-in-find-insert-glock.patch covers a
case where an application-level flock would wedge. The VM control
plane makes extensive use of flocks to control access to VM virtual
disks and databases, and we encountered several failed tests where the
flocks were not acquired even when no one was holding them.
Investigation indicates that there is a race in find_insert_glock
where schedule can be called after the expected waiter has already
completed its work, in which case nothing remains to wake the caller.
The patch replaces schedule with schedule_timeout and adds a log
message.

0002-GFS2-Flush-the-GFS2-delete-workqueue-before-stopping.patch covers
a case where umount would wedge unrecoverably. The stress test
finishes by deleting the test machines and virtual disks, then
unmounting the filesystem on all hosts before the hosts are returned
to the lab pool. umount was found to wedge, and this has been traced
to gfs2_log_reserve being called from flush_workqueue after the
associated kernel threads had been stopped. Nobody was left to handle
the log reserve request, so the code wedged.

Mark Syms (1):
  GFS2: use schedule timeout in find insert glock

Tim Smith (1):
  GFS2: Flush the GFS2 delete workqueue before stopping the kernel
    threads

 fs/gfs2/glock.c | 3 ++-
 fs/gfs2/super.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

--
1.8.3.1