[jira] [Commented] (VCL-1023) Cluster reservations may fail to copy an image if assigned to multiple VM hosts sharing a datastore

ASF subversion and git services (JIRA) Thu, 16 Mar 2017 08:37:57 -0700

    [ https://issues.apache.org/jira/browse/VCL-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928284#comment-15928284 ]


ASF subversion and git services commented on VCL-1023:
------------------------------------------------------

Commit 1787209 from [email protected] in branch 'vcl/trunk'
[ https://svn.apache.org/r1787209 ]

VCL-1023
Reworked Semaphore.pm to use the new vcldsemaphore table instead of a lockfile 
on the management node. This allows semaphores to be obeyed for vcld processes 
running on different management nodes.
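
As a rough illustration of the table-based approach (not the actual Semaphore.pm
code, which is Perl), the sketch below uses Python and an in-memory SQLite
database. The column names, the helper function names and the use of SQLite are
assumptions made for the example; the real vcldsemaphore schema is not shown in
this message. The idea is that a uniqueness constraint on the identifier lets
the database arbitrate between processes, wherever they run.

```python
import sqlite3

# Hypothetical schema: the real vcldsemaphore columns are not listed here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vcldsemaphore (
    identifier     TEXT PRIMARY KEY,
    managementnode TEXT NOT NULL,
    pid            INTEGER NOT NULL
)
"""


def open_semaphore_db(path=":memory:"):
    """Open the (assumed) shared database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def try_obtain_semaphore(conn, identifier, managementnode, pid):
    """Return True if this process now owns the semaphore, False if it is held."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO vcldsemaphore (identifier, managementnode, pid) "
                "VALUES (?, ?, ?)",
                (identifier, managementnode, pid),
            )
        return True
    except sqlite3.IntegrityError:
        # Another process already inserted a row with the same identifier.
        return False


def release_semaphore(conn, identifier, managementnode, pid):
    """Delete the row, but only if this process is the owner."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM vcldsemaphore "
            "WHERE identifier = ? AND managementnode = ? AND pid = ?",
            (identifier, managementnode, pid),
        )
    return cursor.rowcount == 1
```

Because ownership lives in a shared table rather than in a file on one
management node, a process on a second management node sees the same row and
is refused until the owner releases it.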

Added VMware.pm::get_datastore_directory_semaphore which retrieves a 
datastore's URL/UUID identifier and uses this to obtain a semaphore instead of 
the descriptive datastore name. This allows semaphores to be obeyed for a 
particular directory on a datastore, even if different hosts mount the same 
datastore using different names.
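
The difference between the two naming schemes can be sketched as follows. This
is Python for illustration only, not the VMware.pm implementation: the lookup
table, the function names, the second mount name and the UUID value are all
made up. Only the shape of the old identifier is taken from the log lines in
the issue description.

```python
# Hypothetical lookup: (VM host, datastore name as mounted on that host) -> UUID.
# The UUID and the "shared-images" mount name are invented for this example.
DATASTORE_UUIDS = {
    ("blade1a1-13", "datastore-1a1_mcnc-fas2554"): "0a1b2c3d-4e5f6a7b",
    ("blade1a1-8", "shared-images"): "0a1b2c3d-4e5f6a7b",
}


def host_based_semaphore_id(vmhost, datastore_name, directory):
    """Old scheme: VM host name plus the datastore path as that host names it."""
    return f"{vmhost}-/vmfs/volumes/{datastore_name}/{directory}"


def uuid_based_semaphore_id(vmhost, datastore_name, directory):
    """New scheme: datastore UUID plus directory, independent of host and mount name."""
    datastore_uuid = DATASTORE_UUIDS[(vmhost, datastore_name)]
    return f"{datastore_uuid}/{directory}"
```

With the old scheme, two hosts writing to the same directory produce two
different identifiers, so neither process waits for the other. With the UUID
scheme both resolve to the same identifier and contend for one semaphore.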

Updated VMware.pm::prepare_vmdk to use get_datastore_directory_semaphore.

Deleted subroutines that are no longer used:
* Module.pm::does_semaphore_exist
* Semaphore.pm::get_process_semaphore_ids
* Semaphore.pm::get_reservation_semaphore_ids
* Semaphore.pm::semaphore_exists
* Semaphore.pm::get_lockfile_paths
* Semaphore.pm::release_lockfile

Updated Module.pm to no longer call 'use VCL::Module::Semaphore' at the 
beginning. The use statement caused "subroutine redefined" warnings because of 
a circular reference: Module.pm uses Semaphore.pm, and Semaphore.pm inherits 
from Module.pm. Added require and import statements inside 
Module.pm::get_semaphore, wrapped in an eval block, as a replacement.

Updated new.pm::computer_not_being_used to use 
utils.pm::get_vcld_semaphore_info instead of calling 
Semaphore.pm::get_process_semaphore_ids.


Other
Added DataStructure.pm::get_connect_method_info_matching_name. It was used for 
some experimentation and isn't currently being called, but may be useful in the 
future.

Updated Linux.pm::get_network_bridge_info to check for exit statuses > 0 
instead of anything != 0. Perl occasionally returns -1 even though the command 
was successful.
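
The changed check amounts to the following, shown in Python rather than the
Perl used in Linux.pm. The function names are hypothetical and stand in for
the comparison only.

```python
def command_failed_old(exit_status):
    """Previous check: any non-zero status, including -1, counted as a failure."""
    return exit_status != 0


def command_failed(exit_status):
    """Updated check: only a positive exit status counts as a failure."""
    return exit_status > 0
```

A status of -1 is reported as a failure by the old check but not by the new
one, while 0 and positive statuses are treated the same by both.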

> Cluster reservations may fail to copy an image if assigned to multiple VM 
> hosts sharing a datastore
> ---------------------------------------------------------------------------------------------------
>
>                 Key: VCL-1023
>                 URL: https://issues.apache.org/jira/browse/VCL-1023
>             Project: VCL
>          Issue Type: Bug
>          Components: vcld (backend)
>    Affects Versions: 2.4.2
>            Reporter: Andy Kurth
>            Assignee: Andy Kurth
>             Fix For: 2.5
>
>
> Conditions:
> * Cluster request
> * Multiple reservations are assigned the same image revision
> * Reservations are assigned to VMs on different VMware ESXi hosts
> * VMware ESXi hosts share a common virtual disk image datastore
> * Image does not yet exist on the datastore and needs to be copied from the 
> repository
> Each vcld process checks if the image needs to be copied from the repository 
> to the datastore.  Since the same image revision was assigned to multiple 
> reservations in the cluster request, multiple vcld processes determine the 
> image needs to be copied.
> The code does obtain a semaphore before attempting to copy the image.  
> However, the semaphore name is based on both the VM host name and image name:
> {noformat}
> 2017-03-14 00:25:46|18904|3115170|3222911|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 557fdf0
> 2017-03-14 00:25:46|18908|3115170|3222912|new|Module.pm:get_semaphore|1601|created 'blade1a1-8-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5023f10
> 2017-03-14 00:25:47|18913|3115170|3222914|new|Module.pm:get_semaphore|1601|created 'blade1a1-9-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5024518
> 2017-03-14 00:25:47|18926|3115170|3222918|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 50256d0
> 2017-03-14 00:26:12|18930|3115170|3222919|new|Module.pm:get_semaphore|1601|created 'blade1a1-11-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5021988
> 2017-03-14 00:31:18|18917|3115170|3222916|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5578c60
> 2017-03-14 00:31:24|18922|3115170|3222917|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 4493e78
> {noformat}
> The first 5 processes each obtained a semaphore within 30 seconds of each 
> other.  Afterwards, each attempted to copy the .vmdk to the same shared 
> directory.
> The last 2 processes obeyed the semaphore and waited several minutes because 
> the VM host name was the same as that of another reservation.  Once the 
> process assigned to the same VM host finished attempting to copy the .vmdk 
> and released the semaphore, the last 2 processes checked if the copy was 
> still necessary.  This is how it is supposed to work for all processes 
> copying to the same destination.
> The code should be updated to use a better name for the semaphore.  The 
> datastore UUID should be used along with the image revision name.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
