Andy Kurth created VCL-1023:
-------------------------------

             Summary: Cluster reservations may fail to copy an image if assigned to multiple VM hosts sharing a datastore
                 Key: VCL-1023
                 URL: https://issues.apache.org/jira/browse/VCL-1023
             Project: VCL
          Issue Type: Bug
          Components: vcld (backend)
    Affects Versions: 2.4.2
            Reporter: Andy Kurth
            Assignee: Andy Kurth
             Fix For: 2.5


Conditions:
* Cluster request
* Multiple reservations are assigned the same image revision
* Reservations are assigned to VMs on different VMware ESXi hosts
* VMware ESXi hosts share a common virtual disk image datastore
* Image does not yet exist on the datastore and needs to be copied from the 
repository

Each vcld process independently checks whether the image needs to be copied 
from the repository to the datastore.  Because the same image revision was 
assigned to multiple reservations in the cluster request, multiple vcld 
processes determine that the image needs to be copied.

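The code does obtain a semaphore before attempting to copy the image.  However, 
the semaphore name begins with the VM host name, followed by the datastore path 
and image name, so processes assigned to different VM hosts obtain different 
semaphores: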

{noformat}
2017-03-14 00:25:46|18904|3115170|3222911|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 557fdf0
2017-03-14 00:25:46|18908|3115170|3222912|new|Module.pm:get_semaphore|1601|created 'blade1a1-8-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5023f10
2017-03-14 00:25:47|18913|3115170|3222914|new|Module.pm:get_semaphore|1601|created 'blade1a1-9-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5024518
2017-03-14 00:25:47|18926|3115170|3222918|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 50256d0
2017-03-14 00:26:12|18930|3115170|3222919|new|Module.pm:get_semaphore|1601|created 'blade1a1-11-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5021988
2017-03-14 00:31:18|18917|3115170|3222916|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5578c60
2017-03-14 00:31:24|18922|3115170|3222917|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 4493e78
{noformat}

The first 5 processes were each assigned to a different VM host, so each 
obtained its own semaphore without waiting, all within 30 seconds of each 
other.  Afterwards, each attempted to copy the .vmdk to the same shared 
directory.

The last 2 processes (assigned to blade1a1-13 and blade1a1-3) obeyed the 
semaphore and waited several minutes because each had the same VM host name as 
an earlier reservation.  Once the earlier process assigned to the same VM host 
finished attempting to copy the .vmdk and released the semaphore, each of the 
last 2 processes checked whether the copy was still necessary.  This is how it 
is supposed to work for all processes copying to the same destination.
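
The intended behavior (obtain the semaphore, then check again whether the copy 
is still needed) can be sketched as follows.  This is only an illustration in 
Python, not the vcld code: get_lock and ensure_image are hypothetical names, 
and an in-process lock stands in for the real named semaphore.

```python
import threading

# Hypothetical in-process stand-in for named semaphores: one lock per name.
_locks = {}
_locks_guard = threading.Lock()

def get_lock(name):
    """Return the lock registered under this name, creating it if needed."""
    with _locks_guard:
        return _locks.setdefault(name, threading.Lock())

def ensure_image(datastore, image, lock_name, copy_log):
    """Copy the image only if it is still missing once the lock is held."""
    with get_lock(lock_name):
        if image in datastore:
            # Another process finished the copy while this one was waiting.
            return False
        copy_log.append(image)  # stands in for the actual .vmdk copy
        datastore.add(image)
        return True
```

When every caller passes the same lock_name, only the first one performs the 
copy and the rest find the image already present after waiting.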

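The code should be updated so that the semaphore name no longer depends on the 
VM host name.  The datastore UUID should be used along with the image revision 
name, so that all processes copying the same image to the same datastore 
contend for a single semaphore.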
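
A rough comparison of the two naming schemes, as a Python sketch rather than 
the actual vcld change.  The host-based format mirrors the names in the log 
excerpt; the datastore-based format, the function names, and the UUID value 
are assumptions made for illustration.

```python
def host_based_name(vm_host, datastore_path, image):
    # Format seen in the log excerpt: "<vm host>-<datastore path>/<image>"
    return f"{vm_host}-{datastore_path}/{image}"

def datastore_based_name(datastore_uuid, image):
    # Hypothetical format for the proposed fix: no VM host component.
    return f"{datastore_uuid}-{image}"

hosts = ["blade1a1-13", "blade1a1-8", "blade1a1-9", "blade1a1-3", "blade1a1-11"]
path = "/vmfs/volumes/datastore-1a1_mcnc-fas2554"
image = "vmwarelinux-RHEL6hkim224213-v3"
uuid = "example-datastore-uuid"  # placeholder value, not taken from the issue

# One name per VM host: processes on different hosts never block each other.
old_names = {host_based_name(h, path, image) for h in hosts}

# One name for the whole datastore: every process contends for the same one.
new_names = {datastore_based_name(uuid, image) for _ in hosts}
```

With the five VM hosts from the log, the host-based scheme yields five 
distinct names, while the datastore-based scheme yields one.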





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
