I returned to this problem earlier today.

The original reason I asked this question was that I noticed an issue
when attaching multiple volumes to a VM at the same time.

The attach logic is properly funneled through the VM job queue, but it
still fails for the second attach command, which is executed right after
kicking off the attach command for the first volume.

As it turns out, the device ID - if not explicitly passed in - is acquired
by logic that runs BEFORE the attach command is submitted to the job queue.

What this means is that two attach commands can run at the same time and
both can get the same "next" device ID: the logic is not serialized until
the command is submitted to the job queue, and the device ID that's
actually used by the hypervisor is not recorded in the DB until the
hypervisor reports that the volume was successfully attached. As a result,
the second command sent to the VM to attach a volume carries a device ID
that's already in use (it just became in use a moment earlier).
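
To make the ordering concrete, here's a minimal sketch of the old flow.
All of the class and method names below (VmJobQueue, DeviceIdAllocator,
nextFreeDeviceId, etc.) are hypothetical stand-ins, not the actual
CloudStack identifiers:

// Hypothetical simplification of the buggy ordering; these names are
// illustrative stand-ins, not the real CloudStack classes or methods.
interface VmJobQueue {
    void submit(long vmId, Runnable job); // runs jobs one at a time per VM
}

interface DeviceIdAllocator {
    long nextFreeDeviceId(long vmId); // scans the DB for the next free ID
}

class BuggyAttachFlow {
    private final VmJobQueue jobQueue;
    private final DeviceIdAllocator allocator;

    BuggyAttachFlow(VmJobQueue jobQueue, DeviceIdAllocator allocator) {
        this.jobQueue = jobQueue;
        this.allocator = allocator;
    }

    void attachVolume(long vmId, long volumeId) {
        // BUG: the "next" device ID is chosen BEFORE the work is serialized
        // by the per-VM job queue. Two concurrent attach calls can both read
        // the same free ID, because the chosen ID isn't recorded in the DB
        // until the hypervisor reports a successful attach.
        long deviceId = allocator.nextFreeDeviceId(vmId);
        jobQueue.submit(vmId, () -> sendAttachToHypervisor(vmId, volumeId, deviceId));
    }

    void sendAttachToHypervisor(long vmId, long volumeId, long deviceId) {
        // issue the attach command to the hypervisor (elided)
    }
}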

I rectified this situation by moving the call to get the "next" device ID
to a location that's invoked from the job queue. This way the two commands
will get unique device IDs.
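
Under the same hypothetical names, the fixed flow looks roughly like this:

// Same hypothetical names as above; only the ordering changes.
class FixedAttachFlow {
    private final VmJobQueue jobQueue;
    private final DeviceIdAllocator allocator;

    FixedAttachFlow(VmJobQueue jobQueue, DeviceIdAllocator allocator) {
        this.jobQueue = jobQueue;
        this.allocator = allocator;
    }

    void attachVolume(long vmId, long volumeId) {
        // Fixed: submit first, allocate inside the job. Because the per-VM
        // job queue runs one job at a time, the allocation is now serialized
        // and each concurrent attach sees a distinct "next" device ID.
        jobQueue.submit(vmId, () -> {
            long deviceId = allocator.nextFreeDeviceId(vmId);
            sendAttachToHypervisor(vmId, volumeId, deviceId);
        });
    }

    void sendAttachToHypervisor(long vmId, long volumeId, long deviceId) {
        // issue the attach command to the hypervisor (elided)
    }
}

The key point is that anything that reads shared "next free" state has to
run inside the serialized section, not before it.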

Although this race condition has most likely been in the code for a while,
it's not likely to manifest with "non-managed" storage. The reason is that
the non-managed-storage logic doesn't have to issue a command over the
network to have the SAN place the volume being attached in the correct ACL
(the "grantAccess" logic). Since managed storage has this extra work to do,
it ends up being a little slower to attach a volume to a VM than
non-managed storage is. With that extra latency removed, I've never been
able to get this race condition to surface.
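
To illustrate why the timing differs (same caveat: hypothetical names, not
the real CloudStack API):

// Hypothetical sketch of why only managed storage tended to hit the race.
class AttachTimingSketch {
    void attach(long vmId, long volumeId, long deviceId, boolean managed) {
        if (managed) {
            // Managed storage first makes a network round trip to the SAN
            // to put the volume in the correct ACL (the "grantAccess" work).
            // That extra latency widened the window between reading the
            // "next" device ID and the hypervisor recording it; non-managed
            // storage skips this step, so the window stays tiny.
            grantAccessOnSan(vmId, volumeId);
        }
        sendAttachToHypervisor(vmId, volumeId, deviceId);
    }

    void grantAccessOnSan(long vmId, long volumeId) { /* SAN ACL call (elided) */ }

    void sendAttachToHypervisor(long vmId, long volumeId, long deviceId) { /* elided */ }
}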

Either way, it was a race condition and now it's fixed.

On Tue, Jan 13, 2015 at 6:51 PM, Nitin Mehta <nitin.me...@citrix.com> wrote:

> +Min.
>
> Unfortunately, I don't think the framework has been enhanced for all the
> different kinds of resources, but it should be the way to go.
> IMHO, serialization through states was/is just a hacky way of getting
> around the situation and should be discontinued.
> Ideally, the state of a resource should reflect only its lifecycle, not
> operations such as snapshotting, migrating, etc.
>
> Thanks,
> -Nitin
>
> On 13/01/15 4:32 PM, "Mike Tutkowski" <mike.tutkow...@solidfire.com>
> wrote:
>
> >It appears that the job queue is used for some commands but not for
> >others.
> >
> >Is the intent of the job queue only to serialize operations that are
> >sent to VMs?
> >
> >On Tue, Jan 13, 2015 at 3:14 PM, Mike Tutkowski <
> >mike.tutkow...@solidfire.com> wrote:
> >
> >> This is 4.6.
> >>
> >> It seems like our state-transitioning logic is intended, as one might
> >> expect, to protect the object in question from transitions that are
> >> invalid given its current state.
> >>
> >> I do not see, say, the attach and detach operations being serialized. It
> >> seems they are running simultaneously.
> >>
> >> On Tue, Jan 13, 2015 at 2:09 PM, Nitin Mehta <nitin.me...@citrix.com>
> >> wrote:
> >>
> >>> States shouldn't be used to serialize operations on a volume. They
> >>> should be used to denote the lifecycle of the volume instead.
> >>> I think the async job manager does take care of the serialization.
> >>> In which version do you see this issue happening?
> >>>
> >>> Thanks,
> >>> -Nitin
> >>>
> >>> On 13/01/15 12:28 PM, "Mike Tutkowski" <mike.tutkow...@solidfire.com>
> >>> wrote:
> >>>
> >>> >Hi,
> >>> >
> >>> >Does anyone know why we don't currently have a state and applicable
> >>> >transitions in Volume.State for attaching and detaching volumes?
> >>> >
> >>> >It seems like you'd want to, say, transition to Attaching only when
> >>> >you're in the Ready state (or maybe some other states, as well).
> >>> >
> >>> >I think right now you can confuse the system by sending an attach
> >>> >command and then a detach command before the attach command finishes
> >>> >(it's a race condition...I don't think it always causes trouble).
> >>> >
> >>> >Thoughts?
> >>> >
> >>> >Thanks,
> >>> >Mike
> >>> >
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkow...@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud™
<http://solidfire.com/solution/overview/?video=play>
