Re: [Gluster-devel] Handling locks in NSR

2016-05-05 Thread Avra Sengupta

Hi,

I have sent a patch (http://review.gluster.org/#/c/14226/1) for lock/unlock 
fops in jbr-server, in accordance with the discussion we had below. 
Please feel free to review it. Thanks.


Regards,
Avra

On 03/03/2016 12:21 PM, Avra Sengupta wrote:

On 03/03/2016 02:29 AM, Shyam wrote:

On 03/02/2016 03:10 AM, Avra Sengupta wrote:

Hi,

All fops in NSR follow a specific workflow, as described in this
UML (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).

However, all locking fops will follow a slightly different workflow, as
described below. This is a first proposed draft for handling locks, and
we would like to hear your concerns and queries regarding the same.


This change, to handle locking FOPs differently, is due to what 
limitation/problem? (apologies if I missed an earlier thread on the 
same)


My understanding is that this is due to the fact that the actual FOP 
could fail/block (non-blocking/blocking) as there is an existing lock 
held, and hence just adding a journal entry and meeting quorum is 
not sufficient for the success of the FOP (it is necessary though to 
handle lock preservation in the event of leadership change); rather, 
acquiring the lock is. Is this understanding right?

Yes, it is right. The change in approach for handling locks is to avoid 
getting into a deadlock amongst the followers.
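A toy Python model of the contention described here may help. Everything in it is a hypothetical illustration, not jbr-server code: three replicas need a quorum of two, one replica is down, and two clients, A and B, race for the same lock. If every replica grants whichever request reaches it first, the two live replicas can grant different clients, so neither client reaches quorum, and with blocking locks each would wait on the replica held by the other. Sending requests through the leader first gives every replica the same order.

```python
from typing import Dict, List, Optional

def grants(arrivals: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Each replica grants the lock to the first client whose request arrives."""
    return {replica: (order[0] if order else None) for replica, order in arrivals.items()}

def quorum_holder(granted: Dict[str, Optional[str]], quorum: int) -> Optional[str]:
    """Return the client holding the lock on at least `quorum` replicas, if any."""
    counts: Dict[str, int] = {}
    for holder in granted.values():
        if holder is not None:
            counts[holder] = counts.get(holder, 0) + 1
    winners = [client for client, n in counts.items() if n >= quorum]
    return winners[0] if winners else None

def leader_ordered(leader_order: List[str], live: List[str], down: List[str]) -> Dict[str, List[str]]:
    """Hypothetical leader-first delivery: every live replica sees the leader's order."""
    arrivals = {replica: list(leader_order) for replica in live}
    arrivals.update({replica: [] for replica in down})
    return arrivals

# Requests reach r1 and r2 in different orders; r3 is down.
racing = {"r1": ["A", "B"], "r2": ["B", "A"], "r3": []}
print(quorum_holder(grants(racing), quorum=2))  # None: A holds r1, B holds r2
ordered = leader_ordered(["A", "B"], live=["r1", "r2"], down=["r3"])
print(quorum_holder(grants(ordered), quorum=2))  # A
```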


Based on the above understanding of mine, and the discussion below, 
the intention seems to be to place the locking xlator below the 
journal. What if we place this xlator above the journal, but add 
requirements that FOPs handled by this xlator need to reach the 
journal?


Assuming we adopt this strategy (i.e. the locks xlator is above the 
journal xlator), a successful lock acquisition by the locks xlator is 
not enough to guarantee that the lock is preserved across the replica 
group, hence it has to reach the journal and as a result pass through 
the other replica members' journal and locks xlators as well.


If we do the above, what are the advantages and repercussions of the 
same?

Why would we want to put the locking xlator above the journal? Is 
there a use case for that?
Firstly, we would have to modify the locking xlator to make it pass 
through.
We would also introduce a small window where we acquire the lock 
successfully, but then fail to journal it. We would then have to 
release the lock because we failed to journal it. In the previous 
approach, if we fail to journal it, we wouldn't even go to the locking 
xlator. Logically, it makes the locking xlator dependent on the 
journal's output, whereas ideally the journal should be dependent on 
the locking xlator's output.
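The ordering trade-off can be sketched with a toy replica. This is a hypothetical Python model, not the real xlator stack: with the locks xlator below the journal, a journal failure never touches the lock table; with it above, a lock that was already granted has to be undone when journaling fails.

```python
from typing import Dict, List, Optional, Set

class ToyReplica:
    """Hypothetical replica with a journal and a lock table."""

    def __init__(self, journal_ok: bool = True) -> None:
        self.journal: List[Dict[str, str]] = []
        self.locks: Set[str] = set()
        self.journal_ok = journal_ok  # False simulates a journal write failure
        self.undone = 0               # locks released because journaling failed

    def _journal(self, resource: str) -> Optional[Dict[str, str]]:
        if not self.journal_ok:
            return None
        entry = {"op": "lock", "res": resource, "state": "pending"}
        self.journal.append(entry)
        return entry

    def lock_below_journal(self, resource: str) -> bool:
        """Journal first; the lock table is touched only after a successful journal write."""
        entry = self._journal(resource)
        if entry is None:
            return False
        if resource in self.locks:
            entry["state"] = "invalid"  # lock failed: invalidate the journal entry
            return False
        self.locks.add(resource)
        entry["state"] = "complete"
        return True

    def lock_above_journal(self, resource: str) -> bool:
        """Lock first; a journal failure forces the granted lock to be undone."""
        if resource in self.locks:
            return False
        self.locks.add(resource)
        entry = self._journal(resource)
        if entry is None:
            self.locks.discard(resource)  # the extra undo window
            self.undone += 1
            return False
        entry["state"] = "complete"
        return True
```

With a failing journal, `lock_below_journal` returns without ever taking the lock, while `lock_above_journal` takes it and immediately has to give it back, which is the window described above.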


Some of the points noted here (like conflicting non-blocking locks 
when the previous lock is not yet released) could be handled. Also, in 
your scheme, what happens to blocking lock requests? The FOP will 
block; there is no async return to handle the success/failure of the 
same.

Yes, the FOP will block on blocking lock requests. I assume that's the 
behaviour today. Please correct me if I am wrong.


The downside is that on reconciliation we need to, potentially, undo 
some of the locks that are held by the locks xlator (in the new 
leader), which is outside the scope of the journal xlator.

Yes, we need to do lock cleanup on reconciliation, which is anyway 
outside the scope of the journal xlator. The reconciliation daemon 
will compare the terms on each replica node, and either acquire or 
release locks accordingly.
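A minimal sketch of the lock part of reconciliation, under loudly stated assumptions: each journal entry is a hypothetical (term, op, resource) tuple, the journal whose last entry carries the highest term is treated as authoritative, and replaying it yields the locks that should be held. The difference against a replica's actual lock table gives the locks to acquire and to release. The real daemon's term comparison may well differ.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str, str]  # hypothetical (term, op, resource); op is "lock" or "unlock"

def replay(journal: List[Entry]) -> Set[str]:
    """Replay lock/unlock entries in order to get the set of held locks."""
    held: Set[str] = set()
    for _term, op, resource in journal:
        if op == "lock":
            held.add(resource)
        elif op == "unlock":
            held.discard(resource)
    return held

def last_term(journal: List[Entry]) -> int:
    return journal[-1][0] if journal else 0

def reconcile(journals: Dict[str, List[Entry]], actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_acquire, to_release) for one replica's actual lock table."""
    authoritative = max(journals.values(), key=last_term)  # assumption: highest term wins
    expected = replay(authoritative)
    return expected - actual, actual - expected

journals = {
    "r1": [(1, "lock", "a"), (2, "lock", "b"), (2, "unlock", "a")],
    "r2": [(1, "lock", "a")],  # r2 missed the term-2 entries
}
print(reconcile(journals, actual={"a"}))  # ({'b'}, {'a'})
```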



I also assume we need to do the same for the leases xlator as well, 
right?

Yes. As long as we handle locking properly, the leases xlator shouldn't 
be a problem.





1. On receiving the lock, the leader will journal the lock itself, and
then try to actually acquire the lock. At this point in time, if it
fails to acquire the lock, then it will invalidate the journal entry,
and return a -ve ack back to the client. However, if it is successful in
acquiring the lock, it will mark the journal entry as complete, and
forward the fop to the followers.

2. The followers, on receiving the fop, will journal it, and then try to
actually acquire the lock. If a follower fails to acquire the lock, it
will invalidate the journal entry, and return a -ve ack back to the
leader. If it is successful in acquiring the lock, it will mark the
journal entry as complete, and send a +ve ack to the leader.

3. The leader, on receiving all acks, will perform a quorum check. If
quorum is met, it will send a +ve ack to the client. If quorum fails, it
will send a rollback to the followers.

4. The followers, on receiving the rollback, will journal it first, and
then release the acquired lock. They will update the rollback entry in
the journal as complete and send an ack to the leader.

5. The leader, on receiving the rollback acks, will journal its own
rollback, and then release the acquired lock. It will update the
rollback entry in the journal, and send a -ve ack to the client.

Re: [Gluster-devel] Handling locks in NSR

2016-03-02 Thread Shyam

On 03/02/2016 03:10 AM, Avra Sengupta wrote:

Hi,

All fops in NSR follow a specific workflow, as described in this
UML (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
However, all locking fops will follow a slightly different workflow, as
described below. This is a first proposed draft for handling locks, and
we would like to hear your concerns and queries regarding the same.


This change, to handle locking FOPs differently, is due to what 
limitation/problem? (apologies if I missed an earlier thread on the same)


My understanding is that this is due to the fact that the actual FOP 
could fail/block (non-blocking/blocking) as there is an existing lock 
held, and hence just adding a journal entry and meeting quorum is not 
sufficient for the success of the FOP (it is necessary though to handle 
lock preservation in the event of leadership change); rather, acquiring 
the lock is. Is this understanding right?


Based on the above understanding of mine, and the discussion below, the 
intention seems to be to place the locking xlator below the journal. 
What if we place this xlator above the journal, but add requirements 
that FOPs handled by this xlator need to reach the journal?


Assuming we adopt this strategy (i.e. the locks xlator is above the 
journal xlator), a successful lock acquisition by the locks xlator is 
not enough to guarantee that the lock is preserved across the replica 
group, hence it has to reach the journal and as a result pass through 
the other replica members' journal and locks xlators as well.


If we do the above, what are the advantages and repercussions of the same?

Some of the points noted here (like conflicting non-blocking locks when 
the previous lock is not yet released) could be handled. Also, in your 
scheme, what happens to blocking lock requests? The FOP will block; 
there is no async return to handle the success/failure of the same.


The downside is that on reconciliation we need to, potentially, undo 
some of the locks that are held by the locks xlator (in the new leader), 
which is outside the scope of the journal xlator.


I also assume we need to do the same for the leases xlator as well, right?



1. On receiving the lock, the leader will journal the lock itself, and
then try to actually acquire the lock. At this point in time, if it
fails to acquire the lock, then it will invalidate the journal entry,
and return a -ve ack back to the client. However, if it is successful in
acquiring the lock, it will mark the journal entry as complete, and
forward the fop to the followers.

2. The followers, on receiving the fop, will journal it, and then try to
actually acquire the lock. If a follower fails to acquire the lock, it
will invalidate the journal entry, and return a -ve ack back to the
leader. If it is successful in acquiring the lock, it will mark the
journal entry as complete, and send a +ve ack to the leader.

3. The leader, on receiving all acks, will perform a quorum check. If
quorum is met, it will send a +ve ack to the client. If quorum fails, it
will send a rollback to the followers.

4. The followers, on receiving the rollback, will journal it first, and
then release the acquired lock. They will update the rollback entry in
the journal as complete and send an ack to the leader.

5. The leader, on receiving the rollback acks, will journal its own
rollback, and then release the acquired lock. It will update the
rollback entry in the journal, and send a -ve ack to the client.

A few things to note in the above workflow:
1. It will be a synchronous operation across the replica volume.
2. Reconciliation will take care of nodes that have missed the locks.
3. On a client disconnect, there will be a lock-timeout, on whose
expiration all locks held by that particular client will be released.
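The five steps above can be sketched as a single synchronous call. This is a hypothetical Python model, not jbr-server code; two choices in it are assumptions the proposal does not spell out: the leader counts itself towards quorum, and only followers that actually took the lock are sent the rollback.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Node:
    """Hypothetical replica; healthy=False simulates a node whose lock attempt fails."""
    name: str
    healthy: bool = True
    locks: Set[str] = field(default_factory=set)
    journal: List[Dict[str, str]] = field(default_factory=list)

    def journal_and_lock(self, resource: str) -> bool:
        """Steps 1 and 2: journal the lock, then try to acquire it."""
        entry = {"op": "lock", "res": resource, "state": "pending"}
        self.journal.append(entry)
        if not self.healthy or resource in self.locks:
            entry["state"] = "invalid"  # acquisition failed: invalidate the entry
            return False
        self.locks.add(resource)
        entry["state"] = "complete"
        return True

    def journal_and_rollback(self, resource: str) -> None:
        """Steps 4 and 5: journal the rollback first, then release the lock."""
        entry = {"op": "rollback", "res": resource, "state": "pending"}
        self.journal.append(entry)
        self.locks.discard(resource)
        entry["state"] = "complete"

def handle_lock(leader: Node, followers: List[Node], resource: str, quorum: int) -> bool:
    """Return True for a +ve ack to the client, False for a -ve ack."""
    if not leader.journal_and_lock(resource):  # step 1
        return False
    acked = [f for f in followers if f.journal_and_lock(resource)]  # step 2
    if 1 + len(acked) >= quorum:  # step 3; assumption: the leader counts itself
        return True
    for f in acked:  # step 4; assumption: only followers that took the lock roll back
        f.journal_and_rollback(resource)
    leader.journal_and_rollback(resource)  # step 5
    return False
```

Rolling back only the followers that acked matters in this ownerless toy model: a follower that failed step 2 because someone else already holds the lock must not release that other holder's lock.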

Regards,
Avra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Handling locks in NSR

2016-03-02 Thread Rajesh Joseph


- Original Message -
> From: "Atin Mukherjee" <atin.mukherje...@gmail.com>
> To: "Avra Sengupta" <aseng...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Wednesday, March 2, 2016 4:03:11 PM
> Subject: Re: [Gluster-devel] Handling locks in NSR
> 
> -Atin
> Sent from one plus one
> On 02-Mar-2016 3:41 pm, "Avra Sengupta" < aseng...@redhat.com > wrote:
> > 
> > On 03/02/2016 02:55 PM, Venky Shankar wrote:
> >> 
> >> On Wed, Mar 02, 2016 at 02:29:26PM +0530, Avra Sengupta wrote:
> >>> 
> >>> On 03/02/2016 02:02 PM, Venky Shankar wrote:
> >>>> 
> >>>> On Wed, Mar 02, 2016 at 01:40:08PM +0530, Avra Sengupta wrote:
> >>>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> All fops in NSR follow a specific workflow, as described in this UML
> >>>>> (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
> >>>>> However, all locking fops will follow a slightly different workflow, as
> >>>>> described below. This is a first proposed draft for handling locks, and
> >>>>> we would like to hear your concerns and queries regarding the same.
> >>>>> 
> >>>>> 1. On receiving the lock, the leader will journal the lock itself, and
> >>>>> then try to actually acquire the lock. At this point in time, if it
> >>>>> fails to acquire the lock, then it will invalidate the journal entry,
> >>>>> and return a -ve ack back to the client. However, if it is successful
> >>>>> in acquiring the lock, it will mark the journal entry as complete, and
> >>>>> forward the fop to the followers.
> >>>> 
> >>>> So, does a contending non-blocking lock operation check only on the
> >>>> leader, since the followers might have not yet ack'd an earlier lock
> >>>> operation?
> >>> 
> >>> A non-blocking lock follows the same workflow, and thereby checks on the
> >>> leader first. In this case, it would be blocked on the leader till the
> >>> leader releases the lock. Then it will follow the same workflow.
> >> 
> >> A non-blocking lock should ideally return EAGAIN if the region is already
> >> locked. Checking just on the leader (posix/locks on the leader server
> >> stack) and returning EAGAIN is kind of incomplete, as the earlier lock
> >> request might not have been granted (due to failure on followers).
> >> 
> >> Or does it even matter if we return EAGAIN during the transient state?
> >> 
> >> We could block the lock on the leader until an earlier lock request is
> >> satisfied (in which case return EAGAIN), or in case of failure try to
> >> satisfy the lock request.
> > 
> > That is what I said, it will be blocked on the leader till the leader
> > releases the already held lock.
> > 
> >> 
> >>>>> 2. The followers, on receiving the fop, will journal it, and then try
> >>>>> to actually acquire the lock. If a follower fails to acquire the lock,
> >>>>> it will invalidate the journal entry, and return a -ve ack back to the
> >>>>> leader. If it is successful in acquiring the lock, it will mark the
> >>>>> journal entry as complete, and send a +ve ack to the leader.
> >>>>> 
> >>>>> 3. The leader, on receiving all acks, will perform a quorum check. If
> >>>>> quorum is met, it will send a +ve ack to the client. If quorum fails,
> >>>>> it will send a rollback to the followers.
> >>>>> 
> >>>>> 4. The followers, on receiving the rollback, will journal it first, and
> >>>>> then release the acquired lock. They will update the rollback entry in
> >>>>> the journal as complete and send an ack to the leader.
> >>>> 
> >>>> What happens if the rollback fails for whatever reason?
> >>> 
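> >>> The leader receives a -ve rollback ack, but there's little it can do
> >>> about it. Depending on the failure, it will be resolved during
> >>> reconciliation.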

Re: [Gluster-devel] Handling locks in NSR

2016-03-02 Thread Atin Mukherjee
-Atin
Sent from one plus one
On 02-Mar-2016 1:40 pm, "Avra Sengupta"  wrote:
>
> Hi,
>
> All fops in NSR follow a specific workflow, as described in this UML
> (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
> However, all locking fops will follow a slightly different workflow, as
> described below. This is a first proposed draft for handling locks, and we
> would like to hear your concerns and queries regarding the same.
>
> 1. On receiving the lock, the leader will journal the lock itself, and
> then try to actually acquire the lock. At this point in time, if it fails
> to acquire the lock, then it will invalidate the journal entry, and return
> a -ve ack back to the client. However, if it is successful in acquiring the
> lock, it will mark the journal entry as complete, and forward the fop to
> the followers.
>
> 2. The followers, on receiving the fop, will journal it, and then try to
> actually acquire the lock. If a follower fails to acquire the lock, it will
> invalidate the journal entry, and return a -ve ack back to the leader. If
> it is successful in acquiring the lock, it will mark the journal entry as
> complete, and send a +ve ack to the leader.
>
> 3. The leader, on receiving all acks, will perform a quorum check. If
> quorum is met, it will send a +ve ack to the client. If quorum fails, it
> will send a rollback to the followers.
>
> 4. The followers, on receiving the rollback, will journal it first, and
> then release the acquired lock. They will update the rollback entry in the
> journal as complete and send an ack to the leader.
>
> 5. The leader, on receiving the rollback acks, will journal its own
> rollback, and then release the acquired lock. It will update the rollback
> entry in the journal, and send a -ve ack to the client.
>
> A few things to note in the above workflow:
> 1. It will be a synchronous operation across the replica volume.
Is this true with the existing replication mechanism?
> 2. Reconciliation will take care of nodes that have missed the locks.
> 3. On a client disconnect, there will be a lock-timeout, on whose
> expiration all locks held by that particular client will be released.
>
> Regards,
> Avra

Re: [Gluster-devel] Handling locks in NSR

2016-03-02 Thread Venky Shankar
On Wed, Mar 02, 2016 at 02:29:26PM +0530, Avra Sengupta wrote:
> On 03/02/2016 02:02 PM, Venky Shankar wrote:
> >On Wed, Mar 02, 2016 at 01:40:08PM +0530, Avra Sengupta wrote:
> >>Hi,
> >>
> >>All fops in NSR follow a specific workflow, as described in this
> >>UML (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
> >>However, all locking fops will follow a slightly different workflow, as
> >>described below. This is a first proposed draft for handling locks, and we
> >>would like to hear your concerns and queries regarding the same.
> >>
> >>1. On receiving the lock, the leader will journal the lock itself, and then
> >>try to actually acquire the lock. At this point in time, if it fails to
> >>acquire the lock, then it will invalidate the journal entry, and return a
> >>-ve ack back to the client. However, if it is successful in acquiring the
> >>lock, it will mark the journal entry as complete, and forward the fop to the
> >>followers.
> >So, does a contending non-blocking lock operation check only on the leader
> >since the followers might have not yet ack'd an earlier lock operation?
> A non-blocking lock follows the same workflow, and thereby checks on the
> leader first. In this case, it would be blocked on the leader till the
> leader releases the lock. Then it will follow the same workflow.

A non-blocking lock should ideally return EAGAIN if the region is already
locked. Checking just on the leader (posix/locks on the leader server stack)
and returning EAGAIN is kind of incomplete, as the earlier lock request might
not have been granted (due to failure on followers).

Or does it even matter if we return EAGAIN during the transient state?

We could block the lock on the leader until an earlier lock request is
satisfied (in which case return EAGAIN), or in case of failure try to satisfy
the lock request.
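Venky's suggestion can be sketched as follows. This is a hypothetical leader-side model in Python, not the posix/locks implementation, and the per-resource "pending"/"granted" states are assumptions: a non-blocking request whose earlier contender is still pending on the followers waits for that outcome, then returns EAGAIN if the earlier lock was granted, or goes ahead if it was rolled back.

```python
import errno
import threading
from typing import Dict

class LeaderLockState:
    """Toy leader-side state for one lock domain (hypothetical)."""

    def __init__(self) -> None:
        self.cond = threading.Condition()
        self.state: Dict[str, str] = {}  # resource -> "pending" | "granted"

    def begin(self, resource: str) -> None:
        """An earlier request has been sent to the followers."""
        with self.cond:
            self.state[resource] = "pending"

    def resolve(self, resource: str, granted: bool) -> None:
        """Called once quorum succeeds (granted) or the request is rolled back."""
        with self.cond:
            if granted:
                self.state[resource] = "granted"
            else:
                self.state.pop(resource, None)
            self.cond.notify_all()

    def try_lock(self, resource: str) -> int:
        """Non-blocking lock: 0 if it may proceed, errno.EAGAIN if already granted."""
        with self.cond:
            # Wait out the transient state instead of answering early.
            while self.state.get(resource) == "pending":
                self.cond.wait()
            if self.state.get(resource) == "granted":
                return errno.EAGAIN
            self.state[resource] = "pending"  # this request now goes to the followers
            return 0
```

The caller that gets 0 back is expected to run the normal workflow and later call `resolve` with the outcome, so a third request for the same resource waits on it in turn.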

> >
> >>2. The followers, on receiving the fop, will journal it, and then try to
> >>actually acquire the lock. If a follower fails to acquire the lock, it will
> >>invalidate the journal entry, and return a -ve ack back to the leader. If it
> >>is successful in acquiring the lock, it will mark the journal entry as
> >>complete, and send a +ve ack to the leader.
> >>
> >>3. The leader, on receiving all acks, will perform a quorum check. If quorum
> >>is met, it will send a +ve ack to the client. If quorum fails, it will
> >>send a rollback to the followers.
> >>
> >>4. The followers, on receiving the rollback, will journal it first, and then
> >>release the acquired lock. They will update the rollback entry in the journal
> >>as complete and send an ack to the leader.
> >What happens if the rollback fails for whatever reason?
> The leader receives a -ve rollback ack, but there's little it can do about
> it. Depending on the failure, it will be resolved during reconciliation.
> >
> >>5. The leader, on receiving the rollback acks, will journal its own
> >>rollback, and then release the acquired lock. It will update the rollback
> >>entry in the journal, and send a -ve ack to the client.
> >>
> >>A few things to note in the above workflow:
> >>1. It will be a synchronous operation across the replica volume.
> >>2. Reconciliation will take care of nodes that have missed the locks.
> >>3. On a client disconnect, there will be a lock-timeout, on whose expiration
> >>all locks held by that particular client will be released.
> >>
> >>Regards,
> >>Avra


Re: [Gluster-devel] Handling locks in NSR

2016-03-02 Thread Avra Sengupta

On 03/02/2016 02:02 PM, Venky Shankar wrote:

On Wed, Mar 02, 2016 at 01:40:08PM +0530, Avra Sengupta wrote:

Hi,

All fops in NSR follow a specific workflow, as described in this 
UML (https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing).
However, all locking fops will follow a slightly different workflow, as
described below. This is a first proposed draft for handling locks, and we
would like to hear your concerns and queries regarding the same.

1. On receiving the lock, the leader will journal the lock itself, and then
try to actually acquire the lock. At this point in time, if it fails to
acquire the lock, then it will invalidate the journal entry, and return a
-ve ack back to the client. However, if it is successful in acquiring the
lock, it will mark the journal entry as complete, and forward the fop to the
followers.

So, does a contending non-blocking lock operation check only on the leader
since the followers might have not yet ack'd an earlier lock operation?
A non-blocking lock follows the same workflow, and thereby checks on 
the leader first. In this case, it would be blocked on the leader till 
the leader releases the lock. Then it will follow the same workflow.



2. The followers, on receiving the fop, will journal it, and then try to
actually acquire the lock. If a follower fails to acquire the lock, it will
invalidate the journal entry, and return a -ve ack back to the leader. If it
is successful in acquiring the lock, it will mark the journal entry as
complete, and send a +ve ack to the leader.

3. The leader, on receiving all acks, will perform a quorum check. If quorum
is met, it will send a +ve ack to the client. If quorum fails, it will
send a rollback to the followers.

4. The followers, on receiving the rollback, will journal it first, and then
release the acquired lock. They will update the rollback entry in the journal
as complete and send an ack to the leader.

What happens if the rollback fails for whatever reason?
The leader receives a -ve rollback ack, but there's little it can do 
about it. Depending on the failure, it will be resolved during 
reconciliation.
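One way the leader could treat a -ve rollback ack, sketched in Python under the assumption that a hypothetical `send_rollback` callback returns whether a follower's rollback succeeded: the failure is not retried inline, the follower is simply remembered in a hypothetical `needs_reconciliation` set, and the leader goes on to its own rollback (step 5).

```python
from typing import Callable, Dict, List, Set

def rollback_and_report(
    followers: List[str],
    send_rollback: Callable[[str], bool],
    needs_reconciliation: Set[str],
) -> Dict[str, bool]:
    """Send a rollback to each follower and remember the ones that failed.

    A failed rollback is not retried here; it is left for reconciliation.
    """
    acks: Dict[str, bool] = {}
    for follower in followers:
        ok = send_rollback(follower)
        acks[follower] = ok
        if not ok:
            needs_reconciliation.add(follower)
    return acks
```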



5. The leader, on receiving the rollback acks, will journal its own
rollback, and then release the acquired lock. It will update the rollback
entry in the journal, and send a -ve ack to the client.

A few things to note in the above workflow:
1. It will be a synchronous operation across the replica volume.
2. Reconciliation will take care of nodes that have missed the locks.
3. On a client disconnect, there will be a lock-timeout, on whose expiration
all locks held by that particular client will be released.
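Point 3 can be sketched as follows. This is a hypothetical Python model; the class, the timeout value and the explicit `now` clock argument are illustrative assumptions. A disconnect starts a timer for that client, a reconnect cancels it, and an expiry sweep releases every lock still owned by a client whose timeout has elapsed.

```python
from typing import Dict, Set

class LockTable:
    """Hypothetical per-brick lock table with a client lock-timeout."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.owner: Dict[str, str] = {}              # resource -> owning client
        self.disconnected_at: Dict[str, float] = {}  # client -> disconnect time

    def acquire(self, client: str, resource: str) -> bool:
        holder = self.owner.get(resource)
        if holder is not None and holder != client:
            return False  # held by another client
        self.owner[resource] = client
        return True

    def on_disconnect(self, client: str, now: float) -> None:
        self.disconnected_at[client] = now  # start the lock-timeout

    def on_reconnect(self, client: str) -> None:
        self.disconnected_at.pop(client, None)  # back in time: keep the locks

    def expire(self, now: float) -> Set[str]:
        """Release every lock owned by a client whose timeout has elapsed."""
        expired = {c for c, t in self.disconnected_at.items() if now - t >= self.timeout}
        released = {r for r, c in self.owner.items() if c in expired}
        for r in released:
            del self.owner[r]
        for c in expired:
            del self.disconnected_at[c]
        return released
```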

Regards,
Avra