Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-28 Thread Raghavendra Gowdappa
On Thu, Mar 28, 2019 at 2:37 PM Xavi Hernandez  wrote:

> On Thu, Mar 28, 2019 at 3:05 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez 
>> wrote:
>>
>>> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>


 On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez 
 wrote:

> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez 
>> wrote:
>>
>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>


 On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <
 jaher...@redhat.com> wrote:

> Hi Raghavendra,
>
> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
> rgowd...@redhat.com> wrote:
>
>> All,
>>
>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>> through which those locks are held disconnects from the bricks/server.
>> This helps Glusterfs avoid running into a stale-lock problem later (for
>> example, if the application unlocks while the connection is still down).
>> However, this means the lock is no longer exclusive, as other
>> applications/clients can acquire the same lock. To communicate that the
>> locks are no longer valid, we are planning to mark the fd (which has
>> POSIX locks) bad on a disconnect, so that any future operation on that
>> fd will fail, forcing the application to re-open the fd and re-acquire
>> the locks it needs [1].
>>
>
> Wouldn't it be better to retake the locks when the brick is
> reconnected, if the lock is still in use?
>

 There is also a possibility that clients may never reconnect.
 That's the primary reason why bricks assume the worst (that the client
 will not reconnect) and clean up the locks.

>>>
>>> True, so it's fine to clean up the locks. I'm not saying that locks
>>> shouldn't be released on disconnect. The assumption is that if the
>>> client has really died, it will also disconnect from the other bricks,
>>> which will release the locks. So, eventually, another client will have
>>> enough quorum to attempt a lock that will succeed. In other words, if a
>>> client gets disconnected from too many bricks simultaneously (loses
>>> quorum), then that client can be considered bad and can return errors
>>> to the application. This should also cause the locks on the remaining
>>> connected bricks to be released.
>>>
>>> On the other hand, if the disconnection is very short and the client
>>> has not died, it will keep enough locked files (it has quorum) to
>>> prevent other clients from successfully acquiring a lock. In this case,
>>> if the brick is reconnected, all existing locks should be reacquired to
>>> recover the original state from before the disconnection.
>>>
>>>

> BTW, the referenced bug is not public. Should we open another bug
> to track this?
>

 I've just opened up the comment to give enough context. I'll open a
 bug upstream too.


>
>
>>
>> Note that with AFR/replicate in the picture we can prevent errors to
>> the application as long as a quorum of children "never ever" lost their
>> connection with the bricks after the locks were acquired. I am using the
>> term "never ever" because locks are not healed back after re-connection,
>> and hence the first disconnect would have marked the fd bad, and the fd
>> remains so even after re-connection happens. So, it's not just a quorum
>> of children "currently online", but a quorum of children that have
>> "never disconnected from the bricks after the locks were acquired".
>>
>
> I think this requirement is not feasible. In a distributed file
> system, sooner or later all bricks will be disconnected. It could be
> because of failures or because an upgrade is done, but it will happen.
>
> The difference here is how long fds are kept open. If applications
> open and close files frequently enough (i.e. the fd is not kept open
> longer than it takes for more than a quorum of bricks to get
> disconnected), then there's no problem. The problem can only appear in
> applications that open files for a long time and also use POSIX locks.
> In this case, the only good solution I see is to retake the locks on
> brick reconnection.
>


Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-28 Thread Xavi Hernandez

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Pranith Kumar Karampuri

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Pranith Kumar Karampuri
On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez  wrote:


 Agree. But lock-healing should be done only by HA layers like AFR/EC,
 as only they know whether there are enough online bricks to have
 prevented any conflicting lock. Protocol/client itself doesn't have
 enough information to do that. If it's a plain distribute volume, I
 don't see a way to heal locks without losing the property of
 exclusivity of locks.

>>>
>>> Lock-healing of locks acquired while a brick was disconnected needs
>>> to be handled by AFR/EC. However, locks already present at the moment
>>> of disconnection could be recovered by the client xlator itself, as
>>> long as the file has not been closed (which the client xlator already
>>> knows).
>>>
>>
>> What if another client (say mount-2) took locks at the time of the
>> disconnect from mount-1, modified the file, and unlocked? The client
>> xlator doing the heal may not be a good idea.
>>
>
> To avoid that we sh

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez
On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri 
wrote:

>
>
> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez 
> wrote:
>
>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
>>> wrote:
>>>
 Hi Raghavendra,

 On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
 rgowd...@redhat.com> wrote:

> All,
>
> Glusterfs cleans up POSIX locks held on an fd when the client/mount
> through which those locks are held disconnects from bricks/server. This
> helps Glusterfs to not run into a stale lock problem later (For eg., if
> application unlocks while the connection was still down). However, this
> means the lock is no longer exclusive as other applications/clients can
> acquire the same lock. To communicate that locks are no longer valid, we
> are planning to mark the fd (which has POSIX locks) bad on a disconnect so
> that any future operations on that fd will fail, forcing the application 
> to
> re-open the fd and re-acquire locks it needs [1].
>

 Wouldn't it be better to retake the locks when the brick is reconnected
 if the lock is still in use ?

>>>
>>> There is also a possibility that clients may never reconnect. That's
>>> the primary reason why bricks assume the worst (client will not reconnect)
>>> and clean up the locks.
>>>
>>
>> True, so it's fine to cleanup the locks. I'm not saying that locks
>> shouldn't be released on disconnect. The assumption is that if the client
>> has really died, it will also disconnect from other bricks, who will
>> release the locks. So, eventually, another client will have enough quorum
>> to attempt a lock that will succeed. In other words, if a client gets
>> disconnected from too many bricks simultaneously (loses Quorum), then that
>> client can be considered as bad and can return errors to the application.
>> This should also cause to release the locks on the remaining connected
>> bricks.
>>
>> On the other hand, if the disconnection is very short and the client has
>> not died, it will keep enough locked files (it has quorum) to avoid other
>> clients to successfully acquire a lock. In this case, if the brick is
>> reconnected, all existing locks should be reacquired to recover the
>> original state before the disconnection.
>>
>>
>>>
 BTW, the referenced bug is not public. Should we open another bug to
 track this ?

>>>
>>> I've just opened up the comment to give enough context. I'll open a bug
>>> upstream too.
>>>
>>>


>
> Note that with AFR/replicate in picture we can prevent errors to
> application as long as Quorum number of children "never ever" lost
> connection with bricks after locks have been acquired. I am using the term
> "never ever" as locks are not healed back after re-connection and hence
> first disconnect would've marked the fd bad and the fd remains so even
> after re-connection happens. So, it's not just Quorum number of children
> "currently online", but Quorum number of children "never having
> disconnected with bricks after locks are acquired".
>

 I think this requisite is not feasible. In a distributed file system,
 sooner or later all bricks will be disconnected. It could be because of
 failures or because an upgrade is done, but it will happen.

 The difference here is how long are fd's kept open. If applications
 open and close files frequently enough (i.e. the fd is not kept open more
 time than it takes to have more than Quorum bricks disconnected) then
 there's no problem. The problem can only appear on applications that open
 files for a long time and also use posix locks. In this case, the only good
 solution I see is to retake the locks on brick reconnection.

>>>
>>> Agree. But lock-healing should be done only by HA layers like AFR/EC as
>>> only they know whether there are enough online bricks to have prevented any
>>> conflicting lock. Protocol/client itself doesn't have enough information to
>>> do that. If it's a plain distribute, I don't see a way to heal locks without
>>> losing the property of exclusivity of locks.
>>>
>>
>> Lock-healing of locks acquired while a brick was disconnected needs to be
>> handled by AFR/EC. However, locks already present at the moment of
>> disconnection could be recovered by client xlator itself as long as the
>> file has not been closed (which client xlator already knows).
>>
>
> What if another client (say mount-2) took locks at the time of disconnect
> from mount-1 and modified the file and unlocked? client xlator doing the
> heal may not be a good idea.
>

To avoid that we should ensure that any lock/unlocks are sent to the
client, even if we know it's disconnected, so that client xlator can track
them. The alternative is to duplicate and maintain code both on AFR and EC
(and not sure if e

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Pranith Kumar Karampuri
On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez  wrote:

> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
 All,

 Glusterfs cleans up POSIX locks held on an fd when the client/mount
 through which those locks are held disconnects from bricks/server. This
 helps Glusterfs to not run into a stale lock problem later (For eg., if
 application unlocks while the connection was still down). However, this
 means the lock is no longer exclusive as other applications/clients can
 acquire the same lock. To communicate that locks are no longer valid, we
 are planning to mark the fd (which has POSIX locks) bad on a disconnect so
 that any future operations on that fd will fail, forcing the application to
 re-open the fd and re-acquire locks it needs [1].

>>>
>>> Wouldn't it be better to retake the locks when the brick is reconnected
>>> if the lock is still in use ?
>>>
>>
>> There is also a possibility that clients may never reconnect. That's the
>> primary reason why bricks assume the worst (client will not reconnect) and
>> clean up the locks.
>>
>
> True, so it's fine to cleanup the locks. I'm not saying that locks
> shouldn't be released on disconnect. The assumption is that if the client
> has really died, it will also disconnect from other bricks, who will
> release the locks. So, eventually, another client will have enough quorum
> to attempt a lock that will succeed. In other words, if a client gets
> disconnected from too many bricks simultaneously (loses Quorum), then that
> client can be considered as bad and can return errors to the application.
> This should also cause to release the locks on the remaining connected
> bricks.
>
> On the other hand, if the disconnection is very short and the client has
> not died, it will keep enough locked files (it has quorum) to avoid other
> clients to successfully acquire a lock. In this case, if the brick is
> reconnected, all existing locks should be reacquired to recover the
> original state before the disconnection.
>
>
>>
>>> BTW, the referenced bug is not public. Should we open another bug to
>>> track this ?
>>>
>>
>> I've just opened up the comment to give enough context. I'll open a bug
>> upstream too.
>>
>>
>>>
>>>

 Note that with AFR/replicate in picture we can prevent errors to
 application as long as Quorum number of children "never ever" lost
 connection with bricks after locks have been acquired. I am using the term
 "never ever" as locks are not healed back after re-connection and hence
 first disconnect would've marked the fd bad and the fd remains so even
 after re-connection happens. So, it's not just Quorum number of children
 "currently online", but Quorum number of children "never having
 disconnected with bricks after locks are acquired".

>>>
>>> I think this requisite is not feasible. In a distributed file system,
>>> sooner or later all bricks will be disconnected. It could be because of
>>> failures or because an upgrade is done, but it will happen.
>>>
>>> The difference here is how long are fd's kept open. If applications open
>>> and close files frequently enough (i.e. the fd is not kept open more time
>>> than it takes to have more than Quorum bricks disconnected) then there's no
>>> problem. The problem can only appear on applications that open files for a
>>> long time and also use posix locks. In this case, the only good solution I
>>> see is to retake the locks on brick reconnection.
>>>
>>
>> Agree. But lock-healing should be done only by HA layers like AFR/EC as
>> only they know whether there are enough online bricks to have prevented any
>> conflicting lock. Protocol/client itself doesn't have enough information to
>> do that. If it's a plain distribute, I don't see a way to heal locks without
>> losing the property of exclusivity of locks.
>>
>
> Lock-healing of locks acquired while a brick was disconnected needs to be
> handled by AFR/EC. However, locks already present at the moment of
> disconnection could be recovered by client xlator itself as long as the
> file has not been closed (which client xlator already knows).
>

What if another client (say mount-2) took locks at the time of disconnect
from mount-1, modified the file and unlocked? Client xlator doing the
heal may not be a good idea.


>
> Xavi
>
>
>> What I proposed is a short term solution. mid to long term solution
>> should be lock healing feature implemented in AFR/EC. In fact I had this
>> conversation with +Karampuri, Pranith  before
>> posting this msg to ML.
>>
>>
>>>
 However, this use case is not affected if the application doesn't acquire
 any POSIX locks. So, I am interested in knowing
 * whether your use cases use POSIX locks?
 * Is it feasible for your application to re-open fds and re-acquire
 locks on seeing EBADFD errors?

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez
On Wed, Mar 27, 2019 at 11:54 AM Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
 All,

 Glusterfs cleans up POSIX locks held on an fd when the client/mount
 through which those locks are held disconnects from bricks/server. This
 helps Glusterfs to not run into a stale lock problem later (For eg., if
 application unlocks while the connection was still down). However, this
 means the lock is no longer exclusive as other applications/clients can
 acquire the same lock. To communicate that locks are no longer valid, we
 are planning to mark the fd (which has POSIX locks) bad on a disconnect so
 that any future operations on that fd will fail, forcing the application to
 re-open the fd and re-acquire locks it needs [1].

>>>
>>> Wouldn't it be better to retake the locks when the brick is reconnected
>>> if the lock is still in use ?
>>>
>>
>> There is also a possibility that clients may never reconnect. That's the
>> primary reason why bricks assume the worst (client will not reconnect) and
>> clean up the locks.
>>
>>
>>> BTW, the referenced bug is not public. Should we open another bug to
>>> track this ?
>>>
>>
>> I've just opened up the comment to give enough context. I'll open a bug
>> upstream too.
>>
>>
>>>
>>>

 Note that with AFR/replicate in picture we can prevent errors to
 application as long as Quorum number of children "never ever" lost
 connection with bricks after locks have been acquired. I am using the term
 "never ever" as locks are not healed back after re-connection and hence
 first disconnect would've marked the fd bad and the fd remains so even
 after re-connection happens. So, it's not just Quorum number of children
 "currently online", but Quorum number of children "never having
 disconnected with bricks after locks are acquired".

>>>
>>> I think this requisite is not feasible. In a distributed file system,
>>> sooner or later all bricks will be disconnected. It could be because of
>>> failures or because an upgrade is done, but it will happen.
>>>
>>> The difference here is how long are fd's kept open. If applications open
>>> and close files frequently enough (i.e. the fd is not kept open more time
>>> than it takes to have more than Quorum bricks disconnected) then there's no
>>> problem. The problem can only appear on applications that open files for a
>>> long time and also use posix locks. In this case, the only good solution I
>>> see is to retake the locks on brick reconnection.
>>>
>>
>> Agree. But lock-healing should be done only by HA layers like AFR/EC as
>> only they know whether there are enough online bricks to have prevented any
>> conflicting lock. Protocol/client itself doesn't have enough information to
>> do that. If it's a plain distribute, I don't see a way to heal locks without
>> losing the property of exclusivity of locks.
>>
>> What I proposed is a short term solution. mid to long term solution
>> should be lock healing feature implemented in AFR/EC. In fact I had this
>> conversation with +Karampuri, Pranith  before
>> posting this msg to ML.
>>
>>
>>>
 However, this use case is not affected if the application doesn't acquire
 any POSIX locks. So, I am interested in knowing
 * whether your use cases use POSIX locks?
 * Is it feasible for your application to re-open fds and re-acquire
 locks on seeing EBADFD errors?

>>>
>>> I think that many applications are not prepared to handle that.
>>>
>>
>> I too suspected that and in fact not too happy with the solution. But
>> went ahead with this mail as I heard implementing lock-heal  in AFR will
>> take time and hence there are no alternative short term solutions.
>>
>
> Also failing loudly is preferred to silently dropping locks.
>

Yes. Silently dropping locks can cause corruption, which is worse. However,
causing application failures doesn't improve user experience either.

Unfortunately I'm not aware of any other short term solution right now.


>
>>
>>
>>> Xavi
>>>
>>>

 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7

 regards,
 Raghavendra

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez
On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
> wrote:
>
>> Hi Raghavendra,
>>
>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
>> wrote:
>>
>>> All,
>>>
>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>> through which those locks are held disconnects from bricks/server. This
>>> helps Glusterfs to not run into a stale lock problem later (For eg., if
>>> application unlocks while the connection was still down). However, this
>>> means the lock is no longer exclusive as other applications/clients can
>>> acquire the same lock. To communicate that locks are no longer valid, we
>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so
>>> that any future operations on that fd will fail, forcing the application to
>>> re-open the fd and re-acquire locks it needs [1].
>>>
>>
>> Wouldn't it be better to retake the locks when the brick is reconnected
>> if the lock is still in use ?
>>
>
> There is also a possibility that clients may never reconnect. That's the
> primary reason why bricks assume the worst (client will not reconnect) and
> clean up the locks.
>

True, so it's fine to clean up the locks. I'm not saying that locks
shouldn't be released on disconnect. The assumption is that if the client
has really died, it will also disconnect from the other bricks, which will
release the locks. So, eventually, another client will have enough quorum
to attempt a lock that will succeed. In other words, if a client gets
disconnected from too many bricks simultaneously (loses quorum), then that
client can be considered bad and can return errors to the application.
This should also cause the locks to be released on the remaining connected
bricks.

On the other hand, if the disconnection is very short and the client has
not died, it will keep enough locked files (it has quorum) to prevent other
clients from successfully acquiring a lock. In this case, if the brick is
reconnected, all existing locks should be reacquired to recover the
original state before the disconnection.


>
>> BTW, the referenced bug is not public. Should we open another bug to
>> track this ?
>>
>
> I've just opened up the comment to give enough context. I'll open a bug
> upstream too.
>
>
>>
>>
>>>
>>> Note that with AFR/replicate in picture we can prevent errors to
>>> application as long as Quorum number of children "never ever" lost
>>> connection with bricks after locks have been acquired. I am using the term
>>> "never ever" as locks are not healed back after re-connection and hence
>>> first disconnect would've marked the fd bad and the fd remains so even
>>> after re-connection happens. So, it's not just Quorum number of children
>>> "currently online", but Quorum number of children "never having
>>> disconnected with bricks after locks are acquired".
>>>
>>
>> I think this requisite is not feasible. In a distributed file system,
>> sooner or later all bricks will be disconnected. It could be because of
>> failures or because an upgrade is done, but it will happen.
>>
>> The difference here is how long are fd's kept open. If applications open
>> and close files frequently enough (i.e. the fd is not kept open more time
>> than it takes to have more than Quorum bricks disconnected) then there's no
>> problem. The problem can only appear on applications that open files for a
>> long time and also use posix locks. In this case, the only good solution I
>> see is to retake the locks on brick reconnection.
>>
>
> Agree. But lock-healing should be done only by HA layers like AFR/EC as
> only they know whether there are enough online bricks to have prevented any
> conflicting lock. Protocol/client itself doesn't have enough information to
> do that. If it's a plain distribute, I don't see a way to heal locks without
> losing the property of exclusivity of locks.
>

Lock-healing of locks acquired while a brick was disconnected needs to be
handled by AFR/EC. However, locks already present at the moment of
disconnection could be recovered by client xlator itself as long as the
file has not been closed (which client xlator already knows).

Xavi


> What I proposed is a short term solution. mid to long term solution should
> be lock healing feature implemented in AFR/EC. In fact I had this
> conversation with +Karampuri, Pranith  before
> posting this msg to ML.
>
>
>>
>>> However, this use case is not affected if the application doesn't acquire
>>> any POSIX locks. So, I am interested in knowing
>>> * whether your use cases use POSIX locks?
>>> * Is it feasible for your application to re-open fds and re-acquire
>>> locks on seeing EBADFD errors?
>>>
>>
>> I think that many applications are not prepared to handle that.
>>
>
> I too suspected that and in fact not too happy with the solution. But went
> ahead with this mail as I heard implementing lock-heal  in AFR will take
> time and hence there are no alternative short term solutions.
>

>
>> Xavi

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
> wrote:
>
>> Hi Raghavendra,
>>
>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
>> wrote:
>>
>>> All,
>>>
>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>> through which those locks are held disconnects from bricks/server. This
>>> helps Glusterfs to not run into a stale lock problem later (For eg., if
>>> application unlocks while the connection was still down). However, this
>>> means the lock is no longer exclusive as other applications/clients can
>>> acquire the same lock. To communicate that locks are no longer valid, we
>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so
>>> that any future operations on that fd will fail, forcing the application to
>>> re-open the fd and re-acquire locks it needs [1].
>>>
>>
>> Wouldn't it be better to retake the locks when the brick is reconnected
>> if the lock is still in use ?
>>
>
> There is also a possibility that clients may never reconnect. That's the
> primary reason why bricks assume the worst (client will not reconnect) and
> clean up the locks.
>
>
>> BTW, the referenced bug is not public. Should we open another bug to
>> track this ?
>>
>
> I've just opened up the comment to give enough context. I'll open a bug
> upstream too.
>
>
>>
>>
>>>
>>> Note that with AFR/replicate in picture we can prevent errors to
>>> application as long as Quorum number of children "never ever" lost
>>> connection with bricks after locks have been acquired. I am using the term
>>> "never ever" as locks are not healed back after re-connection and hence
>>> first disconnect would've marked the fd bad and the fd remains so even
>>> after re-connection happens. So, it's not just Quorum number of children
>>> "currently online", but Quorum number of children "never having
>>> disconnected with bricks after locks are acquired".
>>>
>>
>> I think this requisite is not feasible. In a distributed file system,
>> sooner or later all bricks will be disconnected. It could be because of
>> failures or because an upgrade is done, but it will happen.
>>
>> The difference here is how long are fd's kept open. If applications open
>> and close files frequently enough (i.e. the fd is not kept open more time
>> than it takes to have more than Quorum bricks disconnected) then there's no
>> problem. The problem can only appear on applications that open files for a
>> long time and also use posix locks. In this case, the only good solution I
>> see is to retake the locks on brick reconnection.
>>
>
> Agree. But lock-healing should be done only by HA layers like AFR/EC as
> only they know whether there are enough online bricks to have prevented any
> conflicting lock. Protocol/client itself doesn't have enough information to
> do that. If it's a plain distribute, I don't see a way to heal locks without
> losing the property of exclusivity of locks.
>
> What I proposed is a short term solution. mid to long term solution should
> be lock healing feature implemented in AFR/EC. In fact I had this
> conversation with +Karampuri, Pranith  before
> posting this msg to ML.
>
>
>>
>>> However, this use case is not affected if the application doesn't acquire
>>> any POSIX locks. So, I am interested in knowing
>>> * whether your use cases use POSIX locks?
>>> * Is it feasible for your application to re-open fds and re-acquire
>>> locks on seeing EBADFD errors?
>>>
>>
>> I think that many applications are not prepared to handle that.
>>
>
> I too suspected that and in fact not too happy with the solution. But went
> ahead with this mail as I heard implementing lock-heal  in AFR will take
> time and hence there are no alternative short term solutions.
>

Also failing loudly is preferred to silently dropping locks.


>
>
>> Xavi
>>
>>
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>
>>> regards,
>>> Raghavendra
>>>
>>
>>

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez  wrote:

> Hi Raghavendra,
>
> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>> through which those locks are held disconnects from bricks/server. This
>> helps Glusterfs to not run into a stale lock problem later (For eg., if
>> application unlocks while the connection was still down). However, this
>> means the lock is no longer exclusive as other applications/clients can
>> acquire the same lock. To communicate that locks are no longer valid, we
>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so
>> that any future operations on that fd will fail, forcing the application to
>> re-open the fd and re-acquire locks it needs [1].
>>
>
> Wouldn't it be better to retake the locks when the brick is reconnected if
> the lock is still in use ?
>

There is also a possibility that clients may never reconnect. That's the
primary reason why bricks assume the worst (client will not reconnect) and
clean up the locks.


> BTW, the referenced bug is not public. Should we open another bug to track
> this ?
>

I've just opened up the comment to give enough context. I'll open a bug
upstream too.


>
>
>>
>> Note that with AFR/replicate in picture we can prevent errors to
>> application as long as Quorum number of children "never ever" lost
>> connection with bricks after locks have been acquired. I am using the term
>> "never ever" as locks are not healed back after re-connection and hence
>> first disconnect would've marked the fd bad and the fd remains so even
>> after re-connection happens. So, it's not just Quorum number of children
>> "currently online", but Quorum number of children "never having
>> disconnected with bricks after locks are acquired".
>>
>
> I think this requisite is not feasible. In a distributed file system,
> sooner or later all bricks will be disconnected. It could be because of
> failures or because an upgrade is done, but it will happen.
>
> The difference here is how long are fd's kept open. If applications open
> and close files frequently enough (i.e. the fd is not kept open more time
> than it takes to have more than Quorum bricks disconnected) then there's no
> problem. The problem can only appear on applications that open files for a
> long time and also use posix locks. In this case, the only good solution I
> see is to retake the locks on brick reconnection.
>

Agree. But lock-healing should be done only by HA layers like AFR/EC, as
only they know whether there are enough online bricks to have prevented any
conflicting lock. Protocol/client itself doesn't have enough information to
do that. If it's a plain distribute, I don't see a way to heal locks without
losing the property of exclusivity of locks.

What I proposed is a short-term solution. The mid-to-long-term solution
should be a lock-healing feature implemented in AFR/EC. In fact, I had this
conversation with +Karampuri, Pranith  before posting
this msg to the ML.


>
>> However, this use case is not affected if the application doesn't acquire
>> any POSIX locks. So, I am interested in knowing
>> * whether your use cases use POSIX locks?
>> * Is it feasible for your application to re-open fds and re-acquire locks
>> on seeing EBADFD errors?
>>
>
> I think that many applications are not prepared to handle that.
>

I too suspected that, and in fact I am not too happy with the solution. But
I went ahead with this mail as I heard implementing lock-heal in AFR will
take time and hence there are no alternative short-term solutions.
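For applications that can tolerate it, the recovery the proposal asks for would look roughly like the sketch below: on a lock failure with EBADF/EBADFD, close the fd, re-open the file, and re-acquire the lock. `acquire_lock_with_retry` is a hypothetical helper, not GlusterFS code, and it carries the caveat implicit in this thread: after re-acquiring, the application must re-validate any state the lock protects, since another client may have locked and modified the file in between.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef EBADFD               /* EBADFD is Linux-specific; fall back to EBADF */
#define EBADFD EBADF
#endif

/* Hypothetical helper: take a whole-file write lock, re-opening the file
 * once if the fd has been marked bad (EBADF/EBADFD) after a disconnect. */
static int acquire_lock_with_retry(const char *path, int *fdp)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    for (int attempt = 0; attempt < 2; attempt++) {
        if (*fdp < 0 && (*fdp = open(path, O_RDWR)) < 0)
            return -1;                    /* cannot even open the file */
        if (fcntl(*fdp, F_SETLKW, &fl) == 0)
            return 0;                     /* lock held */
        if (errno == EBADF || errno == EBADFD) {
            close(*fdp);                  /* stale fd: re-open and retry once */
            *fdp = -1;
            continue;
        }
        return -1;                        /* some other failure */
    }
    return -1;
}
```

As noted above, most existing applications do not implement even this much, which is why the EBADFD approach is only acceptable as a stopgap.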


> Xavi
>
>
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>
>> regards,
>> Raghavendra
>>
>
>

Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Soumya Koduri



On 3/27/19 12:55 PM, Xavi Hernandez wrote:

Hi Raghavendra,

On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


All,

Glusterfs cleans up POSIX locks held on an fd when the client/mount
through which those locks are held disconnects from bricks/server.
This helps Glusterfs to not run into a stale lock problem later (For
eg., if application unlocks while the connection was still down).
However, this means the lock is no longer exclusive as other
applications/clients can acquire the same lock. To communicate that
locks are no longer valid, we are planning to mark the fd (which has
POSIX locks) bad on a disconnect so that any future operations on
that fd will fail, forcing the application to re-open the fd and
re-acquire locks it needs [1].


Wouldn't it be better to retake the locks when the brick is reconnected 
if the lock is still in use ?


BTW, the referenced bug is not public. Should we open another bug to 
track this ?



Note that with AFR/replicate in picture we can prevent errors to
application as long as Quorum number of children "never ever" lost
connection with bricks after locks have been acquired. I am using
the term "never ever" as locks are not healed back after
re-connection and hence first disconnect would've marked the fd bad
and the fd remains so even after re-connection happens. So, it's not
just Quorum number of children "currently online", but Quorum number
of children "never having disconnected with bricks after locks are
acquired".


I think this requisite is not feasible. In a distributed file system, 
sooner or later all bricks will be disconnected. It could be because of 
failures or because an upgrade is done, but it will happen.


The difference here is how long are fd's kept open. If applications open 
and close files frequently enough (i.e. the fd is not kept open more 
time than it takes to have more than Quorum bricks disconnected) then 
there's no problem. The problem can only appear on applications that 
open files for a long time and also use posix locks. In this case, the 
only good solution I see is to retake the locks on brick reconnection.



However, this use case is not affected if the application doesn't
acquire any POSIX locks. So, I am interested in knowing
* whether your use cases use POSIX locks?
* Is it feasible for your application to re-open fds and re-acquire
locks on seeing EBADFD errors?


I think that many applications are not prepared to handle that.


+1 to all the points mentioned by Xavi. This has been a day-1 issue for 
all the applications using locks (like NFS-Ganesha and Samba). Not many 
applications re-open and re-acquire the locks. On receiving EBADFD, that 
error is most likely propagated to application clients.


Agree with Xavi that it's better to heal/re-acquire the locks on brick 
reconnects before the brick accepts any fresh requests. I also suggest 
making this healing mechanism generic enough (if possible) to heal any 
server-side state (like upcall, leases, etc.).
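Such a generic mechanism could be as simple as a registry of heal callbacks that interested subsystems (locks, leases, upcall) register, and that the client runs on reconnect before allowing fresh fops. Again a hypothetical sketch under that assumption — none of these names exist in GlusterFS:

```c
#include <assert.h>

/* Hypothetical heal-callback registry: each subsystem that keeps
 * server-side state (posix locks, leases, upcall registrations, ...)
 * registers a callback that re-establishes its state on reconnect. */
#define MAX_HEALERS 8

typedef int (*heal_fn)(void *conn);

static heal_fn heal_table[MAX_HEALERS];
static int     heal_count;

static int register_state_heal(heal_fn fn)
{
    if (heal_count >= MAX_HEALERS)
        return -1;
    heal_table[heal_count++] = fn;
    return 0;
}

/* Run all registered heals; if any fails, the connection must not accept
 * fresh requests (e.g. the affected fds get marked bad instead). */
static int run_state_heals(void *conn)
{
    for (int i = 0; i < heal_count; i++)
        if (heal_table[i](conn) != 0)
            return -1;
    return 0;
}
```

The design point is that reconnect-time ordering (heal first, then fops) lives in one place, while each subsystem owns only its own replay logic.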


Thanks,
Soumya



Xavi


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7

regards,
Raghavendra






Re: [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Xavi Hernandez
Hi Raghavendra,

On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
wrote:

> All,
>
> Glusterfs cleans up the POSIX locks held on an fd when the client/mount
> through which those locks are held disconnects from the bricks/server. This
> helps Glusterfs avoid a stale lock problem later (e.g., if the
> application unlocks while the connection is still down). However, this
> means the lock is no longer exclusive, as other applications/clients can
> acquire the same lock. To communicate that locks are no longer valid, we
> are planning to mark the fd (which has POSIX locks) bad on a disconnect, so
> that any future operations on that fd will fail, forcing the application to
> re-open the fd and re-acquire the locks it needs [1].
>

Wouldn't it be better to retake the locks when the brick is reconnected,
if the lock is still in use?

BTW, the referenced bug is not public. Should we open another bug to track
this?


>
> Note that with AFR/replicate in the picture we can prevent errors to the
> application as long as a Quorum number of children "never ever" lost their
> connection with bricks after locks have been acquired. I am using the term
> "never ever" because locks are not healed back after re-connection; hence the
> first disconnect would've marked the fd bad, and the fd remains so even
> after re-connection happens. So, it's not just a Quorum number of children
> "currently online", but a Quorum number of children "never having
> disconnected from bricks after locks were acquired".
>

I think this requirement is not feasible. In a distributed file system,
sooner or later all bricks will be disconnected. It could be because of
failures or because an upgrade is done, but it will happen.

The difference here is how long fds are kept open. If applications open
and close files frequently enough (i.e., the fd is not kept open longer
than it takes for more than Quorum bricks to disconnect), then there's no
problem. The problem can only appear in applications that open files for a
long time and also use POSIX locks. In this case, the only good solution I
see is to retake the locks on brick reconnection.


> However, this use case is not affected if the application doesn't acquire
> any POSIX locks. So, I am interested in knowing:
> * Do your use cases use POSIX locks?
> * Is it feasible for your application to re-open fds and re-acquire locks
> on seeing EBADFD errors?
>

I think that many applications are not prepared to handle that.

Xavi


>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>
> regards,
> Raghavendra
>

[Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-26 Thread Raghavendra Gowdappa
All,

Glusterfs cleans up the POSIX locks held on an fd when the client/mount through
which those locks are held disconnects from the bricks/server. This helps
Glusterfs avoid a stale lock problem later (e.g., if the
application unlocks while the connection is still down). However, this
means the lock is no longer exclusive, as other applications/clients can
acquire the same lock. To communicate that locks are no longer valid, we
are planning to mark the fd (which has POSIX locks) bad on a disconnect, so
that any future operations on that fd will fail, forcing the application to
re-open the fd and re-acquire the locks it needs [1].

Note that with AFR/replicate in the picture we can prevent errors to the
application as long as a Quorum number of children "never ever" lost their
connection with bricks after locks have been acquired. I am using the term
"never ever" because locks are not healed back after re-connection; hence the
first disconnect would've marked the fd bad, and the fd remains so even
after re-connection happens. So, it's not just a Quorum number of children
"currently online", but a Quorum number of children "never having
disconnected from bricks after locks were acquired".
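As a toy illustration of this condition (a hypothetical helper, not Gluster code): with replica 3 and quorum 2, a lock can be trusted only while at least two children have never disconnected since it was acquired, regardless of how many happen to be online now.

```python
# Toy model of the "never ever disconnected" condition described above.
# ever_disconnected[i] is True if child i lost its brick connection at
# any point after the lock was acquired (even if it later reconnected).

def lock_still_exclusive(ever_disconnected, quorum=2):
    # A lock can be trusted only if at least `quorum` children stayed
    # connected the whole time, not merely "currently online".
    never_lost = sum(1 for d in ever_disconnected if not d)
    return never_lost >= quorum
```

For example, with `[False, True, False]` (one child flapped and came back) the lock is still exclusive, but with `[True, True, False]` the fd must be treated as bad even if all three children are online again.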

However, this use case is not affected if the application doesn't acquire any
POSIX locks. So, I am interested in knowing:
* Do your use cases use POSIX locks?
* Is it feasible for your application to re-open fds and re-acquire locks
on seeing EBADFD errors?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7

regards,
Raghavendra