On Thu, Jun 1, 2023 at 9:10 PM Alexander Aring <aahri...@redhat.com> wrote:
> Hi,
>
> On Thu, Jun 1, 2023 at 1:11 PM Andreas Gruenbacher
> <agrue...@redhat.com> wrote:
> >
> > On Thu, Jun 1, 2023 at 6:28 PM Alexander Aring <aahri...@redhat.com> wrote:
> > > Hi,
> > >
> > > On Tue, May 30, 2023 at 1:40 PM Andreas Gruenbacher
> > > <agrue...@redhat.com> wrote:
> > > >
> > > > On Tue, May 30, 2023 at 4:08 PM Alexander Aring
> > > > <aahri...@redhat.com> wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher
> > > > > <agrue...@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, May 30, 2023 at 12:19 AM Alexander Aring
> > > > > > <aahri...@redhat.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > > > > > <agrue...@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring
> > > > > > > > <aahri...@redhat.com> wrote:
> > > > > > > > > This patch fixes possible plock op collisions when using
> > > > > > > > > F_SETLKW lock requests, where fsid, number, and owner are not
> > > > > > > > > enough to identify the result for a pending request. The ltp
> > > > > > > > > testcases [0] and [1] are examples where this is not enough:
> > > > > > > > > classic posix locks used by multiple threads, and open file
> > > > > > > > > description posix locks.
> > > > > > > > >
> > > > > > > > > The idea of the fix is to keep all lock requests in order.
> > > > > > > > > Non-F_SETLKW lock requests (indicated by whether wait is set)
> > > > > > > > > are ordered inside the recv_list. When a result comes back,
> > > > > > > > > the right plock op is the first plock_op in recv_list which
> > > > > > > > > does not have info.wait set. This only works for non-F_SETLKW
> > > > > > > > > plock ops, because dlm_controld always reads one specific
> > > > > > > > > plock op (list_move_tail() from send_list to recv_list) and
> > > > > > > > > writes its result back immediately.
> > > > > > > > >
> > > > > > > > > For F_SETLKW this is not possible, because multiple waiters
> > > > > > > > > can get their results back in random order. To avoid
> > > > > > > > > collisions in cases like [0] or [1], this patch compares
> > > > > > > > > additional fields of the plock operations when the lock
> > > > > > > > > requests are otherwise the same. NFS does the same to find
> > > > > > > > > the result for an asynchronous F_SETLKW lock request [2][3].
> > > > > > > > > We still can't find the exact lock request for a specific
> > > > > > > > > result if the lock requests are identical, but in that case
> > > > > > > > > we don't care in which order the identical lock requests get
> > > > > > > > > their results back to grant the lock.
> > > > > > > >
> > > > > > > > When the recv_list contains multiple indistinguishable
> > > > > > > > requests, this can only be because they originated from
> > > > > > > > multiple threads of the same process. In that case, I agree
> > > > > > > > that it doesn't matter which of those requests we "complete"
> > > > > > > > in dev_write() as long as we only complete one request. We do
> > > > > > > > need to compare the additional request fields in dev_write()
> > > > > > > > to find a suitable request, so that makes sense as well. We
> > > > > > > > need to compare all of the fields that identify a request
> > > > > > > > (optype, ex, wait, pid, nodeid, fsid, number, start, end,
> > > > > > > > owner) to find the "right" request (or, in case there is more
> > > > > > > > than one identical request, a "suitable" request).
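
(Side note: spelled out, the comparison described above amounts to
matching all identifying fields of struct dlm_plock_info from
include/uapi/linux/dlm_plock.h. Untested sketch only, the helper name
is made up:)

static bool plock_op_matches(const struct dlm_plock_info *info,
			     const struct plock_op *op)
{
	/* All fields that identify a request must match; among truly
	 * identical requests, any one of them is a "suitable" match.
	 */
	return op->info.optype == info->optype &&
	       op->info.ex     == info->ex &&
	       op->info.wait   == info->wait &&
	       op->info.pid    == info->pid &&
	       op->info.nodeid == info->nodeid &&
	       op->info.fsid   == info->fsid &&
	       op->info.number == info->number &&
	       op->info.start  == info->start &&
	       op->info.end    == info->end &&
	       op->info.owner  == info->owner;
}
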
> > > > > > > My "definition" of why this works is, as you said, the
> > > > > > > "identical request". There is a deeper definition of when a
> > > > > > > request is identical, and in my opinion it is: "request A is
> > > > > > > identical to request B when they would be granted at the same
> > > > > > > 'time'", which comes down to all the fields you mentioned.
> > > > > > >
> > > > > > > Even with cancellation (F_SETLKW only), it does not matter which
> > > > > > > "identical request" you cancel, because the kernel and user
> > > > > > > space (dlm_controld) keep no relation between lock request
> > > > > > > instances. You just need at least as many "results" coming back
> > > > > > > from user space as there are waiters for the same "identical
> > > > > > > request".
> > > > > >
> > > > > > That's not incorrect per se, but cancellations create an
> > > > > > additional difficulty: they can either succeed or fail. To
> > > > > > indicate that a cancellation has succeeded, a new type of message
> > > > > > can be introduced (say, "CANCELLED"), and it's obvious that a
> > > > > > CANCELLED message can only belong to a locking request that is
> > > > > > being cancelled. When cancelling a locking request fails, the
> > > > > > kernel will see a "locking request granted" message though, and
> > > > > > when multiple identical locking requests are queued and only some
> > > > > > of them have been cancelled, it won't be obvious which locking
> > > > > > request a "locking request granted" message should be assigned to
> > > > > > anymore. You really don't want to mix things up in that case.
> > > > > >
> > > > > > This complication makes it a whole lot more difficult to reason
> > > > > > about the correctness of the code. All that complexity is
> > > > > > avoidable by sticking with a fixed mapping of requests and
> > > > > > replies (i.e., a unique request identifier).
> > > > > >
> > > > > > To put it differently, you can shoot yourself in the foot and
> > > > > > still hop along on the other leg, but it may not be the best of
> > > > > > all possible ideas.
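
(Again as a side note: the fixed mapping mentioned above could look
something like the sketch below. None of this exists in the current
kernel/dlm_controld protocol; the "id" field and both helpers are made
up, and the message format would have to carry the cookie in both
directions.)

/* Sketch of a fixed request/reply mapping: each op gets a unique
 * cookie when it is queued, and dev_write() looks the op up by cookie
 * alone, so even "identical" requests can never be mixed up, with or
 * without cancellation.
 */
static u64 plock_next_id;		/* protected by ops_lock */

static void plock_op_assign_id(struct plock_op *op)
{
	op->info.id = ++plock_next_id;	/* hypothetical new field */
}

static struct plock_op *plock_lookup_by_id(u64 id)
{
	struct plock_op *op;

	list_for_each_entry(op, &recv_list, list) {
		if (op->info.id == id)
			return op;
	}
	return NULL;
}
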
> > > > >
> > > > > It makes things more complicated, I agree, and the reason why this
> > > > > works now is that there are a lot of "dependencies". I would love
> > > > > to have a unique identifier to make it possible to follow an
> > > > > instance handle of the original lock request:
> > > > >
> > > > > * a unique identifier which also works with the asynchronous lock
> > > > >   requests of the lockd case.
> > > >
> > > > What's the lockd case you're referring to here, and why is it
> > > > relevant for the problem at hand?
> > >
> > > I just mentioned that we need a solution which also works for the
> > > asynchronous lock request (F_SETLK, F_SETLKW) case; there is only one
> > > user, lockd [0]. The DLM plock handling implements the behaviour
> > > mentioned at [0], but the lm_grant() callback can also return a
> > > negative value, signalling that the lock request was cancelled (on
> > > the nfs layer), and DLM then needs to be told about that. This is
> > > not supported, however, as we lack cancellation.
> >
> > Ouch, that's a bit messy. But if the vfs_lock_file() description is
> > accurate, then only F_SETLK requests arriving via lockd can be
> > asynchronous, and F_SETLKW requests never are asynchronous. And we
> > only need to cancel F_SETLKW requests. It follows that we only ever
> > need to cancel synchronous requests.
>
> I looked into the code and I think the second sentence of [0] is
> important regarding FL_SLEEP: "Callers expecting ->lock() to return
> asynchronously will only use F_SETLK, not F_SETLKW; they will set
> FL_SLEEP if (and only if) the request is for a blocking lock." If
> lockd does a lock request, it will then set args.wait to 1 if it was
> F_SETLKW [1].

Hmm, this sets args.block?
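
(For [1], quoting nlmclnt_proc() from memory, it is a_args.block,
rather than a wait field, that gets derived from the command:)

	/* fs/lockd/clntproc.c, nlmclnt_proc(), roughly: */
	if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
		if (fl->fl_type != F_UNLCK) {
			call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
			status = nlmclnt_lock(call, fl);
		} else
			status = nlmclnt_unlock(call, fl);
	}
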
> The receiver will then create a blocking request [2] and set FL_SLEEP
> [3]; it unsets FL_SLEEP when args.wait (passed on as the wait
> parameter) wasn't set. There is the case of [5], which unsets wait
> again, but it seems we can end up with FL_SLEEP set there anyway.
>
> I think we currently have an issue here: F_SETLK with FL_SLEEP set
> (the case of an async lockd request) is handled as a trylock, which
> isn't right. Don't ask me why they go through FL_SLEEP and F_SETLK
> instead of just using F_SETLKW to signal that it is a blocking
> request. So we actually have the F_SETLKW case, but it's just
> signaled with F_SETLK + FL_SLEEP?

It seems to me that the vfs_lock_file() description and the distinction
it makes between F_SETLK and F_SETLKW make sense in a scenario where
locking is implemented locally in the kernel, but not in the context of
dlm_controld, where all the locking decisions are made in user space:
in the former scenario, F_SETLK requests can be decided immediately
without sleeping, while F_SETLKW requests may sleep. In the
dlm_controld scenario, locking requests can never be decided
immediately, and the requesting task will always sleep.

So from the point of view of lockd, any request to dlm_controld should
probably be treated as asynchronous.

This makes things a bit more ugly than I was hoping for.

Thanks,
Andreas

> - Alex
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/locks.c?h=v6.4-rc4#n2255
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/clntproc.c?h=v6.4-rc4#n186
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc4#n501
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc4#n240
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc4#n535
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc4#n489
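
P.S.: For reference, the trylock mapping being discussed is made in
dlm_posix_lock() in fs/dlm/plock.c, which derives the wait flag from
the command alone. Quoting the current line from memory, together with
an untested sketch of what honoring FL_SLEEP might look like:

	/* fs/dlm/plock.c, dlm_posix_lock(), roughly: the wait flag only
	 * reflects the command, so an asynchronous lockd request that
	 * arrives as F_SETLK + FL_SLEEP is sent to dlm_controld as a
	 * trylock:
	 */
	op->info.wait = IS_SETLKW(cmd);

	/* Untested idea, only to illustrate: also honor FL_SLEEP, so
	 * that F_SETLK + FL_SLEEP is treated as a blocking request:
	 */
	op->info.wait = IS_SETLKW(cmd) || !!(fl->fl_flags & FL_SLEEP);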