Hi,

On Sat, 2021-06-12 at 21:35 +0000, Al Viro wrote:
> On Sat, Jun 12, 2021 at 09:05:40PM +0000, Al Viro wrote:
>
> > Is the above an accurate description of the mainline situation
> > there?
> > In particular, normal read doesn't seem to bother with locks at
> > all.
> > What exactly are those cluster locks for in O_DIRECT read?
>
> BTW, assuming the lack of contention, how costly is
> dropping/regaining
> such cluster lock?
>
The answer is that it depends...

The locking modes for glocks for inodes look like this:

==========  ==========  ==============  ==========  ==============
Glock mode  Cache data  Cache Metadata  Dirty Data  Dirty Metadata
==========  ==========  ==============  ==========  ==============
UN          No          No              No          No
SH          Yes         Yes             No          No
DF          No          Yes             No          No
EX          Yes         Yes             Yes         Yes
==========  ==========  ==============  ==========  ==============

The above is a copy & paste from
Documentation/filesystems/gfs2-glocks.rst. If you think of these locks
as cache control, then it makes a lot more sense.

The DF (deferred) mode is there only for DIO. It is a shared lock mode
that is incompatible with the normal SH mode. That is because it is ok
to cache data pages under SH but not under DF. That is the only other
difference between the two shared modes. DF is used for both read and
write under DIO, meaning that it is possible for multiple nodes to read
& write the same file at the same time with DIO, leaving any
synchronisation to the application layer. As soon as one performs an
operation which alters the metadata tree (truncate, extend, hole
filling) then we drop back to the normal EX mode, so DF is only used
for preallocated files.

Your original question though was about the cost of locking, and there
is a wide variation according to circumstances. The glock layer caches
the results of the DLM requests and will continue to hold glocks gained
from remote nodes until either memory pressure occurs or a request to
drop the lock is received from another node.

When no other nodes are interested in a lock, all such cluster lock
activity is local. There is a cost to it though, and if (for example)
you tried to take and drop the cluster lock on every page, that would
definitely be noticeable. There are probably optimisations that could
be done on what is quite a complex code path, but in general that's
what we've discovered from testing.
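
To make the "cache control" reading of the glock mode table above a bit
more concrete, here is a small illustrative sketch in Python. It is not
kernel code: the names Caps, CAPS and compatible() are made up for this
example, and the compatibility rule is only what the description above
implies (SH and DF are both shared but exclude each other, EX excludes
everything, UN holds nothing).

```python
from enum import Enum
from typing import NamedTuple


class Caps(NamedTuple):
    """What a node may do while holding a glock in a given mode."""
    cache_data: bool
    cache_metadata: bool
    dirty_data: bool
    dirty_metadata: bool


class Mode(Enum):
    UN = "UN"  # unlocked
    SH = "SH"  # shared, normal (buffered) access
    DF = "DF"  # deferred, shared mode used for direct I/O
    EX = "EX"  # exclusive


# Transcription of the table from gfs2-glocks.rst quoted above.
CAPS = {
    Mode.UN: Caps(False, False, False, False),
    Mode.SH: Caps(True, True, False, False),
    Mode.DF: Caps(False, True, False, False),
    Mode.EX: Caps(True, True, True, True),
}


def compatible(a: Mode, b: Mode) -> bool:
    """Hypothetical helper: may two nodes hold modes a and b at once?

    Assumption drawn from the text, not from the kernel sources:
    UN conflicts with nothing, EX conflicts with every held mode,
    and the two shared modes are only compatible with themselves.
    """
    if Mode.UN in (a, b):
        return True
    if Mode.EX in (a, b):
        return False
    return a == b


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, CAPS[mode])
    print("DF with DF:", compatible(Mode.DF, Mode.DF))
    print("SH with DF:", compatible(Mode.SH, Mode.DF))
```

The point of the sketch is only that DF differs from SH in the
cache_data column, which is why the two cannot be held at the same
time by different nodes.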

The introduction of ->readpages() vs the old ->readpage() made a
measurable difference, and likewise on the write side, iomap has also
shown performance increases due to the reduction in locking on
multi-page writes.

If there is another node that has an interest in a lock, then it can
get very expensive in terms of latency to regain a lock. To drop the
lock to a lower mode may involve I/O (from EX mode) and journal
flush(es), and to get the lock back again involves I/O to other nodes
and then a wait while they finish what they are doing. To avoid
starvation there is a "minimum hold time" so that when a node gains a
glock, it is allowed to retain it, in the absence of local requests,
for a short period. The idea is that if a large number of glock
requests are being made on a node, each for a short time, we allow
several of those to complete before we do the expensive glock release
to another node.

See Documentation/filesystems/gfs2-glocks.rst for a longer explanation
and locking order/rules between different lock types,

Steve.
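
P.S. The "minimum hold time" idea can be sketched roughly as below.
This is a simplified illustration in Python, not the GFS2
implementation: the function name demote_delay_ms, the millisecond
units and the example numbers are all assumptions made for the sketch.

```python
def demote_delay_ms(acquired_at_ms: int, now_ms: int, min_hold_ms: int) -> int:
    """Hypothetical helper: how long to defer a remote demote request.

    A node that gained the glock at acquired_at_ms keeps it until at
    least acquired_at_ms + min_hold_ms, so that several short local
    requests can complete before the expensive release to another node.
    Returns 0 once the minimum hold time has already elapsed.
    """
    remaining = acquired_at_ms + min_hold_ms - now_ms
    return max(0, remaining)


if __name__ == "__main__":
    # Made-up numbers: lock gained at t=1000ms, 500ms minimum hold.
    print(demote_delay_ms(1000, 1200, 500))  # request arrives early
    print(demote_delay_ms(1000, 1600, 500))  # hold time already over
```

A remote request that arrives inside the hold window is delayed rather
than refused, which bounds the extra latency seen by the other node.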