This implements a sleepable range rwlock, based on an interval tree,
serializing conflicting/intersecting/overlapping ranges within the tree.
The largest range spans [0, ~0] (inclusive). Unlike traditional locks,
range locking involves dealing with both the tree itself and the range to
be locked, which is normally stack allocated and must always be explicitly
prepared/initialized by the user as a sorted [a0, a1] interval, with
a0 <= a1, before actually taking the lock.
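A typical call site would look something like the following sketch
(illustrative only; it uses the interfaces added below, the 'mytree' and
'frob_region' names are made up, and error handling is omitted):

    static DEFINE_RANGE_RWLOCK_TREE(mytree);     /* caller-provided tree */

    void frob_region(unsigned long start, unsigned long last)
    {
            struct range_rwlock range;           /* stack allocated */

            range_rwlock_init(&range, start, last);  /* start <= last */
            range_write_lock(&mytree, &range);
            /* ... exclusive access to [start, last] ... */
            range_write_unlock(&mytree, &range);
    }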
An interval tree of locked and to-be-locked ranges is kept. When a new
range lock is requested, we add its interval to the tree and store the
number of intervals intersecting it in 'blocking_ranges'. For the reader
case, 'blocking_ranges' is only accounted for if the intersecting range is
marked as a writer. To achieve mutual exclusion of arbitrary ranges, we
guarantee that the task is blocked until there are no overlapping ranges
in the tree.

When a range is unlocked, we again walk the intervals that overlap with
the unlocked one and decrement their 'blocking_ranges'. Naturally, we wake
up the owner of any range lock whose 'blocking_ranges' drops to 0. Wakeup
order therefore relies on the order of the interval tree -- as opposed to
a more traditional fifo mechanism. There is no lock stealing either, which
prevents starvation and guarantees fairness.

How much does it cost:
----------------------

The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where
R_all is the total number of ranges and R_int is the number of ranges
intersecting the new range to be added.

Due to its sharable nature, full range locks can be compared with
rw-semaphores, which also serve as a reference from a mutex standpoint,
as writer-only situations are pretty similar nowadays.

The first difference is the memory footprint: range locks are larger than
rwsems, 80 vs 40 bytes, and also require an additional 72 bytes of stack
for the range structure.

Secondly, because every range call is serialized by the tree->lock, any
lock() fastpath will at least have an interval_tree_insert() and spinlock
lock+unlock overhead compared to a single atomic insn in the case of
rwsems. The same obviously applies to the unlock() case.

The torture module was used to measure 1-1 differences in lock acquisition
with increasing core counts over a period of 10 minutes. Readers and
writers are interleaved, with a slight advantage to writers as theirs is
the first kthread that is created. The following shows the avg ops/minute
with various thread setups on boxes with small and large core counts.

** 4-core AMD Opteron **
(write-only)
rwsem-2thr: 4198.5, stddev: 7.77
range-2thr: 4199.1, stddev: 0.73

rwsem-4thr: 6036.8, stddev: 50.91
range-4thr: 6004.9, stddev: 126.57

rwsem-8thr: 6245.6, stddev: 59.39
range-8thr: 6229.3, stddev: 10.60

(read-only)
rwsem-2thr: 5930.7, stddev: 21.92
range-2thr: 5917.3, stddev: 25.45

rwsem-4thr: 9881.6, stddev: 0.70
range-4thr: 9540.2, stddev: 98.28

rwsem-8thr: 11633.2, stddev: 7.72
range-8thr: 11314.7, stddev: 62.22

For the read/write-only cases, there is very little difference between the
range lock and rwsems, with up to a 3% hit, which could very well be
considered in the noise range.

(read-write)
rwsem-write-1thr: 1744.8, stddev: 11.59
rwsem-read-1thr:  1043.1, stddev: 3.97
range-write-1thr: 1740.2, stddev: 5.99
range-read-1thr:  1022.5, stddev: 6.41

rwsem-write-2thr: 1662.5, stddev: 0.70
rwsem-read-2thr:  1278.0, stddev: 25.45
range-write-2thr: 1321.5, stddev: 51.61
range-read-2thr:  1243.5, stddev: 30.40

rwsem-write-4thr: 1761.0, stddev: 11.31
rwsem-read-4thr:  1426.0, stddev: 7.07
range-write-4thr: 1417.0, stddev: 29.69
range-read-4thr:  1398.0, stddev: 56.56

While single reader and writer threads do not show much difference,
increasing core counts shows that in reader/writer workloads, writer
threads can take a hit in raw performance of up to ~20%, while reader
throughput is quite similar among both locks.
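The reader/writer asymmetry above comes directly from the overlap
semantics: a reader is only blocked by intersecting writers, whereas a
writer is blocked by any intersecting range. As a rough illustration
(hypothetical ranges and a caller-provided 'tree', with each lock taken by
a different task):

    struct range_rwlock r1, r2, w;

    range_rwlock_init(&r1, 0, 512);
    range_rwlock_init(&r2, 256, 1024);   /* overlaps r1 */
    range_rwlock_init(&w, 300, 600);     /* overlaps r1 and r2 */

    range_read_lock(&tree, &r1);    /* task A: acquired */
    range_read_lock(&tree, &r2);    /* task B: acquired, readers share */
    range_write_lock(&tree, &w);    /* task C: sleeps with blocking_ranges == 2,
                                       woken once both readers unlock */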
** 240-core (ht) IvyBridge **
(write-only)
rwsem-120thr: 6844.5, stddev: 82.73
range-120thr: 6070.5, stddev: 85.55

rwsem-240thr: 6292.5, stddev: 146.3
range-240thr: 6099.0, stddev: 15.55

rwsem-480thr: 6164.8, stddev: 33.94
range-480thr: 6062.3, stddev: 19.79

(read-only)
rwsem-120thr: 136860.4, stddev: 2539.92
range-120thr: 138052.2, stddev: 327.39

rwsem-240thr: 235297.5, stddev: 2220.50
range-240thr: 232099.1, stddev: 3614.72

rwsem-480thr: 272683.0, stddev: 3924.32
range-480thr: 256539.2, stddev: 9541.69

Similar to the small box, larger machines show that range locks take only
a minor (up to ~6% for 480 threads) hit even in completely exclusive or
shared scenarios.

(read-write)
rwsem-write-60thr: 4658.1, stddev: 1303.19
rwsem-read-60thr:  1108.7, stddev: 718.42
range-write-60thr: 3203.6, stddev: 139.30
range-read-60thr:  1852.8, stddev: 147.5

rwsem-write-120thr: 3971.3, stddev: 1413.0
rwsem-read-120thr:  1038.8, stddev: 353.51
range-write-120thr: 2282.1, stddev: 207.18
range-read-120thr:  1856.5, stddev: 198.69

rwsem-write-240thr: 4112.7, stddev: 2448.1
rwsem-read-240thr:  1277.4, stddev: 430.30
range-write-240thr: 2353.1, stddev: 502.04
range-read-240thr:  1551.5, stddev: 361.33

When mixing readers and writers, writer throughput can take a hit of up to
~40%, similar to the 4-core machine; however, reader threads can increase
their number of acquisitions by up to ~80%. In any case, the overall
writer+reader throughput will always be higher for rwsems. A huge factor
in this behavior is that range locks do not have a writer spin-on-owner
feature.

On both machines, when actually testing threads acquiring different
ranges, the range lock's throughput will always outperform the rwsem's,
due to the increased parallelism; which is no surprise either. As such,
microbenchmarks that merely pound on a lock will pretty much always suffer
upon direct lock conversions, but not enough to matter in the overall
picture.

Signed-off-by: Davidlohr Bueso <dbu...@suse.de>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 include/linux/range_rwlock.h  | 115 +++++++++
 kernel/locking/Makefile       |   2 +-
 kernel/locking/range_rwlock.c | 554 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 670 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/range_rwlock.h
 create mode 100644 kernel/locking/range_rwlock.c

diff --git a/include/linux/range_rwlock.h b/include/linux/range_rwlock.h
new file mode 100644
index 000000000000..0a8dad62d097
--- /dev/null
+++ b/include/linux/range_rwlock.h
@@ -0,0 +1,115 @@
+/*
+ * Range/interval rw-locking
+ * -------------------------
+ *
+ * An interval tree of locked and to-be-locked ranges is kept. When a new range
+ * lock is requested, we add its interval to the tree and store the number of
+ * intervals intersecting it in 'blocking_ranges'. For the reader case,
+ * 'blocking_ranges' is only accounted for if the intersecting range is
+ * marked as a writer. To achieve mutual exclusion of arbitrary ranges, we
+ * guarantee that the task is blocked until there are no overlapping ranges in
+ * the tree.
+ *
+ * When a range is unlocked, we again walk intervals that overlap with the
+ * unlocked one and decrement their 'blocking_ranges'. Naturally, we wake up
+ * the owner of any range lock whose 'blocking_ranges' drops to 0. Wakeup order
+ * therefore relies on the order of the interval tree -- as opposed to a
+ * more traditional fifo mechanism. There is no lock stealing either, which
+ * prevents starvation and guarantees fairness.
+ * + * The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where R_all + * is total number of ranges and R_int is the number of ranges intersecting the + * operated range. + */ +#ifndef _LINUX_RANGE_RWLOCK_H +#define _LINUX_RANGE_RWLOCK_H + +#include <linux/rbtree.h> +#include <linux/interval_tree.h> +#include <linux/list.h> +#include <linux/spinlock.h> + +/* + * The largest range will span [0,RANGE_RWLOCK_FULL]. + */ +#define RANGE_RWLOCK_FULL ~0UL + +struct range_rwlock { + struct interval_tree_node node; + struct task_struct *waiter; + /* Number of ranges which are blocking acquisition of the lock */ + unsigned int blocking_ranges; + u64 seqnum; +}; + +struct range_rwlock_tree { + struct rb_root root; + spinlock_t lock; + struct interval_tree_node *leftmost; /* compute smallest 'start' */ + u64 seqnum; /* track order of incoming ranges, avoid overflows */ +}; + +#define __RANGE_RWLOCK_TREE_INITIALIZER(name) \ + { .leftmost = NULL \ + , .root = RB_ROOT \ + , .seqnum = 0 \ + , .lock = __SPIN_LOCK_UNLOCKED(name.lock) } + +#define DEFINE_RANGE_RWLOCK_TREE(name) \ + struct range_rwlock_tree name = __RANGE_RWLOCK_TREE_INITIALIZER(name) + +#define __RANGE_RWLOCK_INITIALIZER(__start, __last) { \ + .node = { \ + .start = (__start) \ + ,.last = (__last) \ + } \ + , .task = NULL \ + , .blocking_ranges = 0 \ + , .reader = false \ + , .seqnum = 0 \ + } + +#define DEFINE_RANGE_RWLOCK(name, start, last) \ + struct range_rwlock name = __RANGE_RWLOCK_INITIALIZER((start), (last)) + +#define DEFINE_RANGE_RWLOCK_FULL(name) \ + struct range_rwlock name = __RANGE_RWLOCK_INITIALIZER(0, RANGE_RWLOCK_FULL) + +static inline void range_rwlock_tree_init(struct range_rwlock_tree *tree) +{ + tree->root = RB_ROOT; + spin_lock_init(&tree->lock); + tree->leftmost = NULL; + tree->seqnum = 0; +} + +void range_rwlock_init(struct range_rwlock *lock, + unsigned long start, unsigned long last); +void range_rwlock_init_full(struct range_rwlock *lock); + +/* + * lock for reading + */ +void range_read_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock); +int range_read_lock_interruptible(struct range_rwlock_tree *tree, + struct range_rwlock *lock); +int range_read_lock_killable(struct range_rwlock_tree *tree, + struct range_rwlock *lock); +int range_read_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock); +void range_read_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock); + +/* + * lock for writing + */ +void range_write_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock); +int range_write_lock_interruptible(struct range_rwlock_tree *tree, + struct range_rwlock *lock); +int range_write_lock_killable(struct range_rwlock_tree *tree, + struct range_rwlock *lock); +int range_write_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock); +void range_write_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock); + +void range_downgrade_write(struct range_rwlock_tree *tree, + struct range_rwlock *lock); + +#endif diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 760158d9d98d..bf054d0af82b 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -2,7 +2,7 @@ # and is generally not a function of system call inputs. 
KCOV_INSTRUMENT := n -obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o +obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o range_rwlock.o ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE) diff --git a/kernel/locking/range_rwlock.c b/kernel/locking/range_rwlock.c new file mode 100644 index 000000000000..550ad360d87a --- /dev/null +++ b/kernel/locking/range_rwlock.c @@ -0,0 +1,554 @@ +#include <linux/rbtree.h> +#include <linux/spinlock.h> +#include <linux/range_rwlock.h> +#include <linux/sched/signal.h> +#include <linux/sched/debug.h> +#include <linux/sched/wake_q.h> +#include <linux/sched.h> +#include <linux/export.h> + +#define range_entry(ptr, type, member) container_of(ptr, type, member) + +#define range_interval_tree_foreach(node, root, start, last) \ + for (node = interval_tree_iter_first(root, start, last); \ + node; node = interval_tree_iter_next(node, start, last)) + +/* + * Fastpath range intersection/overlap between A: [a0, a1] and B: [b0, b1] + * is given by: + * + * a0 <= b1 && b0 <= a1 + * + * ... where A holds the lock range and B holds the smallest 'start' and + * largest 'last' in the tree. For the later, we rely on the root node, + * which by augmented interval tree property, holds the largest value in + * its last-in-subtree. This allows mitigating some of the tree walk overhead + * for non-intersecting ranges, maintained and consulted in O(1). + */ +static inline bool +__range_intersects_intree(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + struct interval_tree_node *root; + + if (unlikely(RB_EMPTY_ROOT(&tree->root))) + return false; + + root = range_entry(tree->root.rb_node, struct interval_tree_node, rb); + + return lock->node.start <= root->__subtree_last && + tree->leftmost->start <= lock->node.last; +} + +static inline void +__range_tree_insert(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + if (unlikely(RB_EMPTY_ROOT(&tree->root)) || + lock->node.start < tree->leftmost->start) + tree->leftmost = &lock->node; + + lock->seqnum = tree->seqnum++; + interval_tree_insert(&lock->node, &tree->root); +} + +static inline void +__range_tree_remove(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + if (tree->leftmost == &lock->node) { + struct rb_node *next = rb_next(&tree->leftmost->rb); + tree->leftmost = range_entry(next, struct interval_tree_node, rb); + } + + interval_tree_remove(&lock->node, &tree->root); +} + +/* + * lock->waiter reader tracking. 
+ */ +#define RANGE_FLAG_READER 1UL + +static inline struct task_struct *range_lock_waiter(struct range_rwlock *lock) +{ + return (struct task_struct *) + ((unsigned long) lock->waiter & ~RANGE_FLAG_READER); +} + +static inline void range_lock_set_reader(struct range_rwlock *lock) +{ + lock->waiter = (struct task_struct *) + ((unsigned long)lock->waiter | RANGE_FLAG_READER); +} + +static inline void range_lock_clear_reader(struct range_rwlock *lock) +{ + lock->waiter = (struct task_struct *) + ((unsigned long)lock->waiter & ~RANGE_FLAG_READER); +} + +static inline bool range_lock_is_reader(struct range_rwlock *lock) +{ + return (unsigned long) lock->waiter & RANGE_FLAG_READER; +} + +static inline void +__range_rwlock_init(struct range_rwlock *lock, + unsigned long start, unsigned long last) +{ + WARN_ON(start > last); + + lock->node.start = start; + lock->node.last = last; + RB_CLEAR_NODE(&lock->node.rb); + lock->blocking_ranges = 0; + lock->waiter = NULL; + lock->seqnum = 0; +} + +/** + * range_rwlock_init - Initialize the range lock + * @lock: the range lock to be initialized + * @start: start of the interval (inclusive) + * @last: last location in the interval (inclusive) + * + * Initialize the range's [start, last] such that it can + * later be locked. User is expected to enter a sorted + * range, such that @start <= @last. + * + * It is not allowed to initialize an already locked range. + */ +void range_rwlock_init(struct range_rwlock *lock, unsigned long start, + unsigned long last) +{ + __range_rwlock_init(lock, start, last); +} +EXPORT_SYMBOL_GPL(range_rwlock_init); + +/** + * range_rwlock_init_full - Initialize a full range lock + * @lock: the range lock to be initialized + * + * Initialize the full range. + * + * It is not allowed to initialize an already locked range. + */ +void range_rwlock_init_full(struct range_rwlock *lock) +{ + __range_rwlock_init(lock, 0, RANGE_RWLOCK_FULL); +} +EXPORT_SYMBOL_GPL(range_rwlock_init_full); + +static inline void +range_rwlock_unblock(struct range_rwlock *lock, struct wake_q_head *wake_q) +{ + if (!--lock->blocking_ranges) + wake_q_add(wake_q, range_lock_waiter(lock)); +} + +static inline int wait_for_ranges(struct range_rwlock_tree *tree, + struct range_rwlock *lock, long state) +{ + int ret = 0; + + while (true) { + set_current_state(state); + + /* do we need to go to sleep? */ + if (!lock->blocking_ranges) + break; + + if (unlikely(signal_pending_state(state, current))) { + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + ret = -EINTR; + /* + * We're not taking the lock after all, cleanup + * after ourselves. 
+ */ + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + if (!__range_intersects_intree(tree, lock)) + goto unlock; + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, + lock->node.last) { + struct range_rwlock *blked; + blked = range_entry(node, + struct range_rwlock, node); + + if (range_lock_is_reader(lock) && + range_lock_is_reader(blked)) + continue; + + /* unaccount for threads _we_ are blocking */ + if (lock->seqnum < blked->seqnum) + range_rwlock_unblock(blked, &wake_q); + } + + unlock: + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); + break; + } + + schedule(); + } + + __set_current_state(TASK_RUNNING); + return ret; +} + +static __always_inline int __sched +__range_read_lock_common(struct range_rwlock_tree *tree, + struct range_rwlock *lock, long state) +{ + struct interval_tree_node *node; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + range_lock_set_reader(lock); + + if (!__range_intersects_intree(tree, lock)) + goto insert; + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_rwlock *blocked_lock; + blocked_lock = range_entry(node, struct range_rwlock, node); + + if (!range_lock_is_reader(blocked_lock)) + lock->blocking_ranges++; + } +insert: + __range_tree_insert(tree, lock); + + lock->waiter = current; + spin_unlock_irqrestore(&tree->lock, flags); + + return wait_for_ranges(tree, lock, state); +} + +/** + * range_read_lock - Lock for reading + * @tree: interval tree + * @lock: the range lock to be locked + * + * Returns when the lock has been acquired or sleep until + * until there are no overlapping ranges. + */ +void range_read_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + might_sleep(); + __range_read_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_read_lock); + +/** + * range_read_lock_interruptible - Lock for reading (interruptible) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_read_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +int range_read_lock_interruptible(struct range_rwlock_tree *tree, + struct range_rwlock *lock) +{ + might_sleep(); + return __range_read_lock_common(tree, lock, TASK_INTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_read_lock_interruptible); + +/** + * range_read_lock_killable - Lock for reading (killable) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_read_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +int range_read_lock_killable(struct range_rwlock_tree *tree, + struct range_rwlock *lock) +{ + might_sleep(); + return __range_read_lock_common(tree, lock, TASK_KILLABLE); +} +EXPORT_SYMBOL_GPL(range_read_lock_killable); + +/** + * range_read_trylock - Trylock for reading + * @tree: interval tree + * @lock: the range lock to be trylocked + * + * The trylock is against the range itself, not the @tree->lock. + * + * Returns 1 if successful, 0 if contention (must block to acquire). 
+ */ +int range_read_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + int ret = true; + unsigned long flags; + struct interval_tree_node *node; + + spin_lock_irqsave(&tree->lock, flags); + + if (!__range_intersects_intree(tree, lock)) + goto insert; + + /* + * We have overlapping ranges in the tree, ensure that we can + * in fact share the lock. + */ + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_rwlock *blocked_lock; + blocked_lock = range_entry(node, struct range_rwlock, node); + + if (!range_lock_is_reader(blocked_lock)) { + ret = false; + goto unlock; + } + } +insert: + range_lock_set_reader(lock); + __range_tree_insert(tree, lock); +unlock: + spin_unlock_irqrestore(&tree->lock, flags); + + return ret; +} +EXPORT_SYMBOL_GPL(range_read_trylock); + +/** + * range_read_unlock - Unlock for reading + * @tree: interval tree + * @lock: the range lock to be unlocked + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to unlock an unacquired read lock. + */ +void range_read_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + if (!__range_intersects_intree(tree, lock)) { + /* nobody to wakeup, we're done */ + spin_unlock_irqrestore(&tree->lock, flags); + return; + } + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_rwlock *blocked_lock; + blocked_lock = range_entry(node, struct range_rwlock, node); + + if (!range_lock_is_reader(blocked_lock)) + range_rwlock_unblock(blocked_lock, &wake_q); + } + + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_read_unlock); + +static __always_inline int __sched +__range_write_lock_common(struct range_rwlock_tree *tree, + struct range_rwlock *lock, long state) +{ + struct interval_tree_node *node; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + + if (!__range_intersects_intree(tree, lock)) + goto insert; + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + /* + * As a writer, we always consider an existing node. We + * need to block; either the intersecting node is another + * writer or we have a reader that needs to finish. + */ + lock->blocking_ranges++; + } +insert: + __range_tree_insert(tree, lock); + + lock->waiter = current; + spin_unlock_irqrestore(&tree->lock, flags); + + return wait_for_ranges(tree, lock, state); +} + +/** + * range_write_lock - Lock for writing + * @tree: interval tree + * @lock: the range lock to be locked + * + * Returns when the lock has been acquired or sleep until + * until there are no overlapping ranges. + */ +void range_write_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + might_sleep(); + __range_write_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_write_lock); + +/** + * range_write_lock_interruptible - Lock for writing (interruptible) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_write_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. 
+ */ +int range_write_lock_interruptible(struct range_rwlock_tree *tree, + struct range_rwlock *lock) +{ + might_sleep(); + return __range_write_lock_common(tree, lock, TASK_INTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_write_lock_interruptible); + +/** + * range_write_lock_killable - Lock for writing (killable) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_write_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +int range_write_lock_killable(struct range_rwlock_tree *tree, + struct range_rwlock *lock) +{ + might_sleep(); + return __range_write_lock_common(tree, lock, TASK_KILLABLE); +} +EXPORT_SYMBOL_GPL(range_write_lock_killable); + +/** + * range_write_trylock - Trylock for writing + * @tree: interval tree + * @lock: the range lock to be trylocked + * + * The trylock is against the range itself, not the @tree->lock. + * + * Returns 1 if successful, 0 if contention (must block to acquire). + */ +int range_write_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + int intersects; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + intersects = __range_intersects_intree(tree, lock); + + if (!intersects) { + range_lock_clear_reader(lock); + __range_tree_insert(tree, lock); + } + + spin_unlock_irqrestore(&tree->lock, flags); + + return !intersects; +} +EXPORT_SYMBOL_GPL(range_write_trylock); + +/** + * range_write_unlock - Unlock for writing + * @tree: interval tree + * @lock: the range lock to be unlocked + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to unlock an unacquired write lock. + */ +void range_write_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock) +{ + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + if (!__range_intersects_intree(tree, lock)) { + /* nobody to wakeup, we're done */ + spin_unlock_irqrestore(&tree->lock, flags); + return; + } + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_rwlock *blocked_lock; + blocked_lock = range_entry(node, struct range_rwlock, node); + + range_rwlock_unblock(blocked_lock, &wake_q); + } + + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_write_unlock); + +/** + * range_downgrade_write - Downgrade write range lock to read lock + * @tree: interval tree + * @lock: the range lock to be downgraded + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to downgrade an unacquired write lock. + */ +void range_downgrade_write(struct range_rwlock_tree *tree, + struct range_rwlock *lock) +{ + unsigned long flags; + struct interval_tree_node *node; + DEFINE_WAKE_Q(wake_q); + + spin_lock_irqsave(&tree->lock, flags); + + WARN_ON(range_lock_is_reader(lock)); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_rwlock *blocked_lock; + blocked_lock = range_entry(node, struct range_rwlock, node); + + /* + * Unaccount for any blocked reader lock. Wakeup if possible. 
+ */ + if (range_lock_is_reader(blocked_lock)) + range_rwlock_unblock(blocked_lock, &wake_q); + } + + range_lock_set_reader(lock); + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_downgrade_write); -- 2.12.0