Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/19/2016 02:42 PM, Waiman Long wrote:
> On 07/18/2016 07:38 PM, Tejun Heo wrote:
> > > +/*
> > > + * include/linux/dlock-list.h
> > > + *
> > > + * A distributed (per-cpu) set of lists each of which is protected by its
> > > + * own spinlock, but acts like a single consolidated list to the callers.
> > > + *
> > > + * The dlock_list_head_percpu structure contains the spinlock, the other
> > > + * dlock_list_node structures only contains a pointer to the spinlock in
> > > + * dlock_list_head_percpu.
> > > + */
> >
> > The more I think about it, the more bothered I am by the dlock_list
> > name.  For the most part, this isn't different from other percpu data
> > structures in the kernel.  Sure, it might benefit from doing Nth cpu,
> > but so are other percpu data structures and it's not just "distributed
> > lock" list either.  The list itself is percpu, not just locking.  Can
> > we please go back to percpu_list?  Christoph, what do you think?
>
> As I said before, I don't mind reverting the name back to percpu_list.
> I am just waiting for a final agreement.

I have just sent out an updated dlock-list patch that incorporates all
the feedback that I have received so far except the name change. I will
be on vacation next week. After I come back, we can continue our
discussion on whether the name should be reverted to percpu_list or not.

Cheers,
Longman
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On Wed, Jul 20, 2016 at 07:48:13PM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Waiman Long wrote:
>
> > Christoph, are you OK with Tejun's request to revert the name back to
> > percpu_list? Or do you still think the current name is better?
>
> The percpu structure contains a spinlock and may be remotely accessed? You
> are aware that other percpu variables that share the same cacheline will
> be negatively impacted by accesses from other processors?
>
> The role of percpu areas is to have memory areas where the code can
> expect that cachelines are exclusively there for that processor.
>
> How frequent are the remote accesses? If this is rare then ok.

Remote access will be the common case on traversal and removal from the
superblock inode list. Under memory reclaim, the access should at least
be from a CPU on the same node (as inode reclaim is NUMA aware).
However, any other inode eviction event (e.g. inode unlink) removing it
from the sb list will essentially be from a random CPU.

Traversals (such as from the sync code, or cache invalidations) will run
on a single CPU, so almost all accesses from them will be remote.

So it's really only a per-cpu structure for list addition.

Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/20/2016 08:48 PM, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Waiman Long wrote:
>
> > Christoph, are you OK with Tejun's request to revert the name back to
> > percpu_list? Or do you still think the current name is better?
>
> The percpu structure contains a spinlock and may be remotely accessed? You
> are aware that other percpu variables that share the same cacheline will
> be negatively impacted by accesses from other processors?

The spinlock can be remotely accessed during deletion as well as during
iteration of all the percpu lists. Iteration of all the percpu data is
not a new thing, as it is also done in percpu_counter when calling
percpu_counter_sum(). Yes, remote access can have a negative performance
impact on accesses to other percpu data that happen to reside in the
same cacheline.

> The role of percpu areas is to have memory areas where the code can
> expect that cachelines are exclusively there for that processor.
>
> How frequent are the remote accesses? If this is rare then ok.

I can't say how often that will happen for the dlock list. If the thread
that creates the inodes does not migrate to another CPU, the deletion of
the inodes should also happen on the same CPU.

One way to reduce the performance impact is to make the percpu head
structure cacheline aligned at the expense of some wasted space. I could
add an additional parameter to alloc_dlock_list_head() to force
cacheline alignment if the caller wishes to do so. What do you think
about that?

Cheers,
Longman
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On Wed, 20 Jul 2016, Waiman Long wrote:

> Christoph, are you OK with Tejun's request to revert the name back to
> percpu_list? Or do you still think the current name is better?

The percpu structure contains a spinlock and may be remotely accessed? You
are aware that other percpu variables that share the same cacheline will
be negatively impacted by accesses from other processors?

The role of percpu areas is to have memory areas where the code can
expect that cachelines are exclusively there for that processor.

How frequent are the remote accesses? If this is rare then ok.
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/19/2016 02:42 PM, Waiman Long wrote:
> On 07/18/2016 07:38 PM, Tejun Heo wrote:
> > > +/*
> > > + * include/linux/dlock-list.h
> > > + *
> > > + * A distributed (per-cpu) set of lists each of which is protected by its
> > > + * own spinlock, but acts like a single consolidated list to the callers.
> > > + *
> > > + * The dlock_list_head_percpu structure contains the spinlock, the other
> > > + * dlock_list_node structures only contains a pointer to the spinlock in
> > > + * dlock_list_head_percpu.
> > > + */
> >
> > The more I think about it, the more bothered I am by the dlock_list
> > name.  For the most part, this isn't different from other percpu data
> > structures in the kernel.  Sure, it might benefit from doing Nth cpu,
> > but so are other percpu data structures and it's not just "distributed
> > lock" list either.  The list itself is percpu, not just locking.  Can
> > we please go back to percpu_list?  Christoph, what do you think?
>
> As I said before, I don't mind reverting the name back to percpu_list.
> I am just waiting for a final agreement.

Christoph, are you OK with Tejun's request to revert the name back to
percpu_list? Or do you still think the current name is better?

I am almost done with my next version of the patch. This is the only
thing that is still outstanding.

Thanks,
Longman
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/19/2016 03:23 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 19, 2016 at 02:42:31PM -0400, Waiman Long wrote:
> > > > +int alloc_dlock_list_head(struct dlock_list_head *dlist)
> > > > +{
> > > > +	struct dlock_list_head dlist_tmp;
> > > > +	int cpu;
> > > > +
> > > > +	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
> > > > +	if (!dlist_tmp.head)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	for_each_possible_cpu(cpu) {
> > > > +		struct dlock_list_head_percpu *head;
> > > > +
> > > > +		head = per_cpu_ptr(dlist_tmp.head, cpu);
> > > > +		INIT_LIST_HEAD(&head->list);
> > > > +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> > > > +		lockdep_set_class(&head->lock, &dlock_list_key);
> > > > +	}
> > > > +
> > > > +	dlist->head = dlist_tmp.head;
> > >
> > > Just use dlist->head directly or use local __percpu head pointer?
> >
> > I just don't want to expose the structure to the world until it is
> > fully initialized. If you think I am over-cautious, I can use
> > dlist->head as suggested.
>
> I don't think it makes any actual difference.  No strong opinion either
> way.  Just use local __percpu head pointer then?

I have run sparse on dlock_list.c. There is no need to use the __percpu
tag here. The head gets assigned the result of per_cpu_ptr(), which has
no __percpu annotation. I actually got a sparse warning when I used the
__percpu tag.

Cheers,
Longman
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/19/2016 03:23 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 19, 2016 at 02:42:31PM -0400, Waiman Long wrote:
> > On 07/18/2016 07:38 PM, Tejun Heo wrote:
> > > > +struct dlock_list_node {
> > > > +	struct list_head list;
> > > > +	spinlock_t *lockptr;
> > > > +};
> > >
> > > Wouldn't it be better to point to dlock_list_percpu?
> >
> > I could. However, the only thing that matters is the spinlock that
> > protects the list entry.
>
> Yeah, we can get back to this when it's actually necessary.  It just
> looked a bit weird to me.
>
> > > > +/*
> > > > + * The dlock list iteration functions which return true if iteration has
> > > > + * to be continued.
> > > > + */
> > > > +extern bool dlock_list_next(struct dlock_list_head *dlist,
> > > > +			    struct dlock_list_iter *iter);
> > > > +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
> > > > +				 struct dlock_list_iter *iter);
> > >
> > > Why not return dlock_list_node * for the current node?  That'd be more
> > > conventional and allows dlock_list_iter to be opaque.
> >
> > Yes, I can make it return dlock_list_node *.
> >
> > However, to make dlock_list_iter opaque, I will have to dynamically
> > allocate the structure. That will add an extra memory allocation and
> > free call as well as handling the error case of running out of memory.
> > I don't think that is worth doing at this point.
>
> Sure, keep it defined in the header file.  Just don't require users to
> reach into it and add a comment saying that the struct is opaque to its
> users.

Will do so.

> > > > +int alloc_dlock_list_head(struct dlock_list_head *dlist)
> > > > +{
> > > > +	struct dlock_list_head dlist_tmp;
> > > > +	int cpu;
> > > > +
> > > > +	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
> > > > +	if (!dlist_tmp.head)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	for_each_possible_cpu(cpu) {
> > > > +		struct dlock_list_head_percpu *head;
> > > > +
> > > > +		head = per_cpu_ptr(dlist_tmp.head, cpu);
> > > > +		INIT_LIST_HEAD(&head->list);
> > > > +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> > > > +		lockdep_set_class(&head->lock, &dlock_list_key);
> > > > +	}
> > > > +
> > > > +	dlist->head = dlist_tmp.head;
> > >
> > > Just use dlist->head directly or use local __percpu head pointer?
> >
> > I just don't want to expose the structure to the world until it is
> > fully initialized. If you think I am over-cautious, I can use
> > dlist->head as suggested.
>
> I don't think it makes any actual difference.  No strong opinion either
> way.  Just use local __percpu head pointer then?
>
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(alloc_dlock_list_head);
> > >
> > > Does this actually need to be exported?  If so, it might be a better
> > > idea to start with EXPORT_SYMBOL_GPL().
> >
> > For the current use case, we probably don't need to export the symbols.
> > Other use cases may require that. I will change it to use the
> > EXPORT_SYMBOL_GPL() version instead.
>
> If it's not immediately necessary, it's best to not export at all.

I will remove the EXPORT statements.

> > > > +void dlock_list_del(struct dlock_list_node *node)
> > > > +{
> > > > +	spinlock_t *lock = READ_ONCE(node->lockptr);
> > > > +
> > > > +	if (unlikely(!lock)) {
> > > > +		WARN_ONCE(1,
> > > > +			"dlock_list_del: node 0x%lx has no associated lock\n",
> > > > +			(unsigned long)node);
> > >
> > > Maybe "if (WARN_ONCE(!lock...)"?  WARN_ONCE implies unlikely.
> >
> > OK, will do that.
> >
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	spin_lock(lock);
> > > > +	if (likely(lock == node->lockptr)) {
> > > > +		list_del_init(&node->list);
> > > > +		node->lockptr = NULL;
> > > > +	} else {
> > > > +		/*
> > > > +		 * This path should never be executed.
> > > > +		 */
> > > > +		WARN_ON_ONCE(1);
> > > > +	}
> > >
> > > This still kinda bothers me because this pretty much requires the
> > > users to have strong synchronization around the operations and makes
> > > it unusable in situations where opportunistic behaviors are
> > > acceptable. It negates the usefulness quite a bit.
> >
> > I understand your concern. I will make it retry again with the new lock.
>
> It doesn't necessarily have to retry but shouldn't break down when used
> in an opportunistic racy way - e.g. if adds and removes race, the order
> of operations isn't clearly defined, so any outcome is fine as long as
> the list maintains its integrity.

I am going to retry it if the lock changes to a different one and ignore
it if it becomes NULL.

> > > > +/**
> > > > + * dlock_list_next_safe - Removal-safe iterator of dlock list
> > > > + * @dlist: Pointer to the dlock_list_head structure
> > > > + * @iter : Pointer to the dlock list iterator structure
> > > > + * Return: true if the next entry is found, false if all the entries iterated
> > > > + *
> > > > + * The iterator has to be properly initialized before calling this function.
> > > > + * This iteration function is safe with respect to list entry removal.
> > > > + * However, it cannot correctly iterate newly added entries right after the
> > > > + * current one.
> > > > + */
> > >
> > > This still looks wrong to me.  If you want to provide the two variants
> > > of iterations, can't you just implement one next function and build
> > > the two types of iterations on top of it?

I have been thinking about making dlock_list_next_cpu() the real externa
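The retry-on-lock-change deletion Waiman describes can be sketched in userspace C. Pthread mutexes stand in for the per-list spinlocks and an atomic load for READ_ONCE(); the names are illustrative, not the kernel implementation:

```c
#include <pthread.h>
#include <stddef.h>
#include <assert.h>

struct node {
    struct node *prev, *next;
    pthread_mutex_t *lockptr;   /* lock of the list this node is on, or NULL */
};

/* Delete a node whose list (and hence lock) may change concurrently:
 * re-check lockptr after acquiring the lock and retry if the node moved
 * to a different list; give up quietly if lockptr became NULL (the node
 * was already removed by someone else). */
static void dlist_del(struct node *node)
{
    pthread_mutex_t *lock;

    while ((lock = __atomic_load_n(&node->lockptr, __ATOMIC_ACQUIRE))) {
        pthread_mutex_lock(lock);
        if (lock == node->lockptr) {
            node->prev->next = node->next;
            node->next->prev = node->prev;
            node->prev = node->next = node;  /* list_del_init() analogue */
            node->lockptr = NULL;
            pthread_mutex_unlock(lock);
            return;
        }
        /* Node migrated between the read and the lock acquisition:
         * drop the stale lock and retry with the new one. */
        pthread_mutex_unlock(lock);
    }
}
```

This matches Tejun's requirement that racing adds and removes may resolve in either order, as long as the list structure itself stays intact.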
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Hello,

On Tue, Jul 19, 2016 at 02:42:31PM -0400, Waiman Long wrote:
> On 07/18/2016 07:38 PM, Tejun Heo wrote:
> > > +struct dlock_list_node {
> > > +	struct list_head list;
> > > +	spinlock_t *lockptr;
> > > +};
> >
> > Wouldn't it be better to point to dlock_list_percpu?
>
> I could. However, the only thing that matters is the spinlock that
> protects the list entry.

Yeah, we can get back to this when it's actually necessary.  It just
looked a bit weird to me.

> > > +/*
> > > + * The dlock list iteration functions which return true if iteration has
> > > + * to be continued.
> > > + */
> > > +extern bool dlock_list_next(struct dlock_list_head *dlist,
> > > +			    struct dlock_list_iter *iter);
> > > +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
> > > +				 struct dlock_list_iter *iter);
> >
> > Why not return dlock_list_node * for the current node?  That'd be more
> > conventional and allows dlock_list_iter to be opaque.
>
> Yes, I can make it return dlock_list_node *.
>
> However, to make dlock_list_iter opaque, I will have to dynamically
> allocate the structure. That will add an extra memory allocation and
> free call as well as handling the error case of running out of memory.
> I don't think that is worth doing at this point.

Sure, keep it defined in the header file.  Just don't require users to
reach into it and add a comment saying that the struct is opaque to its
users.

> > > +int alloc_dlock_list_head(struct dlock_list_head *dlist)
> > > +{
> > > +	struct dlock_list_head dlist_tmp;
> > > +	int cpu;
> > > +
> > > +	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
> > > +	if (!dlist_tmp.head)
> > > +		return -ENOMEM;
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		struct dlock_list_head_percpu *head;
> > > +
> > > +		head = per_cpu_ptr(dlist_tmp.head, cpu);
> > > +		INIT_LIST_HEAD(&head->list);
> > > +		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
> > > +		lockdep_set_class(&head->lock, &dlock_list_key);
> > > +	}
> > > +
> > > +	dlist->head = dlist_tmp.head;
> >
> > Just use dlist->head directly or use local __percpu head pointer?
>
> I just don't want to expose the structure to the world until it is
> fully initialized. If you think I am over-cautious, I can use
> dlist->head as suggested.

I don't think it makes any actual difference.  No strong opinion either
way.  Just use local __percpu head pointer then?

> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL(alloc_dlock_list_head);
> >
> > Does this actually need to be exported?  If so, it might be a better
> > idea to start with EXPORT_SYMBOL_GPL().
>
> For the current use case, we probably don't need to export the symbols.
> Other use cases may require that. I will change it to use the
> EXPORT_SYMBOL_GPL() version instead.

If it's not immediately necessary, it's best to not export at all.

> > > +void dlock_list_del(struct dlock_list_node *node)
> > > +{
> > > +	spinlock_t *lock = READ_ONCE(node->lockptr);
> > > +
> > > +	if (unlikely(!lock)) {
> > > +		WARN_ONCE(1,
> > > +			"dlock_list_del: node 0x%lx has no associated lock\n",
> > > +			(unsigned long)node);
> >
> > Maybe "if (WARN_ONCE(!lock...)"?  WARN_ONCE implies unlikely.
>
> OK, will do that.
>
> > > +		return;
> > > +	}
> > > +
> > > +	spin_lock(lock);
> > > +	if (likely(lock == node->lockptr)) {
> > > +		list_del_init(&node->list);
> > > +		node->lockptr = NULL;
> > > +	} else {
> > > +		/*
> > > +		 * This path should never be executed.
> > > +		 */
> > > +		WARN_ON_ONCE(1);
> > > +	}
> >
> > This still kinda bothers me because this pretty much requires the
> > users to have strong synchronization around the operations and makes
> > it unusable in situations where opportunistic behaviors are
> > acceptable. It negates the usefulness quite a bit.
>
> I understand your concern. I will make it retry again with the new lock.

It doesn't necessarily have to retry but shouldn't break down when used
in an opportunistic racy way - e.g. if adds and removes race, the order
of operations isn't clearly defined, so any outcome is fine as long as
the list maintains its integrity.

> > > +/**
> > > + * dlock_list_next_safe - Removal-safe iterator of dlock list
> > > + * @dlist: Pointer to the dlock_list_head structure
> > > + * @iter : Pointer to the dlock list iterator structure
> > > + * Return: true if the next entry is found, false if all the entries
> > > + *	   iterated
> > > + *
> > > + * The iterator has to be properly initialized before calling this function.
> > > + * This iteration function is safe with respect to list entry removal.
> > > + * However, it cannot correctly iterate newly added entries right after the
> > > + * current one.
> > > + */
> >
> > This still looks wrong to me.  If you want to provide the two variants
> > of iterations, can't you just implement one next function and build
> > the two types
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/19/2016 01:00 AM, Al Viro wrote:
> On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
> > +struct dlock_list_head_percpu {
> > +	struct list_head list;
> > +	spinlock_t lock;
> > +};
> > +#define DLOCK_LIST_HEAD_PERCPU_INIT(name)	\
> > +	{					\
> > +		.list.prev = &name.list,	\
> > +		.list.next = &name.list,	\
> > +		.list.lock = __SPIN_LOCK_UNLOCKED(name),\
>
> What's .list.lock and how does that even compile?

Yes, it is a typo. This macro is not used; that is why there is no
compilation error. I will remove it from the patch.

> > +extern bool dlock_list_next(struct dlock_list_head *dlist,
> > +			    struct dlock_list_iter *iter);
>
> Ugh...  Why not dlist_for_each_entry(), seeing that all users end up with
> the same boilerplate?

Right, I could make a dlock_list_for_each_entry() that encapsulates the
boilerplate. I will work on that.

Thanks,
Longman
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On 07/18/2016 07:38 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
> > Suggested-by: Tejun Heo
>
> Not sure I should be on suggested-by given that this wasn't my idea at
> all.

I put the tag there because of your suggestion to restructure the data
structure, which did make the patch better. I can remove that tag if you
think it is not appropriate.

> > +/*
> > + * include/linux/dlock-list.h
> > + *
> > + * A distributed (per-cpu) set of lists each of which is protected by its
> > + * own spinlock, but acts like a single consolidated list to the callers.
> > + *
> > + * The dlock_list_head_percpu structure contains the spinlock, the other
> > + * dlock_list_node structures only contains a pointer to the spinlock in
> > + * dlock_list_head_percpu.
> > + */
>
> The more I think about it, the more bothered I am by the dlock_list
> name.  For the most part, this isn't different from other percpu data
> structures in the kernel.  Sure, it might benefit from doing Nth cpu,
> but so are other percpu data structures and it's not just "distributed
> lock" list either.  The list itself is percpu, not just locking.  Can
> we please go back to percpu_list?  Christoph, what do you think?

As I said before, I don't mind reverting the name back to percpu_list.
I am just waiting for a final agreement.

> > +struct dlock_list_node {
> > +	struct list_head list;
> > +	spinlock_t *lockptr;
> > +};
>
> Wouldn't it be better to point to dlock_list_percpu?

I could. However, the only thing that matters is the spinlock that
protects the list entry.

> > +#define DLOCK_LIST_HEAD_PERCPU_INIT(name)	\
> > +	{					\
> > +		.list.prev = &name.list,	\
> > +		.list.next = &name.list,	\
>
> Use LIST_HEAD_INIT()?  Also, why do we even need the initializers if
> the data structure can only be dynamically allocated.  In fact, do the
> definitions even need to be exposed in the header?

I put it there for completeness' sake. You are right. That macro isn't
currently used. I will remove it in the next iteration of the patch.

> > +		.list.lock = __SPIN_LOCK_UNLOCKED(name),\
> > +	}
> > +
> > +/*
> > + * dlock list iteration state
> > + */
> > +struct dlock_list_iter {
> > +	int			cpu;
>
> I'm not sure lining up with space here is common in kernel.

OK, I will remove the extra spaces to conform more closely to the
kernel style.

> > +	spinlock_t		*lock;
> > +	struct list_head	*head;	/* List head of current per-cpu list */
> > +	struct dlock_list_node	*curr;
> > +	struct dlock_list_node	*next;
> > +};
> > +
> > +#define DLOCK_LIST_ITER_INIT()			\
>
> Don't we usually omit () in these cases?

Good point. Will remove the unneeded ().

> > +	{					\
> > +		.cpu = -1,			\
> > +	}
> > +
> > +#define DEFINE_DLOCK_LIST_ITER(s)		\
> > +	struct dlock_list_iter s = DLOCK_LIST_ITER_INIT()
> > +
> > +static inline void init_dlock_list_iter(struct dlock_list_iter *iter)
> > +{
> > +	*iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT();
> > +}
> > +
> > +#define DLOCK_LIST_NODE_INIT(name)		\
> > +	{					\
> > +		.list.prev = &name.list,	\
> > +		.list.next = &name.list,	\
>
> LIST_HEAD_INIT()?

Will make the change.

> > +	}
> > +
> > +static inline void init_dlock_list_node(struct dlock_list_node *node)
> > +{
> > +	INIT_LIST_HEAD(&node->list);
> > +	node->lockptr = NULL;
>
> Why not use DLOCK_LIST_NODE_INIT()?

Yes, I can make the change.

> > +}
> > +
> > +/*
> > + * Check if all the dlock lists are empty
> > + *
> > + * This can be a pretty expensive function call. If this function is required
> > + * in a performance critical path, we may have to maintain a global count
> > + * of the list entries in the global dlock_list_head structure instead.
> > + */
>
> /** function comment please.

Sure.

> > +static inline bool dlock_list_empty(struct dlock_list_head *dlist)
> > +{
> > +	int cpu;
> > +
> > +	for_each_possible_cpu(cpu)
> > +		if (!list_empty(&per_cpu_ptr(dlist->head, cpu)->list))
> > +			return false;
> > +	return true;
> > +}
> > +
> > +/*
> > + * Allocation and freeing of dlock list
> > + */
> > +extern int alloc_dlock_list_head(struct dlock_list_head *dlist);
>
> ditto with alignment

Will remove the extra space.

> > +extern void free_dlock_list_head(struct dlock_list_head *dlist);
> > +
> > +/*
> > + * The dlock list iteration functions which return true if iteration has
> > + * to be continued.
> > + */
> > +extern bool dlock_list_next(struct dlock_list_head *dlist,
> > +			    struct dlock_list_iter *iter);
> > +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
> > +				 struct dlock_list_iter *iter);
>
> Why not return dlo
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
> +struct dlock_list_head_percpu {
> +	struct list_head list;
> +	spinlock_t lock;
> +};
> +#define DLOCK_LIST_HEAD_PERCPU_INIT(name)	\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\
> +		.list.lock = __SPIN_LOCK_UNLOCKED(name),\

What's .list.lock and how does that even compile?

> +extern bool dlock_list_next(struct dlock_list_head *dlist,
> +			    struct dlock_list_iter *iter);

Ugh...  Why not dlist_for_each_entry(), seeing that all users end up with
the same boilerplate?
Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Hello, Waiman.

On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
> Suggested-by: Tejun Heo

Not sure I should be on suggested-by given that this wasn't my idea at
all.

> +/*
> + * include/linux/dlock-list.h
> + *
> + * A distributed (per-cpu) set of lists each of which is protected by its
> + * own spinlock, but acts like a single consolidated list to the callers.
> + *
> + * The dlock_list_head_percpu structure contains the spinlock, the other
> + * dlock_list_node structures only contains a pointer to the spinlock in
> + * dlock_list_head_percpu.
> + */

The more I think about it, the more bothered I am by the dlock_list
name.  For the most part, this isn't different from other percpu data
structures in the kernel.  Sure, it might benefit from doing Nth cpu,
but so are other percpu data structures and it's not just "distributed
lock" list either.  The list itself is percpu, not just locking.  Can
we please go back to percpu_list?  Christoph, what do you think?

> +struct dlock_list_node {
> +	struct list_head list;
> +	spinlock_t *lockptr;
> +};

Wouldn't it be better to point to dlock_list_percpu?

> +#define DLOCK_LIST_HEAD_PERCPU_INIT(name)	\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\

Use LIST_HEAD_INIT()?  Also, why do we even need the initializers if
the data structure can only be dynamically allocated.  In fact, do the
definitions even need to be exposed in the header?

> +		.list.lock = __SPIN_LOCK_UNLOCKED(name),\
> +	}
> +
> +/*
> + * dlock list iteration state
> + */
> +struct dlock_list_iter {
> +	int			cpu;

I'm not sure lining up with space here is common in kernel.

> +	spinlock_t		*lock;
> +	struct list_head	*head;	/* List head of current per-cpu list */
> +	struct dlock_list_node	*curr;
> +	struct dlock_list_node	*next;
> +};
> +
> +#define DLOCK_LIST_ITER_INIT()			\

Don't we usually omit () in these cases?

> +	{					\
> +		.cpu = -1,			\
> +	}
> +
> +#define DEFINE_DLOCK_LIST_ITER(s)		\
> +	struct dlock_list_iter s = DLOCK_LIST_ITER_INIT()
> +
> +static inline void init_dlock_list_iter(struct dlock_list_iter *iter)
> +{
> +	*iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT();
> +}
> +
> +#define DLOCK_LIST_NODE_INIT(name)		\
> +	{					\
> +		.list.prev = &name.list,	\
> +		.list.next = &name.list,	\

LIST_HEAD_INIT()?

> +	}
> +
> +static inline void init_dlock_list_node(struct dlock_list_node *node)
> +{
> +	INIT_LIST_HEAD(&node->list);
> +	node->lockptr = NULL;

Why not use DLOCK_LIST_NODE_INIT()?

> +}
> +
> +/*
> + * Check if all the dlock lists are empty
> + *
> + * This can be a pretty expensive function call. If this function is required
> + * in a performance critical path, we may have to maintain a global count
> + * of the list entries in the global dlock_list_head structure instead.
> + */

/** function comment please.

> +static inline bool dlock_list_empty(struct dlock_list_head *dlist)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu)
> +		if (!list_empty(&per_cpu_ptr(dlist->head, cpu)->list))
> +			return false;
> +	return true;
> +}
> +
> +/*
> + * Allocation and freeing of dlock list
> + */
> +extern int alloc_dlock_list_head(struct dlock_list_head *dlist);

ditto with alignment

> +extern void free_dlock_list_head(struct dlock_list_head *dlist);
> +
> +/*
> + * The dlock list iteration functions which return true if iteration has
> + * to be continued.
> + */
> +extern bool dlock_list_next(struct dlock_list_head *dlist,
> +			    struct dlock_list_iter *iter);
> +extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
> +				 struct dlock_list_iter *iter);

Why not return dlock_list_node * for the current node?  That'd be more
conventional and allows dlock_list_iter to be opaque.

> diff --git a/lib/dlock-list.c b/lib/dlock-list.c
> new file mode 100644
> index 000..af4a9f3
> --- /dev/null
> +++ b/lib/dlock-list.c
...

> +int alloc_dlock_list_head(struct dlock_list_head *dlist)
> +{
> +	struct dlock_list_head dlist_tmp;
> +	int cpu;
> +
> +	dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
> +	if (!dlist_tmp.head)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct dlock_list_head_percpu *head;
> +
> +		head = per_cpu_ptr(dlist_tmp.head, cpu);
> +		INIT_
[PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists
Linked lists are used everywhere in the Linux kernel. However, if many
threads are trying to add or delete entries in the same linked list, it
can create a performance bottleneck.

This patch introduces a new list API that provides a set of distributed
lists (one per CPU), each of which is protected by its own spinlock. To
the callers, however, the set of lists acts like a single consolidated
list. This allows list entry insertion and deletion operations to happen
in parallel instead of being serialized with a global list and lock.

List entry insertion is strictly per-cpu. List deletion, however, can
happen on a cpu other than the one that did the insertion. So we still
need a lock to protect the list. Because of that, there may still be a
small amount of contention when deletion is being done.

A new header file include/linux/dlock-list.h will be added with the
associated dlock_list_head and dlock_list_node structures. The following
functions are provided to manage the per-cpu list:

1. int init_dlock_list_head(struct dlock_list_head *dlist)
2. void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *dlist)
3. void dlock_list_del(struct dlock_list_node *node)

Iteration of all the list entries within a group of per-cpu lists is
done by calling either the dlock_list_next() or dlock_list_next_safe()
functions in a while loop. They correspond to the list_for_each_entry()
and list_for_each_entry_safe() macros respectively. The iteration state
is kept in a dlock_list_iter structure that is passed to the iteration
functions.
Suggested-by: Tejun Heo
Signed-off-by: Waiman Long
Reviewed-by: Jan Kara
---
 include/linux/dlock-list.h |  135 +++
 lib/Makefile               |    2 +-
 lib/dlock-list.c           |  254 
 3 files changed, 390 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
new file mode 100644
index 000..2647b7d
--- /dev/null
+++ b/include/linux/dlock-list.h
@@ -0,0 +1,135 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long
+ */
+#ifndef __LINUX_DLOCK_LIST_H
+#define __LINUX_DLOCK_LIST_H
+
+#include
+#include
+#include
+
+/*
+ * include/linux/dlock-list.h
+ *
+ * A distributed (per-cpu) set of lists each of which is protected by its
+ * own spinlock, but acts like a single consolidated list to the callers.
+ *
+ * The dlock_list_head_percpu structure contains the spinlock, the other
+ * dlock_list_node structures only contains a pointer to the spinlock in
+ * dlock_list_head_percpu.
+ */
+struct dlock_list_head_percpu {
+	struct list_head list;
+	spinlock_t lock;
+};
+
+struct dlock_list_head {
+	struct dlock_list_head_percpu __percpu *head;
+};
+
+/*
+ * dlock list node data structure
+ */
+struct dlock_list_node {
+	struct list_head list;
+	spinlock_t *lockptr;
+};
+
+#define DLOCK_LIST_HEAD_PERCPU_INIT(name)	\
+	{					\
+		.list.prev = &name.list,	\
+		.list.next = &name.list,	\
+		.list.lock = __SPIN_LOCK_UNLOCKED(name),\
+	}
+
+/*
+ * dlock list iteration state
+ */
+struct dlock_list_iter {
+	int cpu;
+	spinlock_t *lock;
+	struct list_head *head;	/* List head of current per-cpu list */
+	struct dlock_list_node *curr;
+	struct dlock_list_node *next;
+};
+
+#define DLOCK_LIST_ITER_INIT()			\
+	{					\
+		.cpu = -1,			\
+	}
+
+#define DEFINE_DLOCK_LIST_ITER(s)		\
+	struct dlock_list_iter s = DLOCK_LIST_ITER_INIT()
+
+static inline void init_dlock_list_iter(struct dlock_list_iter *iter)
+{
+	*iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT();
+}
+
+#define DLOCK_LIST_NODE_INIT(name)		\
+	{					\
+		.list.prev = &name.list,	\
+		.list.next = &name.list,	\
+	}
+
+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+	INIT_LIST_HEAD(&node->list);
+	node->lockptr = NULL;
+