[PATCH v2 1/1] af_unix: Allow Unix sockets to raise SIGURG
From: Rao Shoaib

TCP sockets allow SIGURG to be sent to the process holding the other
end of the socket. Extend Unix sockets to have the same ability, but
only if the data length is zero. The API is the same in that the sender
uses sendmsg() with MSG_OOB to raise SIGURG. Unix sockets behave in the
same way as TCP sockets with SO_OOBINLINE set.

SIGURG is ignored by default, so applications which do not know about
this feature will be unaffected. In addition to installing a SIGURG
handler, the receiving application must call F_SETOWN or F_SETOWN_EX to
indicate which process or thread should receive the signal.

Signed-off-by: Rao Shoaib
Signed-off-by: Matthew Wilcox (Oracle)
---
 net/unix/af_unix.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 92784e5..65f6179 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1840,7 +1840,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		return err;
 
 	err = -EOPNOTSUPP;
-	if (msg->msg_flags & MSG_OOB)
+
+	if (len && (msg->msg_flags & MSG_OOB))
 		goto out_err;
 
 	if (msg->msg_namelen) {
@@ -1856,6 +1857,9 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	if (sk->sk_shutdown & SEND_SHUTDOWN)
 		goto pipe_err;
 
+	if (msg->msg_flags & MSG_OOB)
+		sk_send_sigurg(other);
+
 	while (sent < len) {
 		size = len - sent;
-- 
1.8.3.1
[PATCH v1 0/1] rxe driver should dynamically calculate inline data size
From: Rao Shoaib

Resending because of a typo in the email addresses.

Currently the rxe driver has a hard-coded value for the inline data
size, whereas the mlx5 driver calculates the inline data size and the
number of SGEs to use based on the values in the QP request. Some
applications depend on this behavior. This patch changes rxe to
calculate the values dynamically.

Rao Shoaib (1):
  rxe: calculate inline data size based on requested values

 drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
1.8.3.1
[PATCH v1 1/1] rxe: calculate inline data size based on requested values
From: Rao Shoaib

The rxe driver has a hard-coded value for the size of inline data,
whereas the mlx5 driver calculates the number of SGEs and the inline
data size based on the values in the QP request. This patch modifies
the rxe driver to do the same, so that applications can work seamlessly
across drivers.

Signed-off-by: Rao Shoaib
---
 drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 1b596fb..657f9303 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -68,7 +68,6 @@ enum rxe_device_param {
 	RXE_HW_VER			= 0,
 	RXE_MAX_QP			= 0x10000,
 	RXE_MAX_QP_WR			= 0x4000,
-	RXE_MAX_INLINE_DATA		= 400,
 	RXE_DEVICE_CAP_FLAGS		= IB_DEVICE_BAD_PKEY_CNTR
					| IB_DEVICE_BAD_QKEY_CNTR
					| IB_DEVICE_AUTO_PATH_MIG
@@ -81,6 +80,7 @@ enum rxe_device_param {
					| IB_DEVICE_MEM_MGT_EXTENSIONS,
 	RXE_MAX_SGE			= 32,
 	RXE_MAX_SGE_RD			= 32,
+	RXE_MAX_INLINE_DATA		= RXE_MAX_SGE * sizeof(struct ib_sge),
 	RXE_MAX_CQ			= 16384,
 	RXE_MAX_LOG_CQE			= 15,
 	RXE_MAX_MR			= 2 * 1024,
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index aeea994..45b5da5 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -229,6 +229,7 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 {
 	int err;
 	int wqe_size;
+	unsigned int inline_size;
 
 	err = sock_create_kern(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
 	if (err < 0)
@@ -244,6 +245,9 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 			 sizeof(struct rxe_send_wqe) +
 			 qp->sq.max_inline);
 
+	inline_size = wqe_size - sizeof(struct rxe_send_wqe);
+	qp->sq.max_inline = inline_size;
+	init->cap.max_inline_data = inline_size;
 
 	qp->sq.queue = rxe_queue_init(rxe, &qp->sq.max_wr, wqe_size);
-- 
1.8.3.1
Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
On 04/04/2018 12:16 AM, Rao Shoaib wrote:
> On 04/03/2018 07:23 PM, Matthew Wilcox wrote:
>> On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:
>>> On 04/03/2018 01:58 PM, Matthew Wilcox wrote:
>>>> I think you might be better off with an IDR.  The IDR can always
>>>> contain one entry, so there's no need for this 'rbf_list_head' or
>>>> __rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
>>>> an array (if that array can be allocated), so it's compatible with
>>>> the kfree_bulk() interface.
>>> I have just familiarized myself with what IDR is by reading your
>>> article. If I am incorrect please correct me. The list and head you
>>> have pointed to are only used if the container cannot be allocated.
>>> That could happen with IDR as well. Note that the containers are
>>> allocated at boot time and are re-used.
>> No, it can't happen with the IDR.  The IDR can always contain one
>> entry without allocating anything.  If you fail to allocate the
>> second entry, just free the first entry.
>>> IDR seems to have some overhead, such as I have to specifically add
>>> the pointer and free the ID, plus radix tree maintenance.
>> ... what?  Adding a pointer is simply idr_alloc(), and you get back
>> an integer telling you which index it has.  Your data structure has
>> its own set of overhead.
> The only overhead is a pointer that points to the head and an int to
> keep count. If I use idr, I would have to allocate a struct idr, which
> is much larger. idr_alloc()/idr_destroy() operations are much more
> costly than updating two pointers. As the pointers are stored in
> slots/nodes corresponding to the id, I would have to retrieve the
> pointers by calling idr_remove() to pass them to be freed; the
> slots/nodes would constantly be allocated and freed. IDR is a very
> useful interface for allocating/managing IDs, but I really do not see
> the justification for using it here. Perhaps you can elaborate more on
> the benefits and also on how I can just pass the array to be freed.
>
> Shoaib

I may have misunderstood your comment. You are probably suggesting that
I use the IDR instead of allocating the following containers:

+	struct rcu_bulk_free_container *rbf_container;
+	struct rcu_bulk_free_container *rbf_cached_container;

The IDR uses radix_tree_node, which allocates the following two arrays.
Since I do not need any IDs, why not just use radix_tree_node directly?
But I do not need a radix tree either, so why not just use an array?
That is what I am doing.

	void __rcu *slots[RADIX_TREE_MAP_SIZE];
	unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS]; ==> Not needed

As far as allocation failure is concerned, the allocation has to be done
at run time. If the allocation of a container can fail, so can the
allocation of a radix_tree_node, as it also requires memory. I really do
not see any advantages of using the IDR. The structure I have is much
simpler and does exactly what I need.

Shoaib
Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
On 04/02/2018 10:20 AM, Christopher Lameter wrote:
> On Sun, 1 Apr 2018, rao.sho...@oracle.com wrote:
>> kfree_rcu() should use the new kfree_bulk() interface for freeing
>> rcu structures as it is more efficient.
> It would be even better if this approach could also use
> kmem_cache_free_bulk() or kfree_bulk()

Sorry, I do not understand your comment. The patch is using
kfree_bulk(), which is an inline function.

Shoaib
Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
On 04/03/2018 07:23 PM, Matthew Wilcox wrote:
> On Tue, Apr 03, 2018 at 05:55:55PM -0700, Rao Shoaib wrote:
>> On 04/03/2018 01:58 PM, Matthew Wilcox wrote:
>>> I think you might be better off with an IDR.  The IDR can always
>>> contain one entry, so there's no need for this 'rbf_list_head' or
>>> __rcu_bulk_schedule_list.  The IDR contains its first 64 entries in
>>> an array (if that array can be allocated), so it's compatible with
>>> the kfree_bulk() interface.
>> I have just familiarized myself with what IDR is by reading your
>> article. If I am incorrect please correct me. The list and head you
>> have pointed to are only used if the container cannot be allocated.
>> That could happen with IDR as well. Note that the containers are
>> allocated at boot time and are re-used.
> No, it can't happen with the IDR.  The IDR can always contain one
> entry without allocating anything.  If you fail to allocate the second
> entry, just free the first entry.
>> IDR seems to have some overhead, such as I have to specifically add
>> the pointer and free the ID, plus radix tree maintenance.
> ... what?  Adding a pointer is simply idr_alloc(), and you get back an
> integer telling you which index it has.  Your data structure has its
> own set of overhead.

The only overhead is a pointer that points to the head and an int to
keep count. If I use idr, I would have to allocate a struct idr, which
is much larger. idr_alloc()/idr_destroy() operations are much more
costly than updating two pointers. As the pointers are stored in
slots/nodes corresponding to the id, I would have to retrieve the
pointers by calling idr_remove() to pass them to be freed; the
slots/nodes would constantly be allocated and freed. IDR is a very
useful interface for allocating/managing IDs, but I really do not see
the justification for using it here. Perhaps you can elaborate more on
the benefits and also on how I can just pass the array to be freed.

Shoaib
Re: [PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
On 04/03/2018 01:58 PM, Matthew Wilcox wrote:
> On Tue, Apr 03, 2018 at 10:22:53AM -0700, rao.sho...@oracle.com wrote:
>> +++ b/mm/slab.h
>> @@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
>>  	unsigned long size;
>>  } kmalloc_info[];
>>  
>> +#define RCU_MAX_ACCUMULATE_SIZE	25
>> +
>> +struct rcu_bulk_free_container {
>> +	struct rcu_head rbfc_rcu;
>> +	int rbfc_entries;
>> +	void *rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
>> +	struct rcu_bulk_free *rbfc_rbf;
>> +};
>> +
>> +struct rcu_bulk_free {
>> +	struct rcu_head rbf_rcu; /* used to schedule monitor process */
>> +	spinlock_t rbf_lock;
>> +	struct rcu_bulk_free_container *rbf_container;
>> +	struct rcu_bulk_free_container *rbf_cached_container;
>> +	struct rcu_head *rbf_list_head;
>> +	int rbf_list_size;
>> +	int rbf_cpu;
>> +	int rbf_empty;
>> +	int rbf_polled;
>> +	bool rbf_init;
>> +	bool rbf_monitor;
>> +};
> I think you might be better off with an IDR.  The IDR can always
> contain one entry, so there's no need for this 'rbf_list_head' or
> __rcu_bulk_schedule_list.  The IDR contains its first 64 entries in an
> array (if that array can be allocated), so it's compatible with the
> kfree_bulk() interface.

I have just familiarized myself with what IDR is by reading your
article. If I am incorrect please correct me. The list and head you have
pointed to are only used if the container cannot be allocated. That
could happen with IDR as well. Note that the containers are allocated at
boot time and are re-used.

IDR seems to have some overhead, such as I have to specifically add the
pointer and free the ID, plus radix tree maintenance. The change would
also require retesting. So I would like to keep the current design.

Regards,
Shoaib
[PATCH 0/2] Move kfree_rcu out of rcu code and use kfree_bulk
From: Rao Shoaib <rao.sho...@oracle.com>

This patch moves kfree_call_rcu() out of rcu related code to
mm/slab_common.c and updates kfree_rcu() to use the new bulk memory
free functions as they are more efficient.

This is a resubmission of the previous patch.

Changes since last submission:
  Surrounded code with 'CONFIG_TREE_RCU || CONFIG_PREEMPT_RCU' to
  separate it from the Tiny RCU definitions.

Diff of the changes:

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6338fb6..102a93f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,8 +55,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/* only for use by kfree_call_rcu() */
-void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
@@ -210,6 +208,8 @@ do { \
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 #include <linux/rcutree.h>
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 #elif defined(CONFIG_TINY_RCU)
 #include <linux/rcutiny.h>
 #else
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6e8afff..f126d08 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1526,6 +1526,7 @@ void kzfree(const void *p)
 }
 EXPORT_SYMBOL(kzfree);
 
+#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 static DEFINE_PER_CPU(struct rcu_bulk_free, cpu_rbf);
 
 /* drain if atleast these many objects */
@@ -1696,6 +1697,7 @@ void kfree_call_rcu(struct rcu_head *head,
 	__rcu_bulk_free(head, func);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
+#endif

Previous changes:

1) checkpatch.pl has been fixed, so the kfree_rcu macro is much simpler
2) To handle preemption, preempt_enable()/preempt_disable() statements
   have been added to __rcu_bulk_free().

Rao Shoaib (2):
  Move kfree_call_rcu() to slab_common.c
  kfree_rcu() should use kfree_bulk() interface

 include/linux/mm.h       |   5 +
 include/linux/rcupdate.h |  43 ++-----
 include/linux/rcutiny.h  |   8 +-
 include/linux/rcutree.h  |   2 -
 include/linux/slab.h     |  42 ++++++
 kernel/rcu/tree.c        |  24 +++-
 kernel/sysctl.c          |  40 ++++++
 mm/slab.h                |  23 ++++
 mm/slab_common.c         | 174 ++++++++++++++++++++-
 9 files changed, 304 insertions(+), 57 deletions(-)

-- 
2.7.4
[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
From: Rao Shoaib <rao.sho...@oracle.com>

kfree_rcu() should use the new kfree_bulk() interface for freeing rcu
structures as it is more efficient.

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/mm.h       |   5 +
 include/linux/rcupdate.h |   4 +-
 include/linux/rcutiny.h  |   8 +-
 kernel/sysctl.c          |  40 ++++++
 mm/slab.h                |  23 ++++
 mm/slab_common.c         | 166 ++++++++++++++++++++-
 6 files changed, 242 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6338fb6..102a93f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,8 +55,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/* only for use by kfree_call_rcu() */
-void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
@@ -210,6 +208,8 @@ do { \
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 #include <linux/rcutree.h>
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 #elif defined(CONFIG_TINY_RCU)
 #include <linux/rcutiny.h>
 #else
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
 	synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+				 rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head,
 				  rcu_callback_t func)
 {
-	call_rcu(head, func);
+	call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+	{
+		.procname	= "kfree_rcu_drain_limit",
+		.data		= &sysctl_kfree_rcu_drain_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_drain_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= ,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_poll_limit",
+		.data		= &sysctl_kfree_rcu_poll_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_poll_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= ,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_empty_limit",
+		.data		= &sysctl_kfree_rcu_empty_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_empty_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= ,
+		.extra2		= ,
+	},
+
+	{
+		.procname	= "kfree_rcu_caching_allowed",
+		.data		= &sysctl_kfree_rcu_caching_allowed,
+		.maxlen		= sizeof(sysctl_kfree_rcu_caching_allowed),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= ,
+		.extra2		= ,
+	},
+	{ }
 };
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
 	unsigned long size;
 } kmalloc_info[];
 
+#define RCU_MAX_ACCUMULATE_SIZE	25
+
+struct rcu_bulk_free_container {
+	struct rcu_head rbfc_rcu;
+	int rbfc_entries;
+	void *rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+	struct rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+	struct rcu_head rbf_rcu; /* used to schedule monitor process */
+	spinlock_t rbf_lock;
+	struct rcu_bulk_free_container *rbf_container;
+	struct rcu_bulk_free_container *rbf_cached_container;
+	struct rcu_head *rbf_list_head;
+	int rbf_list_size;
+	int rbf_cpu;
+	int rbf_empty;
+	int rbf_polled;
+	bool rbf_init;
+	bool rbf_monitor;
+};
[PATCH 1/2] Move kfree_call_rcu() to slab_common.c
From: Rao Shoaib <rao.sho...@oracle.com>

kfree_call_rcu does not belong in linux/rcupdate.h and should be moved
to slab_common.c

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/rcupdate.h | 43 ++++-------------
 include/linux/rcutree.h  |  2 --
 include/linux/slab.h     | 42 ++++++++++++++
 kernel/rcu/tree.c        | 24 ++++++--
 mm/slab_common.c         | 10 ++++
 5 files changed, 65 insertions(+), 56 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 043d047..6338fb6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,6 +55,9 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
@@ -837,45 +840,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define __is_kfree_rcu_offset(offset) ((offset) < 4096)
 
 /*
- * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
- */
-#define __kfree_rcu(head, offset) \
-	do { \
-		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
-	} while (0)
-
-/**
- * kfree_rcu() - kfree an object after a grace period.
- * @ptr: pointer to kfree
- * @rcu_head: the name of the struct rcu_head within the type of @ptr.
- *
- * Many rcu callbacks functions just call kfree() on the base structure.
- * These functions are trivial, but their size adds up, and furthermore
- * when they are used in a kernel module, that module must invoke the
- * high-latency rcu_barrier() function at module-unload time.
- *
- * The kfree_rcu() function handles this issue.  Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
- * If the offset is larger than 4095 bytes, a compile-time error will
- * be generated in __kfree_rcu().  If this error is triggered, you can
- * either fall back to use of call_rcu() or rearrange the structure to
- * position the rcu_head structure into the first 4096 bytes.
- *
- * Note that the allowable offset might decrease in the future, for example,
- * to allow something like kmem_cache_free_rcu().
- *
- * The BUILD_BUG_ON check must not involve any function calls, hence the
- * checks are done in macros here.
- */
-#define kfree_rcu(ptr, rcu_head) \
-	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
-
-
 /*
  * Place this after a lock-acquisition primitive to guarantee that
  * an UNLOCK+LOCK pair acts as a full barrier.  This guarantee applies
  * if the UNLOCK and LOCK are executed by the same CPU or if the
@@ -887,5 +851,4 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define smp_mb__after_unlock_lock()	do { } while (0)
 #endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
 
-
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fd996cd..567ef58 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -48,8 +48,6 @@ void synchronize_rcu_bh(void);
 void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
-
 /**
  * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
  *
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 231abc8..116e870 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -355,6 +355,48 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc;
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment __malloc;
 void kmem_cache_free(struct kmem_cache *, void *);
 
+void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+
+/* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. */
+#define __kfree_rcu(head, offset) \
+	do { \
+		unsigned long __of = (unsigned long)offset; \
+		BUILD_BUG_ON(!__is_kfree_rcu_offset(__of)); \
+		kfree_call_rcu(head, (rcu_callback_t)(__of)); \
+	} while (0)
+
+/**
+ * kfree_rcu() - kfree an object after a grace period.
+ * @ptr: pointer to kfree
+ * @rcu_name: the name of the struct rcu_head within the type of @ptr.
+ *
+ * Many rcu callbacks functions just call kfree()
[PATCH 2/2] kfree_rcu() should use kfree_bulk() interface
From: Rao Shoaib <rao.sho...@oracle.com>

kfree_rcu() should use the new kfree_bulk() interface for freeing
rcu structures, as it is more efficient.

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/mm.h       |   5 +
 include/linux/rcupdate.h |   4 +-
 include/linux/rcutiny.h  |   8 +-
 kernel/sysctl.c          |  40 ++++
 mm/slab.h                |  23 ++
 mm/slab_common.c         | 166 ++++++++-
 6 files changed, 242 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..fb1e54c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2673,5 +2673,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 6338fb6..102a93f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,8 +55,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define	call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/* only for use by kfree_call_rcu() */
-void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
@@ -210,6 +208,8 @@ do { \
 
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
 #include <linux/rcutree.h>
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
 #elif defined(CONFIG_TINY_RCU)
 #include <linux/rcutiny.h>
 #else
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec..b9e9025 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void)
 	synchronize_sched();
 }
 
+static inline void call_rcu_lazy(struct rcu_head *head,
+				 rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+
 static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	call_rcu(head, func);
+	call_rcu_lazy(head, func);
 }
 
 #define rcu_note_context_switch(preempt) \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..ab70c99 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1650,6 +1650,46 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+	{
+		.procname	= "kfree_rcu_drain_limit",
+		.data		= &sysctl_kfree_rcu_drain_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_drain_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_poll_limit",
+		.data		= &sysctl_kfree_rcu_poll_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_poll_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_empty_limit",
+		.data		= &sysctl_kfree_rcu_empty_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_empty_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= ,
+	},
+
+	{
+		.procname	= "kfree_rcu_caching_allowed",
+		.data		= &sysctl_kfree_rcu_caching_allowed,
+		.maxlen		= sizeof(sysctl_kfree_rcu_caching_allowed),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= ,
+	},
+
 	{ }
 };
diff --git a/mm/slab.h b/mm/slab.h
index 5181323..a332ea6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -80,6 +80,29 @@ extern const struct kmalloc_info_struct {
 	unsigned long size;
 } kmalloc_info[];
 
+#define	RCU_MAX_ACCUMULATE_SIZE	25
+
+struct rcu_bulk_free_container {
+	struct rcu_head	rbfc_rcu;
+	int		rbfc_entries;
+	void		*rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+	struct rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+	struct rcu_head	rbf_rcu; /* used to schedule monitor process */
+	spinlock_t	rbf_lock;
+	struct rcu_bulk_free_container *rbf_container;
+	struct rcu_bulk_free_container *rbf_cached_container;
+	struct rcu_head *rbf_list_head;
+	int		rbf_list_size;
+	int		rbf_cpu;
+	int		rbf_empty;
+	int		rbf_polled;
+	bool		rbf_init;
+	bool		rbf_monitor;
+};
+
 #ifndef CONFIG_SLOB
 /* Kmalloc array related functions */
 void setup_kmalloc_cache_index_table(void);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2ea9866..6e8afff 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -1525,13 +1526,174 @@ void kzfree(const void *p)
 }
 EXPORT_SYMBOL(kzfree);
 
+static DEFINE_PER_CPU(struct rcu_bulk_free, cpu_rbf);
+
+/* drain if atleast these many objects */
+int sysctl_kfree_rcu_drain_limit __read_mostly = 10;
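The patch above accumulates objects into a per-CPU container and releases them in one pass via kfree_bulk(). A rough userspace sketch of that accumulate-then-drain idea, with plain malloc()/free() standing in for the kernel allocator and all names (bulk_free_container, bulk_add, bulk_release) made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors RCU_MAX_ACCUMULATE_SIZE from the patch. */
#define MAX_ACCUMULATE 25

struct bulk_free_container {
	int   entries;
	void *data[MAX_ACCUMULATE];
};

/* Stand-in for kfree_bulk(): release every accumulated pointer at once. */
static void bulk_release(struct bulk_free_container *c)
{
	for (int i = 0; i < c->entries; i++)
		free(c->data[i]);
	c->entries = 0;
}

/* Queue one object; drain automatically when the container fills up. */
static void bulk_add(struct bulk_free_container *c, void *p)
{
	if (c->entries == MAX_ACCUMULATE)
		bulk_release(c);
	c->data[c->entries++] = p;
}

int bulk_demo(void)
{
	struct bulk_free_container c = { 0 };

	for (int i = 0; i < 100; i++)
		bulk_add(&c, malloc(16));
	bulk_release(&c);
	return c.entries;	/* 0 once fully drained */
}
```

The kernel version additionally defers each drain until a grace period has elapsed; this sketch shows only the batching, which is where the kfree_bulk() efficiency win comes from.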
[PATCH 1/2] Move kfree_call_rcu() to slab_common.c
From: Rao Shoaib <rao.sho...@oracle.com>

kfree_call_rcu does not belong in linux/rcupdate.h and should be
moved to slab_common.c

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/rcupdate.h | 43 +++-----
 include/linux/rcutree.h  |  2 --
 include/linux/slab.h     | 42 ++++++++
 kernel/rcu/tree.c        | 24 ++--
 mm/slab_common.c         | 10 ++
 5 files changed, 65 insertions(+), 56 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 043d047..6338fb6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,6 +55,9 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define	call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+/* only for use by kfree_call_rcu() */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
@@ -837,45 +840,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define __is_kfree_rcu_offset(offset) ((offset) < 4096)
 
 /*
- * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
- */
-#define __kfree_rcu(head, offset) \
-	do { \
-		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
-	} while (0)
-
-/**
- * kfree_rcu() - kfree an object after a grace period.
- * @ptr: pointer to kfree
- * @rcu_head: the name of the struct rcu_head within the type of @ptr.
- *
- * Many rcu callbacks functions just call kfree() on the base structure.
- * These functions are trivial, but their size adds up, and furthermore
- * when they are used in a kernel module, that module must invoke the
- * high-latency rcu_barrier() function at module-unload time.
- *
- * The kfree_rcu() function handles this issue.  Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
- * If the offset is larger than 4095 bytes, a compile-time error will
- * be generated in __kfree_rcu().  If this error is triggered, you can
- * either fall back to use of call_rcu() or rearrange the structure to
- * position the rcu_head structure into the first 4096 bytes.
- *
- * Note that the allowable offset might decrease in the future, for example,
- * to allow something like kmem_cache_free_rcu().
- *
- * The BUILD_BUG_ON check must not involve any function calls, hence the
- * checks are done in macros here.
- */
-#define kfree_rcu(ptr, rcu_head) \
-	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
-
-
-/*
  * Place this after a lock-acquisition primitive to guarantee that
  * an UNLOCK+LOCK pair acts as a full barrier.  This guarantee applies
  * if the UNLOCK and LOCK are executed by the same CPU or if the
@@ -887,5 +851,4 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define smp_mb__after_unlock_lock()	do { } while (0)
 #endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
 
-
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fd996cd..567ef58 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -48,8 +48,6 @@ void synchronize_rcu_bh(void);
 void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
-
 /**
  * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
  *
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 231abc8..116e870 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -355,6 +355,48 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc;
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment __malloc;
 void kmem_cache_free(struct kmem_cache *, void *);
+void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+
+/* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. */
+#define __kfree_rcu(head, offset) \
+	do { \
+		unsigned long __of = (unsigned long)offset; \
+		BUILD_BUG_ON(!__is_kfree_rcu_offset(__of)); \
+		kfree_call_rcu(head, (rcu_callback_t)(__of)); \
+	} while (0)
+
+/**
+ * kfree_rcu() - kfree an object after a grace period.
+ * @ptr: pointer to kfree
+ * @rcu_name: the name of the struct rcu_head within the type of @ptr.
+ *
+ * Many rcu callbacks functions just call kfree() on the base structure.
+ * These functions are trivial, but
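The kernel-doc quoted above explains the trick kfree_rcu() relies on: instead of a callback function address, the offset of the embedded rcu_head is stored, and any "pointer" below 4096 is known to be an offset because no code lives in the low-order 4096 bytes of kernel memory. A small userspace illustration of that encoding (fake_rcu_head, widget, and is_offset are made-up names; the real check is __is_kfree_rcu_offset in the kernel):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rcu_head embedded in a containing object. */
struct fake_rcu_head { void *next; };

struct widget {
	long payload[8];
	struct fake_rcu_head rh;	/* embedded, like struct rcu_head */
};

/* Mirrors __is_kfree_rcu_offset(): small values cannot be code addresses. */
#define is_offset(p) ((unsigned long)(p) < 4096)

int offset_demo(void)
{
	struct widget w;
	unsigned long off = offsetof(struct widget, rh);
	struct fake_rcu_head *head = &w.rh;

	/* The offset fits well below 4096, so it is distinguishable
	 * from a real callback function pointer. */
	assert(is_offset((void *)off));

	/* Recover the base object from head minus the encoded offset,
	 * as the RCU callback path does before calling kfree(). */
	struct widget *recovered = (struct widget *)((char *)head - off);
	return recovered == &w ? 0 : 1;
}
```

This is why BUILD_BUG_ON in __kfree_rcu() rejects any rcu_head placed beyond the first 4095 bytes of its containing structure.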
[PATCH 0/2] Move kfree_rcu out of rcu code and use kfree_bulk
From: Rao Shoaib <rao.sho...@oracle.com>

This patch moves kfree_call_rcu() out of rcu related code to
mm/slab_common.c and updates kfree_rcu() to use the new bulk memory
free functions, as they are more efficient.

This is a resubmission of the previous patch. Changes:

1) checkpatch.pl has been fixed, so the kfree_rcu macro is much simpler.
2) To handle preemption, preempt_enable()/preempt_disable() statements
   have been added to __rcu_bulk_free().

Rao Shoaib (2):
  Move kfree_call_rcu() to slab_common.c
  kfree_rcu() should use kfree_bulk() interface

 include/linux/mm.h       |   5 ++
 include/linux/rcupdate.h |  43 +++---
 include/linux/rcutiny.h  |   8 ++-
 include/linux/rcutree.h  |   2 -
 include/linux/slab.h     |  42 ++++
 kernel/rcu/tree.c        |  24 +++
 kernel/sysctl.c          |  40 +++
 mm/slab.h                |  23 +++
 mm/slab_common.c         | 172 +++++++++
 9 files changed, 302 insertions(+), 57 deletions(-)

-- 
2.7.4
[PATCH 1/1] MACRO_ARG_REUSE in checkpatch.pl is confused about * in typeof
From: Rao Shoaib <rao.sho...@oracle.com>

Example:

CHECK: Macro argument reuse 'ptr' - possible side-effects?
+#define kfree_rcu(ptr, rcu_name)					\
+	do {								\
+		unsigned long __off = offsetof(typeof(*(ptr)), rcu_name); \
+		struct rcu_head *__rptr = (void *)ptr + __off;		\
+		__kfree_rcu(__rptr, __off);				\
+	} while (0)

Fix supplied by Joe Perches.

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3d40403..def6bb2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -4998,7 +4998,7 @@ sub process {
 			next if ($arg =~ /\.\.\./);
 			next if ($arg =~ /^type$/i);
 			my $tmp_stmt = $define_stmt;
-			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*\)*\b//g;
+			$tmp_stmt =~ s/\b(?:typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*(?:\s*\*\s*)*\s*\(*\s*$arg\s*\)*\b//g;
 			$tmp_stmt =~ s/\#+\s*$arg\b//g;
 			$tmp_stmt =~ s/\b$arg\s*\#\#//g;
 			my $use_cnt = $tmp_stmt =~ s/\b$arg\b//g;
-- 
2.7.4
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c

Folks,

Is anyone working on resolving the checkpatch issue? I am waiting to
resubmit my patch. Would it be fine if I submitted the patch with the
original macro, since the check is incorrect? I do not speak Perl, but
I can do the process work. If folks think Joe's fix is fine, I can
submit it and perhaps someone can review it?

Regards,
Shoaib

On 01/04/2018 10:46 PM, Joe Perches wrote:
> On Thu, 2018-01-04 at 16:07 -0800, Matthew Wilcox wrote:
>> On Thu, Jan 04, 2018 at 03:47:32PM -0800, Paul E. McKenney wrote:
>>> I was under the impression that typeof did not actually evaluate its
>>> argument, but rather only returned its type.  And there are a few
>>> macros with this pattern in mainline.  Or am I confused about what
>>> typeof does?
>>
>> I think checkpatch is confused by the '*' in the typeof argument:
>>
>> $ git diff | ./scripts/checkpatch.pl --strict
>> CHECK: Macro argument reuse 'ptr' - possible side-effects?
>> #29: FILE: include/linux/rcupdate.h:896:
>> +#define kfree_rcu(ptr, rcu_head)				\
>> +	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
>>
>> If one removes the '*', the warning goes away.  I'm no perlista, but
>> Joe, would this regexp modification make sense?
>>
>> +++ b/scripts/checkpatch.pl
>> @@ -4957,7 +4957,7 @@ sub process {
>>  			next if ($arg =~ /\.\.\./);
>>  			next if ($arg =~ /^type$/i);
>>  			my $tmp_stmt = $define_stmt;
>> -			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*\)*\b//g;
>> +			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\**\(*\s*$arg\s*\)*\b//g;
>
> I supposed ideally it'd be more like
>
> $tmp_stmt =~ s/\b(?:typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*(?:\s*\*\s*)*\s*\(*\s*$arg\s*\)*\b//g;
>
> Adding ?: at the start to not capture, and (?:\s*\*\s*)* for any number
> of * with any surrounding spacing.
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c

On 01/04/2018 04:07 PM, Matthew Wilcox wrote:
> On Thu, Jan 04, 2018 at 03:47:32PM -0800, Paul E. McKenney wrote:
>> I was under the impression that typeof did not actually evaluate its
>> argument, but rather only returned its type.  And there are a few
>> macros with this pattern in mainline.  Or am I confused about what
>> typeof does?
>
> I think checkpatch is confused by the '*' in the typeof argument:

Yup.

> $ git diff | ./scripts/checkpatch.pl --strict
> CHECK: Macro argument reuse 'ptr' - possible side-effects?
> #29: FILE: include/linux/rcupdate.h:896:
> +#define kfree_rcu(ptr, rcu_head)				\
> +	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
>
> If one removes the '*', the warning goes away.  I'm no perlista, but
> Joe, would this regexp modification make sense?
>
> +++ b/scripts/checkpatch.pl
> @@ -4957,7 +4957,7 @@ sub process {
>  			next if ($arg =~ /\.\.\./);
>  			next if ($arg =~ /^type$/i);
>  			my $tmp_stmt = $define_stmt;
> -			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*\)*\b//g;
> +			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\**\(*\s*$arg\s*\)*\b//g;
>  			$tmp_stmt =~ s/\#+\s*$arg\b//g;
>  			$tmp_stmt =~ s/\b$arg\s*\#\#//g;
>  			my $use_cnt = $tmp_stmt =~ s/\b$arg\b//g;

Thanks a lot for digging into this. I had to try several variations for
the warning to go away and do not remember the reason for each change.
I am not Perl literate, and the regular expression scared me ;-).

Shoaib
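Paul's recollection in the thread above is right: in GNU C, typeof (and __typeof__) inspects only the type of its operand and does not evaluate it, so `typeof(*(ptr))` cannot cause a second side effect even though `ptr` textually appears twice in the macro. A quick userspace check of that guarantee (requires GCC or Clang):

```c
#include <assert.h>

/* The operand of typeof/__typeof__ is not evaluated; only its type is
 * inspected. So the p++ below never executes. */
int typeof_demo(void)
{
	int arr[2] = { 7, 8 };
	int *p = arr;

	/* Only the type of *(p++) -- int -- is used for this declaration. */
	__typeof__(*(p++)) v = *p;

	assert(p == arr);	/* p was never incremented */
	return v;		/* still the first element */
}
```

This is exactly why checkpatch's MACRO_ARG_REUSE heuristic tries to discount arguments that appear only inside typeof() before counting uses, and why the missing handling of '*' inside the typeof operand produced the false positive discussed here.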
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c
On 01/04/2018 01:46 PM, Matthew Wilcox wrote: On Thu, Jan 04, 2018 at 01:27:49PM -0800, Rao Shoaib wrote: On 01/04/2018 12:35 PM, Rao Shoaib wrote: As far as your previous comments are concerned, only the following one has not been addressed. Can you please elaborate, as I do not understand the comment. The code was expanded because the new macro expansion check fails. Based on Matthew Wilcox's comment I have reverted rcu_head_name back to rcu_head.

It turns out I did not remember the real reason for the change. With the macro rewritten, using rcu_head as a macro argument does not work because it conflicts with the name of the type 'struct rcu_head' used in the macro. I have renamed the macro argument to rcu_name.

Shoaib

+#define kfree_rcu(ptr, rcu_head_name) \
+	do { \
+		typeof(ptr) __ptr = ptr; \
+		unsigned long __off = offsetof(typeof(*(__ptr)), \
+					       rcu_head_name); \
+		struct rcu_head *__rptr = (void *)__ptr + __off; \
+		__kfree_rcu(__rptr, __off); \
+	} while (0)

why do you want to open code this?

But why are you changing this macro at all? If it was to avoid the double-mention of "ptr", then you haven't done that.

I have -- I do not get the error because ptr is being assigned only once. If you have a better way, then let me know and I will be happy to make the change.

Shoaib.
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c
On 01/04/2018 12:35 PM, Rao Shoaib wrote: Hi Boqun,

Thanks a lot for all your guidance and for catching the cut and paste error. Please see inline.

On 01/03/2018 05:38 PM, Boqun Feng wrote: But you introduced a bug here, you should use rcu_state_p instead of _sched_state as the third parameter for __call_rcu(). Please re-read: https://marc.info/?l=linux-mm=151390529209639 , and there are other comments, which you still haven't resolved in this version. You may want to write a better commit log to explain the reasons of each modification and fix bugs or typos in your previous version. That's how the review process works ;-) Regards, Boqun

This is definitely a serious error. Thanks for catching this.

As far as your previous comments are concerned, only the following one has not been addressed. Can you please elaborate, as I do not understand the comment. The code was expanded because the new macro expansion check fails. Based on Matthew Wilcox's comment I have reverted rcu_head_name back to rcu_head.

It turns out I did not remember the real reason for the change. With the macro rewritten, using rcu_head as a macro argument does not work because it conflicts with the name of the type 'struct rcu_head' used in the macro. I have renamed the macro argument to rcu_name.

Shoaib

+#define kfree_rcu(ptr, rcu_head_name) \
+	do { \
+		typeof(ptr) __ptr = ptr; \
+		unsigned long __off = offsetof(typeof(*(__ptr)), \
+					       rcu_head_name); \
+		struct rcu_head *__rptr = (void *)__ptr + __off; \
+		__kfree_rcu(__rptr, __off); \
+	} while (0)

why do you want to open code this?

Does the following text for the commit log look better?

kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures

The newly implemented kfree_bulk() interfaces are more efficient; using them for freeing rcu structures has shown performance improvements in synthetic benchmarks that allocate and free rcu structures at a high rate.
Shoaib

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: em...@kvack.org
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c
Hi Boqun,

Thanks a lot for all your guidance and for catching the cut and paste error. Please see inline.

On 01/03/2018 05:38 PM, Boqun Feng wrote: But you introduced a bug here, you should use rcu_state_p instead of _sched_state as the third parameter for __call_rcu(). Please re-read: https://marc.info/?l=linux-mm=151390529209639 , and there are other comments, which you still haven't resolved in this version. You may want to write a better commit log to explain the reasons of each modification and fix bugs or typos in your previous version. That's how the review process works ;-) Regards, Boqun

This is definitely a serious error. Thanks for catching this.

As far as your previous comments are concerned, only the following one has not been addressed. Can you please elaborate, as I do not understand the comment. The code was expanded because the new macro expansion check fails. Based on Matthew Wilcox's comment I have reverted rcu_head_name back to rcu_head.

+#define kfree_rcu(ptr, rcu_head_name) \
+	do { \
+		typeof(ptr) __ptr = ptr; \
+		unsigned long __off = offsetof(typeof(*(__ptr)), \
+					       rcu_head_name); \
+		struct rcu_head *__rptr = (void *)__ptr + __off; \
+		__kfree_rcu(__rptr, __off); \
+	} while (0)

why do you want to open code this?

Does the following text for the commit log look better?

kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures

The newly implemented kfree_bulk() interfaces are more efficient; using them for freeing rcu structures has shown performance improvements in synthetic benchmarks that allocate and free rcu structures at a high rate.

Shoaib
Re: [PATCH 1/2] Move kfree_call_rcu() to slab_common.c
On 01/02/2018 02:23 PM, Matthew Wilcox wrote: On Tue, Jan 02, 2018 at 12:11:37PM -0800, rao.sho...@oracle.com wrote:

-#define kfree_rcu(ptr, rcu_head) \
-	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
+#define kfree_rcu(ptr, rcu_head_name) \
+	do { \
+		typeof(ptr) __ptr = ptr; \
+		unsigned long __off = offsetof(typeof(*(__ptr)), \
+					       rcu_head_name); \
+		struct rcu_head *__rptr = (void *)__ptr + __off; \
+		__kfree_rcu(__rptr, __off); \
+	} while (0)

I feel like you're trying to help people understand the code better, but using longer names can really work against that. Reverting to calling the parameter 'rcu_head' lets you not split the line:

I think it is a matter of preference; what is the issue with line splitting? Coming from a background other than Linux, I find it very annoying that Linux allows variable names that are meaningless. Linux does not even enforce adding a prefix for structure members, so trying to find out where a member is used or set is impossible using cscope. I cannot change the Linux requirements, so I will go ahead and make the change in the next rev.

+#define kfree_rcu(ptr, rcu_head) \
+	do { \
+		typeof(ptr) __ptr = ptr; \
+		unsigned long __off = offsetof(typeof(*(__ptr)), rcu_head); \
+		struct rcu_head *__rptr = (void *)__ptr + __off; \
+		__kfree_rcu(__rptr, __off); \
+	} while (0)

Also, I don't understand why you're bothering to create __ptr here. I understand the desire to not mention the same argument more than once, but you have 'ptr' twice anyway. And it's good practice to enclose macro arguments in parentheses in case the user has done something really tricksy like pass in "p + 1". In summary, I don't see anything fundamentally better in your rewrite of kfree_rcu(). The previous version is more succinct, and to my mind, easier to understand.

I did not want to make this change, but it is required due to the new tests added for macro expansion, where the same name as in the macro cannot be used twice. It takes care of the 'p + 1' hazard that you refer to above.

+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
+{
+	__call_rcu(head, func, _sched_state, -1, 1);
+}

-void kfree_call_rcu(struct rcu_head *head,
-		    rcu_callback_t func)
-{
-	__call_rcu(head, func, rcu_state_p, -1, 1);
-}

You've silently changed this. Why? It might well be the right change, but it at least merits mentioning in the changelog.

This was to address a comment about me not changing the tiny implementation to be the same as the tree implementation.

Shoaib
[PATCH 1/2] Move kfree_call_rcu() to slab_common.c
From: Rao Shoaib <rao.sho...@oracle.com> Signed-off-by: Rao Shoaib <rao.sho...@oracle.com> --- include/linux/rcupdate.h | 43 +++ include/linux/rcutree.h | 2 -- include/linux/slab.h | 44 kernel/rcu/tree.c| 24 ++-- mm/slab_common.c | 10 ++ 5 files changed, 67 insertions(+), 56 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index a6ddc42..23ed728 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -55,6 +55,9 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func); #definecall_rcucall_rcu_sched #endif /* #else #ifdef CONFIG_PREEMPT_RCU */ +/* only for use by kfree_call_rcu() */ +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func); + void call_rcu_bh(struct rcu_head *head, rcu_callback_t func); void call_rcu_sched(struct rcu_head *head, rcu_callback_t func); void synchronize_sched(void); @@ -838,45 +841,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) #define __is_kfree_rcu_offset(offset) ((offset) < 4096) /* - * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. - */ -#define __kfree_rcu(head, offset) \ - do { \ - BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \ - kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \ - } while (0) - -/** - * kfree_rcu() - kfree an object after a grace period. - * @ptr: pointer to kfree - * @rcu_head: the name of the struct rcu_head within the type of @ptr. - * - * Many rcu callbacks functions just call kfree() on the base structure. - * These functions are trivial, but their size adds up, and furthermore - * when they are used in a kernel module, that module must invoke the - * high-latency rcu_barrier() function at module-unload time. - * - * The kfree_rcu() function handles this issue. Rather than encoding a - * function address in the embedded rcu_head structure, kfree_rcu() instead - * encodes the offset of the rcu_head structure within the base structure. 
- * Because the functions are not allowed in the low-order 4096 bytes of - * kernel virtual memory, offsets up to 4095 bytes can be accommodated. - * If the offset is larger than 4095 bytes, a compile-time error will - * be generated in __kfree_rcu(). If this error is triggered, you can - * either fall back to use of call_rcu() or rearrange the structure to - * position the rcu_head structure into the first 4096 bytes. - * - * Note that the allowable offset might decrease in the future, for example, - * to allow something like kmem_cache_free_rcu(). - * - * The BUILD_BUG_ON check must not involve any function calls, hence the - * checks are done in macros here. - */ -#define kfree_rcu(ptr, rcu_head) \ - __kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head)) - - -/* * Place this after a lock-acquisition primitive to guarantee that * an UNLOCK+LOCK pair acts as a full barrier. This guarantee applies * if the UNLOCK and LOCK are executed by the same CPU or if the @@ -888,5 +852,4 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) #define smp_mb__after_unlock_lock()do { } while (0) #endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */ - #endif /* __LINUX_RCUPDATE_H */ diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 37d6fd3..7746b19 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -48,8 +48,6 @@ void synchronize_rcu_bh(void); void synchronize_sched_expedited(void); void synchronize_rcu_expedited(void); -void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); - /** * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period * diff --git a/include/linux/slab.h b/include/linux/slab.h index 50697a1..a71f6a78 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -342,6 +342,50 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc; void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment __malloc; void kmem_cache_free(struct 
kmem_cache *, void *); +void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); + +/* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. */ +#define __kfree_rcu(head, offset) \ + do { \ + unsigned long __o = (unsigned long)offset; \ + BUILD_BUG_ON(!__is_kfree_rcu_offset(__o)); \ + kfree_call_rcu(head, (rcu_callback_t)(__o)); \ + } while (0) + +/** + * kfree_rcu() - kfree an object after a grace period. + * @ptr: pointer to kfree + * @rcu_head: the name of the struct rcu_head within the type of @ptr. + * + * Many rcu callbacks functions just call kfree() on the base structure. + * These functions are trivial, but their size adds up, and furthermore
[PATCH 2/2] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
From: Rao Shoaib <rao.sho...@oracle.com> Signed-off-by: Rao Shoaib <rao.sho...@oracle.com> --- include/linux/mm.h | 5 ++ include/linux/rcutiny.h | 8 ++- kernel/sysctl.c | 40 mm/slab.h | 23 +++ mm/slab_common.c| 161 +++- 5 files changed, 235 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ea818ff..8ae4f25 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2669,5 +2669,10 @@ void __init setup_nr_node_ids(void); static inline void setup_nr_node_ids(void) {} #endif +extern int sysctl_kfree_rcu_drain_limit; +extern int sysctl_kfree_rcu_poll_limit; +extern int sysctl_kfree_rcu_empty_limit; +extern int sysctl_kfree_rcu_caching_allowed; + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index b3dbf95..af28107 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -84,10 +84,16 @@ static inline void synchronize_sched_expedited(void) synchronize_sched(); } +static inline void call_rcu_lazy(struct rcu_head *head, +rcu_callback_t func) +{ + call_rcu(head, func); +} + static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) { - call_rcu(head, func); + call_rcu_lazy(head, func); } #define rcu_note_context_switch(preempt) \ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 557d467..47b48f7 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1655,6 +1655,46 @@ static struct ctl_table vm_table[] = { .extra2 = (void *)_rnd_compat_bits_max, }, #endif + { + .procname = "kfree_rcu_drain_limit", + .data = _kfree_rcu_drain_limit, + .maxlen = sizeof(sysctl_kfree_rcu_drain_limit), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = , + .extra2 = _hundred, + }, + + { + .procname = "kfree_rcu_poll_limit", + .data = _kfree_rcu_poll_limit, + .maxlen = sizeof(sysctl_kfree_rcu_poll_limit), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = , + .extra2 = _hundred, + }, + + { + .procname = 
"kfree_rcu_empty_limit", + .data = _kfree_rcu_empty_limit, + .maxlen = sizeof(sysctl_kfree_rcu_empty_limit), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = , + .extra2 = , + }, + + { + .procname = "kfree_rcu_caching_allowed", + .data = _kfree_rcu_caching_allowed, + .maxlen = sizeof(sysctl_kfree_rcu_caching_allowed), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = , + .extra2 = , + }, + { } }; diff --git a/mm/slab.h b/mm/slab.h index ad657ff..2541f70 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -78,6 +78,29 @@ extern const struct kmalloc_info_struct { unsigned long size; } kmalloc_info[]; +#defineRCU_MAX_ACCUMULATE_SIZE 25 + +struct rcu_bulk_free_container { + struct rcu_head rbfc_rcu; + int rbfc_entries; + void*rbfc_data[RCU_MAX_ACCUMULATE_SIZE]; + struct rcu_bulk_free *rbfc_rbf; +}; + +struct rcu_bulk_free { + struct rcu_head rbf_rcu; /* used to schedule monitor process */ + spinlock_t rbf_lock; + struct rcu_bulk_free_container *rbf_container; + struct rcu_bulk_free_container *rbf_cached_container; + struct rcu_head *rbf_list_head; + int rbf_list_size; + int rbf_cpu; + int rbf_empty; + int rbf_polled; + boolrbf_init; + boolrbf_monitor; +}; + unsigned long calculate_alignment(slab_flags_t flags, unsigned long align, unsigned long size); diff --git a/mm/slab_common.c b/mm/slab_common.c index 0d8a63b..8987737 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -20,6 +20,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -1483,13 +1484,171 @@ void kzfree(const void *p) } EXPORT_SYMBOL(kzfree); +static DEFINE_PER_CPU(struct rcu_bulk_free, cpu_rbf); + +/* drain if atleast these many objects */ +int sysctl_kfree_rcu_drain_limit __read_mostly = 10; + +/* time to poll if fewer than drain_limit */ +int sysc
Re: [PATCH] Move kfree_call_rcu() to slab_common.c
On 12/21/2017 04:36 AM, Matthew Wilcox wrote:
> On Thu, Dec 21, 2017 at 12:19:47AM -0800, rao.sho...@oracle.com wrote:
>> This patch moves kfree_call_rcu() and related macros out of rcu code. A
>> new function __call_rcu_lazy() is created for calling __call_rcu() with
>> the lazy flag.
>
> Something you probably didn't know ... there are two RCU implementations
> in the kernel; Tree and Tiny. It looks like you've only added
> __call_rcu_lazy() to Tree and you'll also need to add it to Tiny.

I left it out on purpose because the call in tiny is a little different.

rcutiny.h:

static inline void kfree_call_rcu(struct rcu_head *head,
				  void (*func)(struct rcu_head *rcu))
{
	call_rcu(head, func);
}

tree.c:

void kfree_call_rcu(struct rcu_head *head,
		    void (*func)(struct rcu_head *rcu))
{
	__call_rcu(head, func, rcu_state_p, -1, 1);
}
EXPORT_SYMBOL_GPL(kfree_call_rcu);

If we want the code to be exactly the same I can create a lazy version for
tiny as well. However, I do not know where to move kfree_call_rcu() from
its current home in rcutiny.h. Any thoughts?

>> Moving the macros also generated the following checkpatch noise. I do
>> not know how to silence checkpatch as there is nothing wrong.
>>
>> CHECK: Macro argument reuse 'offset' - possible side-effects?
>> #91: FILE: include/linux/slab.h:348:
>> +#define __kfree_rcu(head, offset) \
>> +	do { \
>> +		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
>> +		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
>> +	} while (0)
>
> What checkpatch is warning you about here is that somebody might call
>
> 	__kfree_rcu(p, a++);
>
> and this would expand into
>
> 	do { \
> 		BUILD_BUG_ON(!__is_kfree_rcu_offset(a++)); \
> 		kfree_call_rcu(p, (rcu_callback_t)(unsigned long)(a++)); \
> 	} while (0)
>
> which would increment 'a' twice, and cause pain and suffering. That's
> pretty unlikely usage of __kfree_rcu(), but I suppose it's not impossible.
>
> We have various hacks to get around this kind of thing; for example I
> might do this as:
>
> #define __kfree_rcu(head, offset) \
> 	do { \
> 		unsigned long __o = offset; \
> 		BUILD_BUG_ON(!__is_kfree_rcu_offset(__o)); \
> 		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(__o)); \
> 	} while (0)
>
> Now offset is only evaluated once per invocation of the macro. The other
> two warnings are the same problem.

Thanks. I was not sure if I was required to fix the noise or whether, based
on inspection, the noise could be ignored. I will make the change and
resubmit.

Shoaib
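The double-evaluation hazard checkpatch flags, and the bind-to-a-local fix suggested above, can be reproduced in plain C. The macro names below are illustrative, not the kernel's; the "safe" version uses a GNU C statement expression, the same idiom the kernel relies on:

```c
#include <assert.h>

/* A macro that reuses its argument: 'a' is pasted twice into the
 * expansion, so a side-effecting argument like i++ runs twice. */
#define MAX_NAIVE(a, b) ((a) > (b) ? (a) : (b))

/* The hedged pattern from the thread: bind each argument to a local
 * exactly once, then reuse the locals (GNU C statement expression). */
#define MAX_ONCE(a, b) ({ int __a = (a); int __b = (b); __a > __b ? __a : __b; })
```

With `i = 5`, `MAX_NAIVE(i++, 0)` increments `i` twice (once in the test, once when producing the result) and returns 6, while `MAX_ONCE(i++, 0)` increments it once and returns 5 — exactly the behavioral difference checkpatch is warning about.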
[PATCH] Move kfree_call_rcu() to slab_common.c
From: Rao Shoaib <rao.sho...@oracle.com>

This patch moves kfree_call_rcu() and related macros out of rcu code. A new
function __call_rcu_lazy() is created for calling __call_rcu() with the
lazy flag.

Moving the macros also generated the following checkpatch noise. I do not
know how to silence checkpatch as there is nothing wrong.

CHECK: Macro argument reuse 'offset' - possible side-effects?
#91: FILE: include/linux/slab.h:348:
+#define __kfree_rcu(head, offset) \
+	do { \
+		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
+		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
+	} while (0)

CHECK: Macro argument reuse 'ptr' - possible side-effects?
#123: FILE: include/linux/slab.h:380:
+#define kfree_rcu(ptr, rcu_head) \
+	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))

CHECK: Macro argument reuse 'rcu_head' - possible side-effects?
#123: FILE: include/linux/slab.h:380:
+#define kfree_rcu(ptr, rcu_head) \
+	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))

total: 0 errors, 0 warnings, 3 checks, 156 lines checked

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/rcupdate.h | 41 ++---
 include/linux/rcutree.h  |  2 --
 include/linux/slab.h     | 37 +
 kernel/rcu/tree.c        | 24 ++--
 mm/slab_common.c         | 10 ++
 5 files changed, 59 insertions(+), 55 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a6ddc42..d2c25d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -55,6 +55,8 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 #define	call_rcu	call_rcu_sched
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+void __call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+
 void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
 void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
 void synchronize_sched(void);
@@ -838,45 +840,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define __is_kfree_rcu_offset(offset) ((offset) < 4096)
 
 /*
- * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
- */
-#define __kfree_rcu(head, offset) \
-	do { \
-		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
-	} while (0)
-
-/**
- * kfree_rcu() - kfree an object after a grace period.
- * @ptr: pointer to kfree
- * @rcu_head: the name of the struct rcu_head within the type of @ptr.
- *
- * Many rcu callbacks functions just call kfree() on the base structure.
- * These functions are trivial, but their size adds up, and furthermore
- * when they are used in a kernel module, that module must invoke the
- * high-latency rcu_barrier() function at module-unload time.
- *
- * The kfree_rcu() function handles this issue. Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
- * If the offset is larger than 4095 bytes, a compile-time error will
- * be generated in __kfree_rcu(). If this error is triggered, you can
- * either fall back to use of call_rcu() or rearrange the structure to
- * position the rcu_head structure into the first 4096 bytes.
- *
- * Note that the allowable offset might decrease in the future, for example,
- * to allow something like kmem_cache_free_rcu().
- *
- * The BUILD_BUG_ON check must not involve any function calls, hence the
- * checks are done in macros here.
- */
-#define kfree_rcu(ptr, rcu_head) \
-	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
-
-
-/*
  * Place this after a lock-acquisition primitive to guarantee that
  * an UNLOCK+LOCK pair acts as a full barrier. This guarantee applies
  * if the UNLOCK and LOCK are executed by the same CPU or if the
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 37d6fd3..7746b19 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -48,8 +48,6 @@ void synchronize_rcu_bh(void);
 void synchronize_sched_expedited(void);
 void synchronize_rcu_expedited(void);
 
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
-
 /**
  * synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
  *
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 50697a1..36d6431 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -342,6 +342,43 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __malloc;
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t flags) __assume_slab_alignment __malloc;
 voi
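The offset-encoding trick described in the removed kfree_rcu() comment block can be sketched in userspace C. All names here (`struct item`, `base_from_head`) are illustrative, not the kernel's; only the idea is mirrored: store `offsetof(...)` where a callback pointer would normally go, and treat any stored value below 4096 as an offset rather than a function address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace mirror of the kernel trick: instead of a callback function
 * pointer, the rcu_head holds the byte offset of the rcu_head within the
 * enclosing object. No valid function lives in the first 4096 bytes of
 * the address space, so the two cases can be told apart at reclaim time. */
struct rcu_head { void *func; };

struct item {
	int payload;
	struct rcu_head rcu;
};

#define IS_KFREE_OFFSET(off) ((off) < 4096)

/* "Reclaim": recover the base pointer by walking back from the embedded
 * rcu_head by the stored offset. */
static void *base_from_head(struct rcu_head *head)
{
	unsigned long offset = (unsigned long)head->func;

	assert(IS_KFREE_OFFSET(offset));
	return (void *)((char *)head - offset);
}
```

This is why the comment insists the offset must be under 4096 bytes: a larger offset would be indistinguishable from a genuine callback address.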
Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
On 12/19/2017 02:12 PM, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 09:41:58PM +0100, Jesper Dangaard Brouer wrote:
>> If I had to implement this: I would choose to do the optimization in
>> __rcu_process_callbacks(): create a small on-call-stack ptr-array for
>> kfree_bulk(). I would only optimize the case that calls kfree()
>> directly. In the while(list) loop I would defer calling __rcu_reclaim()
>> for __is_kfree_rcu_offset(head->func), and instead add them to the
>> ptr-array (and flush if the array is full in the loop, and kfree_bulk
>> flush after the loop).
>>
>> The real advantage of kfree_bulk() comes from amortizing the per-kfree
>> (behind-the-scenes) sync cost. There is an additional benefit, because
>> the objects come from RCU and will hit a slower path in SLUB. The SLUB
>> allocator is very fast for objects that get recycled quickly (short
>> lifetime), non-locked (cpu-local) double-cmpxchg. But it is slower for
>> longer-lived/more-outstanding objects, as this hits a slower code-path,
>> fully locked (cross-cpu) double-cmpxchg.
>
> Something like this ... (compile tested only)
>
> Considerably less code; Rao, what do you think?

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 59c471de342a..5ac4ed077233 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -174,20 +174,19 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 }
 #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
 
-void kfree(const void *);
-
 /*
  * Reclaim the specified callback, either by invoking it (non-lazy case)
  * or freeing it directly (lazy case). Return true if lazy, false otherwise.
  */
-static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
+static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head, void **kfree,
+				 unsigned int *idx)
 {
 	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
 	if (__is_kfree_rcu_offset(offset)) {
 		RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset);)
-		kfree((void *)head - offset);
+		kfree[*idx++] = (void *)head - offset;
 		rcu_lock_release(&rcu_callback_map);
 		return true;
 	} else {
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f9c0ca2ccf0c..7e13979b4697 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2725,6 +2725,8 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count;
+	void *to_free[16];
+	unsigned int to_free_idx = 0;
 
 	/* If no callbacks are ready, just return. */
 	if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
@@ -2755,8 +2757,10 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 	rhp = rcu_cblist_dequeue(&rcl);
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		debug_rcu_head_unqueue(rhp);
-		if (__rcu_reclaim(rsp->name, rhp))
+		if (__rcu_reclaim(rsp->name, rhp, to_free, &to_free_idx))
 			rcu_cblist_dequeued_lazy(&rcl);
+		if (to_free_idx == 16)
+			kfree_bulk(16, to_free);
 		/*
 		 * Stop only if limit reached and CPU has something to do.
 		 * Note: The rcl structure counts down from zero.
@@ -2766,6 +2770,8 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 		    (!is_idle_task(current) && !rcu_is_callbacks_kthread()))
 			break;
 	}
+	if (to_free_idx)
+		kfree_bulk(to_free_idx, to_free);
 
 	local_irq_save(flags);
 	count = -rcl.len;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index db85ca3975f1..4127be06759b 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2189,6 +2189,8 @@ static int rcu_nocb_kthread(void *arg)
 	struct rcu_head *next;
 	struct rcu_head **tail;
 	struct rcu_data *rdp = arg;
+	void *to_free[16];
+	unsigned int to_free_idx = 0;
 
 	/* Each pass through this loop invokes one batch of callbacks */
 	for (;;) {
@@ -2226,13 +2228,18 @@ static int rcu_nocb_kthread(void *arg)
 			}
 			debug_rcu_head_unqueue(list);
 			local_bh_disable();
-			if (__rcu_reclaim(rdp->rsp->name, list))
+			if (__rcu_reclaim(rdp->rsp->name, list, to_free,
+					  &to_free_idx))
 				cl++;
 			c++;
+			if (to_free_idx == 16)
+				kfree_bulk(16, to_free);
 			local_bh_enable();
 			cond_resched_rcu_qs();
 			list = next;
 		}
+		if (to_free_idx)
+
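Jesper's proposed shape — an on-stack array that is flushed inside the loop whenever it fills, and flushed once more after the loop for the tail — can be sketched in userspace. `free_bulk()` stands in for the kernel's `kfree_bulk()`, and all names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Batch size mirrors the 16-entry on-stack array in the sketch above. */
#define BATCH 16

static size_t flushes;	/* number of bulk flushes performed, for inspection */

/* Stand-in for kfree_bulk(): release n pointers in one call. */
static void free_bulk(size_t n, void **p)
{
	for (size_t i = 0; i < n; i++)
		free(p[i]);
	flushes++;
}

/* Walk a list of objects, accumulating into a small stack array and
 * flushing when it fills; a final flush after the loop handles the tail. */
static void drain(void **objs, size_t count)
{
	void *batch[BATCH];
	size_t idx = 0;

	for (size_t i = 0; i < count; i++) {
		batch[idx++] = objs[i];
		if (idx == BATCH) {	/* array full: flush inside the loop */
			free_bulk(idx, batch);
			idx = 0;
		}
	}
	if (idx)			/* flush the remainder after the loop */
		free_bulk(idx, batch);
}
```

The point of the pattern is amortization: 40 objects cost three bulk calls instead of 40 individual frees, which is where kfree_bulk()'s savings come from.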
Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
On 12/19/2017 12:41 PM, Jesper Dangaard Brouer wrote:
> On Tue, 19 Dec 2017 09:52:27 -0800 rao.sho...@oracle.com wrote:
>> +/* Main RCU function that is called to free RCU structures */
>> +static void
>> +__rcu_bulk_free(struct rcu_head *head, rcu_callback_t func, int cpu, bool lazy)
>> +{
>> +	unsigned long offset;
>> +	void *ptr;
>> +	struct rcu_bulk_free *rbf;
>> +	struct rcu_bulk_free_container *rbfc = NULL;
>> +
>> +	rbf = this_cpu_ptr(&cpu_rbf);
>> +
>> +	if (unlikely(!rbf->rbf_init)) {
>> +		spin_lock_init(&rbf->rbf_lock);
>> +		rbf->rbf_cpu = smp_processor_id();
>> +		rbf->rbf_init = true;
>> +	}
>> +
>> +	/* hold lock to protect against other cpu's */
>> +	spin_lock_bh(&rbf->rbf_lock);
>
> I'm not sure this will be faster. Having to take a cross CPU lock here
> (+ BH-disable) could cause scaling issues. Hopefully this lock will not
> be used intensively by other CPUs, right?
>
> The current cost of __call_rcu() is a local_irq_save/restore (which is
> quite expensive, but doesn't cause cross CPU chatter).
>
> Later in __rcu_process_callbacks() we have a local_irq_save/restore for
> the entire list, plus a per object cost doing local_bh_disable/enable.
> And for each object we call __rcu_reclaim(), which in some cases
> directly calls kfree().

As Paul has pointed out, the lock is a per-cpu lock; the only reason for
another CPU to access this lock is if the rcu callbacks run on a different
CPU, and there is nothing the code can do to avoid that, but that should be
rare anyway.

> If I had to implement this: I would choose to do the optimization in
> __rcu_process_callbacks(): create a small on-call-stack ptr-array for
> kfree_bulk(). I would only optimize the case that calls kfree()
> directly. In the while(list) loop I would defer calling __rcu_reclaim()
> for __is_kfree_rcu_offset(head->func), and instead add them to the
> ptr-array (and flush if the array is full in the loop, and kfree_bulk
> flush after the loop).

This is exactly what the current code is doing. It accumulates only the
calls made to

	__kfree_rcu(head, offset) ==> kfree_call_rcu() ==> __rcu_bulk_free()

__kfree_rcu() has a check to make sure that an offset is being passed. When
a function pointer is passed, the caller has to call
call_rcu()/call_rcu_sched(). Accumulating early avoids the individual cost
of calling __call_rcu().

Perhaps I do not understand your point.

Shoaib

> The real advantage of kfree_bulk() comes from amortizing the per-kfree
> (behind-the-scenes) sync cost. There is an additional benefit, because
> the objects come from RCU and will hit a slower path in SLUB. The SLUB
> allocator is very fast for objects that get recycled quickly (short
> lifetime), non-locked (cpu-local) double-cmpxchg. But it is slower for
> longer-lived/more-outstanding objects, as this hits a slower code-path,
> fully locked (cross-cpu) double-cmpxchg.

>> +	rbfc = rbf->rbf_container;
>> +
>> +	if (rbfc == NULL) {
>> +		if (rbf->rbf_cached_container == NULL) {
>> +			rbf->rbf_container =
>> +				kmalloc(sizeof(struct rcu_bulk_free_container),
>> +					GFP_ATOMIC);
>> +			rbf->rbf_container->rbfc_rbf = rbf;
>> +		} else {
>> +			rbf->rbf_container = rbf->rbf_cached_container;
>> +			rbf->rbf_container->rbfc_rbf = rbf;
>> +			cmpxchg(&rbf->rbf_cached_container,
>> +				rbf->rbf_cached_container, NULL);
>> +		}
>> +
>> +		if (unlikely(rbf->rbf_container == NULL)) {
>> +
>> +			/* Memory allocation failed, maintain a list */
>> +
>> +			head->func = (void *)func;
>> +			head->next = rbf->rbf_list_head;
>> +			rbf->rbf_list_head = head;
>> +			rbf->rbf_list_size++;
>> +			if (rbf->rbf_list_size == RCU_MAX_ACCUMULATE_SIZE)
>> +				__rcu_bulk_schedule_list(rbf);
>> +
>> +			goto done;
>> +		}
>> +
>> +		rbfc = rbf->rbf_container;
>> +		rbfc->rbfc_entries = 0;
>> +
>> +		if (rbf->rbf_list_head != NULL)
>> +			__rcu_bulk_schedule_list(rbf);
>> +	}
>> +
>> +	offset = (unsigned long)func;
>> +	ptr = (void *)head - offset;
>> +
>> +	rbfc->rbfc_data[rbfc->rbfc_entries++] = ptr;
>> +	if (rbfc->rbfc_entries == RCU_MAX_ACCUMULATE_SIZE) {
>> +
>> +		WRITE_ONCE(rbf->rbf_container, NULL);
>> +		spin_unlock_bh(&rbf->rbf_lock);
>> +		call_rcu(&rbfc->rbfc_rcu, __rcu_bulk_free_impl);
>> +		return;
>> +	}
>> +
>> +done:
>> +	if (!rbf->rbf_monitor) {
>> +
>> +		call_rcu(&rbf->rbf_rcu, __rcu_bulk_free_monitor);
>> +		rbf->rbf_monitor = true;
>> +	}
>> +
>> +	spin_unlock_bh(&rbf->rbf_lock);
>> +}
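The accumulate-then-dispatch behavior of __rcu_bulk_free() can be modelled single-threaded in userspace. `dispatch()` stands in for the call_rcu()/kfree_bulk() hand-off; the per-CPU machinery, locking, the cached container, and the allocation-failure fallback list are all omitted, and the names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Mirrors the patch's container size: hand off once 25 objects queue up. */
#define RCU_MAX_ACCUMULATE_SIZE 25

struct container {
	int entries;
	void *data[RCU_MAX_ACCUMULATE_SIZE];
};

static int dispatched;	/* containers handed to "RCU" so far */

/* Stand-in for call_rcu(&rbfc->rbfc_rcu, ...) followed by a bulk free:
 * release everything in the container and recycle it. */
static void dispatch(struct container *c)
{
	for (int i = 0; i < c->entries; i++)
		free(c->data[i]);
	c->entries = 0;
	dispatched++;
}

/* Queue one object for deferred freeing; flush when the container fills.
 * In the real patch a monitor callback also flushes partially full
 * containers after a grace period, which this sketch leaves out. */
static void bulk_free(struct container *c, void *obj)
{
	c->data[c->entries++] = obj;
	if (c->entries == RCU_MAX_ACCUMULATE_SIZE)
		dispatch(c);
}
```

Feeding 60 objects through this produces two full-container dispatches with 10 objects left pending — the pending tail is exactly what the patch's monitor callback exists to clean up.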
Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
On 12/19/2017 11:33 AM, Christopher Lameter wrote:
> On Tue, 19 Dec 2017, rao.sho...@oracle.com wrote:
>
>> This patch updates kfree_rcu to use new bulk memory free functions as
>> they are more efficient.  It also moves kfree_call_rcu() out of rcu
>> related code to mm/slab_common.c
>
> It would be great to have separate patches so that we can review it
> properly:
>
> 1. Move the code into slab_common.c
> 2. The actual code changes to the kfree rcu mechanism
> 3. The whitespace changes

I can certainly break down the patch and submit smaller patches as you
have suggested.

BTW -- This is my first ever patch to Linux, so I am still learning the
etiquette.

Shoaib
Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
On 12/19/2017 11:30 AM, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 09:52:27AM -0800, rao.sho...@oracle.com wrote:
>
>> @@ -129,6 +130,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr,
>>  	for (i = 0; i < nr; i++) {
>>  		void *x = p[i] = kmem_cache_alloc(s, flags);
>> +
>>  		if (!x) {
>>  			__kmem_cache_free_bulk(s, i, p);
>>  			return 0;
>
> Don't mix whitespace changes with significant patches.

OK.

>> +	rbf = this_cpu_ptr(&cpu_rbf);
>> +
>> +	if (unlikely(!rbf->rbf_init)) {
>> +		spin_lock_init(&rbf->rbf_lock);
>> +		rbf->rbf_cpu = smp_processor_id();
>> +		rbf->rbf_init = true;
>> +	}
>> +
>> +	/* hold lock to protect against other cpu's */
>> +	spin_lock_bh(&rbf->rbf_lock);
>
> Are you sure we can't call kfree_rcu() from interrupt context?

I thought about it, but the interrupts are off due to acquiring the
lock.  No?

>> +	rbfc = rbf->rbf_container;
>> +	rbfc->rbfc_entries = 0;
>> +
>> +	if (rbf->rbf_list_head != NULL)
>> +		__rcu_bulk_schedule_list(rbf);
>
> You've broken RCU.  Consider this scenario:
>
> Thread 1		Thread 2		Thread 3
> kfree_rcu(a)
> 			schedule()
> 						schedule()
> 			gets pointer to b
> kfree_rcu(b)
> 						processes rcu callbacks
> 			uses b
>
> Thread 3 will free a and also free b, so thread 2 is going to use freed
> memory and go splat.  You can't batch up memory to be freed without
> taking into account the grace periods.

The code does not change the grace period at all; in fact it adds to it.
The frees are accumulated in an array, and when a certain limit/time is
reached the frees are submitted to RCU for freeing.  So the grace period
is maintained starting from the time of the last free.  In case the
memory allocation fails, the code uses a list that is also submitted to
RCU for freeing.

> It might make sense for RCU to batch up all the memory it's going to
> free in a single grace period, and hand it all off to slub at once, but
> that's not what you've done here.

I am kind of doing that, but not on a per grace period basis -- on a per
cpu basis.

> I've been doing a lot of thinking about this because I really want a
> way to kfree_rcu() an object without embedding a struct rcu_head in it.
> But I see no way to do that today; even if we have an external memory
> allocation to point to the object to be freed, we have to keep track of
> the grace periods.

I am not sure I understand.  If you had external memory you could easily
do that.  I am exactly doing that; the only reason the RCU structure is
needed is to get the pointer to the object being freed.

Shoaib
Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
On 12/19/2017 11:12 AM, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 09:52:27AM -0800, rao.sho...@oracle.com wrote:
>
>> This patch updates kfree_rcu to use new bulk memory free functions as
>> they are more efficient.  It also moves kfree_call_rcu() out of rcu
>> related code to mm/slab_common.c
>>
>> Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
>> ---
>>  include/linux/mm.h |   5 ++
>>  kernel/rcu/tree.c  |  14 ----
>>  kernel/sysctl.c    |  40 ++++++++
>>  mm/slab.h          |  23 ++++++
>>  mm/slab_common.c   | 198 +++++++++++++++++++++++++++++++++++++++-
>>  5 files changed, 264 insertions(+), 16 deletions(-)
>
> You've added an awful lot of code.  Do you have any performance
> measurements that show this to be a win?

I did some micro benchmarking when I was developing the code and did see
performance gains -- see attached.

I tried several networking benchmarks but was not able to get any
improvement.  The reason is that these benchmarks do not exercise the
code we are improving.  So I looked at the kernel source for users of
kfree_rcu().  It turns out that the directory deletion code calls
kfree_rcu() to free the data structure when an entry is deleted.  Based
on that I created two benchmarks.

1) make_dirs -- This benchmark creates a multi level directory structure
   and then deletes it.  It's the delete part where we see the
   performance gain of about 8.3%.  The creation time remains the same.
   This benchmark was derived from the fdtree benchmark at
   https://computing.llnl.gov/?set=code=sio_downloads ==>
   https://github.com/llnl/fdtree

2) tsock -- I also noticed that a socket has an entry in a directory,
   and when the socket is closed the directory entry is deleted.  So I
   wrote a simple benchmark that goes in a loop a million times and
   opens and closes 10 sockets per iteration.  This shows an improvement
   of 7.6%.

I have attached the benchmarks and results.  "Unchanged" results are for
the stock kernel, "Changed" are for the modified kernel.

Shoaib

make_dirs.tar
Description: Unix tar archive

tsock.tar
Description: Unix tar archive
[PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures
From: Rao Shoaib <rao.sho...@oracle.com>

This patch updates kfree_rcu to use new bulk memory free functions as they
are more efficient.  It also moves kfree_call_rcu() out of rcu related code
to mm/slab_common.c

Signed-off-by: Rao Shoaib <rao.sho...@oracle.com>
---
 include/linux/mm.h |   5 ++
 kernel/rcu/tree.c  |  14 ----
 kernel/sysctl.c    |  40 ++++++++
 mm/slab.h          |  23 ++++++
 mm/slab_common.c   | 198 +++++++++++++++++++++++++++++++++++++++-
 5 files changed, 264 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ea818ff..8ae4f25 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2669,5 +2669,10 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_kfree_rcu_drain_limit;
+extern int sysctl_kfree_rcu_poll_limit;
+extern int sysctl_kfree_rcu_empty_limit;
+extern int sysctl_kfree_rcu_caching_allowed;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f9c0ca2..69951ef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3209,20 +3209,6 @@ void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
 EXPORT_SYMBOL_GPL(call_rcu_bh);
 
 /*
- * Queue an RCU callback for lazy invocation after a grace period.
- * This will likely be later named something like "call_rcu_lazy()",
- * but this change will require some way of tagging the lazy RCU
- * callbacks in the list of pending callbacks.  Until then, this
- * function may only be called from __kfree_rcu().
- */
-void kfree_call_rcu(struct rcu_head *head,
-		    rcu_callback_t func)
-{
-	__call_rcu(head, func, rcu_state_p, -1, 1);
-}
-EXPORT_SYMBOL_GPL(kfree_call_rcu);
-
-/*
  * Because a context switch is a grace period for RCU-sched and RCU-bh,
  * any blocking grace-period wait automatically implies a grace period
  * if there is only one CPU online at any point time during execution
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 557d467..47b48f7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1655,6 +1655,46 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+	{
+		.procname	= "kfree_rcu_drain_limit",
+		.data		= &sysctl_kfree_rcu_drain_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_drain_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_poll_limit",
+		.data		= &sysctl_kfree_rcu_poll_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_poll_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+
+	{
+		.procname	= "kfree_rcu_empty_limit",
+		.data		= &sysctl_kfree_rcu_empty_limit,
+		.maxlen		= sizeof(sysctl_kfree_rcu_empty_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+
+	{
+		.procname	= "kfree_rcu_caching_allowed",
+		.data		= &sysctl_kfree_rcu_caching_allowed,
+		.maxlen		= sizeof(sysctl_kfree_rcu_caching_allowed),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+
 	{ }
 };
 
diff --git a/mm/slab.h b/mm/slab.h
index ad657ff..2541f70 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -78,6 +78,29 @@ extern const struct kmalloc_info_struct {
 	unsigned long size;
 } kmalloc_info[];
 
+#define	RCU_MAX_ACCUMULATE_SIZE	25
+
+struct rcu_bulk_free_container {
+	struct rcu_head rbfc_rcu;
+	int rbfc_entries;
+	void *rbfc_data[RCU_MAX_ACCUMULATE_SIZE];
+	struct rcu_bulk_free *rbfc_rbf;
+};
+
+struct rcu_bulk_free {
+	struct rcu_head rbf_rcu; /* used to schedule monitor process */
+	spinlock_t rbf_lock;
+	struct rcu_bulk_free_container *rbf_container;
+	struct rcu_bulk_free_container *rbf_cached_container;
+	struct rcu_head *rbf_list_head;
+	int rbf_list_size;
+	int rbf_cpu;
+	int rbf_empty;
+	int rbf_polled;
+	bool rbf_init;
+	bool rbf_monitor;
+};
+
 unsigned long calculate_alignment(slab_flags_t flags,
 		unsigned long align, unsigned long size);
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c8cb367..06fd12c 100644