Re: [PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-07-07 Thread Divya Indi
Thanks Jason.

Appreciate your help and feedback for fixing this issue.

Would it be possible to access the edited version of the patch?
If yes, please share a pointer to the same.

Thanks,
Divya


On 7/2/20 12:07 PM, Jason Gunthorpe wrote:
> On Tue, Jun 23, 2020 at 07:13:09PM -0700, Divya Indi wrote:
>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>> before sending")'
>> -
>> 1. Adds the query to the request list before ib_nl_send_msg.
>> 2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is.
>>
>> However, if there is a delay in sending out the request (For
>> eg: Delay due to low memory situation) the timer to handle request timeout
>> might kick in before the request is sent out to ibacm via netlink.
>> ib_nl_request_timeout may release the query causing a use after free 
>> situation
>> while accessing the query in ib_nl_send_msg.
>>
>> Call Trace for the above race:
>>
>> [] ? ib_pack+0x17b/0x240 [ib_core]
>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>> [] addr_handler+0x9e/0x140 [rdma_cm]
>> [] process_req+0x134/0x190 [ib_addr]
>> [] process_one_work+0x169/0x4a0
>> [] worker_thread+0x5b/0x560
>> [] ? flush_delayed_work+0x50/0x50
>> [] kthread+0xcb/0xf0
>> [] ? __schedule+0x24a/0x810
>> [] ? __schedule+0x24a/0x810
>> [] ? kthread_create_on_node+0x180/0x180
>> [] ret_from_fork+0x47/0x90
>> [] ? kthread_create_on_node+0x180/0x180
>> 
>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>
>> To resolve the above issue -
>> 1. Add the req to the request list only after the request has been sent out.
>> 2. To handle the race where response comes in before adding request to
>> the request list, send(rdma_nl_multicast) and add to list while holding the
>> spinlock - request_lock.
>> 3. Use non blocking memory allocation flags for rdma_nl_multicast since it is
>> called while holding a spinlock.
>>
>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>> before sending")
>>
>> Signed-off-by: Divya Indi 
>> ---
>> v1:
>> - Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free.
>>
>> v2:
>> - Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
>> - Rewording and adding comments.
>>
>> v3:
>> - Change approach and remove usage of IB_SA_NL_QUERY_SENT.
>> - Add req to request list only after the request has been sent out.
>> - Send and add to list while holding the spinlock (request_lock).
>> - Override gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we
>>   need non blocking memory allocation while holding spinlock.
>>
>> v4:
>> - Formatting changes.
>> - Use GFP_NOWAIT conditionally - Only when GFP_ATOMIC is not provided by 
>> caller.
>> ---
>>  drivers/infiniband/core/sa_query.c | 41 ++++++++++++++++++++++++-----------------
>>  1 file changed, 24 insertions(+), 17 deletions(-)
> I made a few edits, and applied to for-rc
>
> Thanks,
> Jason


Re: [PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-25 Thread Divya Indi
Hi Leon,

Please find my comments inline -

On 6/25/20 3:09 AM, Leon Romanovsky wrote:
> On Tue, Jun 23, 2020 at 07:13:09PM -0700, Divya Indi wrote:
>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>> before sending")'
>> -
>> 1. Adds the query to the request list before ib_nl_send_msg.
>> 2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is.
>>
>> However, if there is a delay in sending out the request (For
>> eg: Delay due to low memory situation) the timer to handle request timeout
>> might kick in before the request is sent out to ibacm via netlink.
>> ib_nl_request_timeout may release the query causing a use after free 
>> situation
>> while accessing the query in ib_nl_send_msg.
>>
>> Call Trace for the above race:
>>
>> [] ? ib_pack+0x17b/0x240 [ib_core]
>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>> [] addr_handler+0x9e/0x140 [rdma_cm]
>> [] process_req+0x134/0x190 [ib_addr]
>> [] process_one_work+0x169/0x4a0
>> [] worker_thread+0x5b/0x560
>> [] ? flush_delayed_work+0x50/0x50
>> [] kthread+0xcb/0xf0
>> [] ? __schedule+0x24a/0x810
>> [] ? __schedule+0x24a/0x810
>> [] ? kthread_create_on_node+0x180/0x180
>> [] ret_from_fork+0x47/0x90
>> [] ? kthread_create_on_node+0x180/0x180
>> 
>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>
>> To resolve the above issue -
>> 1. Add the req to the request list only after the request has been sent out.
>> 2. To handle the race where response comes in before adding request to
>> the request list, send(rdma_nl_multicast) and add to list while holding the
>> spinlock - request_lock.
>> 3. Use non blocking memory allocation flags for rdma_nl_multicast since it is
>> called while holding a spinlock.
>>
>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>> before sending")
>>
>> Signed-off-by: Divya Indi 
>> ---
>> v1:
>> - Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free.
>>
>> v2:
>> - Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
>> - Rewording and adding comments.
>>
>> v3:
>> - Change approach and remove usage of IB_SA_NL_QUERY_SENT.
>> - Add req to request list only after the request has been sent out.
>> - Send and add to list while holding the spinlock (request_lock).
>> - Override gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we
>>   need non blocking memory allocation while holding spinlock.
>>
>> v4:
>> - Formatting changes.
>> - Use GFP_NOWAIT conditionally - Only when GFP_ATOMIC is not provided by 
>> caller.
>> ---
>>  drivers/infiniband/core/sa_query.c | 41 ++++++++++++++++++++++++-----------------
>>  1 file changed, 24 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sa_query.c 
>> b/drivers/infiniband/core/sa_query.c
>> index 74e0058..9066d48 100644
>> --- a/drivers/infiniband/core/sa_query.c
>> +++ b/drivers/infiniband/core/sa_query.c
>> @@ -836,6 +836,10 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>> gfp_t gfp_mask)
>>  void *data;
>>  struct ib_sa_mad *mad;
>>  int len;
>> +unsigned long flags;
>> +unsigned long delay;
>> +gfp_t gfp_flag;
>> +int ret;
>>
>>  mad = query->mad_buf->mad;
>>  len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
>> @@ -860,36 +864,39 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>> gfp_t gfp_mask)
>>  /* Repair the nlmsg header length */
>>  nlmsg_end(skb, nlh);
>>
>> -return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
>> -}
>> +gfp_flag = ((gfp_mask & GFP_ATOMIC) == GFP_ATOMIC) ? GFP_ATOMIC :
>> +GFP_NOWAIT;
> I would say that the better way will be to write something like this:
> gfp_flag |= GFP_NOWAIT;

You mean gfp_flag = gfp_mask|GFP_NOWAIT? [We don't want to modify the gfp_mask
sent by the caller]

#define GFP_ATOMIC  (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)

If a caller passes GFP_KERNEL, "gfp_mask|GFP_NOWAIT" will still have
__GFP_RECLAIM, __GFP_IO and __GFP_FS set, which is not suitable for use
under a spinlock.

Thanks,
Divya

>
> Thanks


[PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-23 Thread Divya Indi
Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before 
sending")'
-
1. Adds the query to the request list before ib_nl_send_msg.
2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is.

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query causing a use after free situation
while accessing the query in ib_nl_send_msg.

Call Trace for the above race:

[] ? ib_pack+0x17b/0x240 [ib_core]
[] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[] addr_handler+0x9e/0x140 [rdma_cm]
[] process_req+0x134/0x190 [ib_addr]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] ? flush_delayed_work+0x50/0x50
[] kthread+0xcb/0xf0
[] ? __schedule+0x24a/0x810
[] ? __schedule+0x24a/0x810
[] ? kthread_create_on_node+0x180/0x180
[] ret_from_fork+0x47/0x90
[] ? kthread_create_on_node+0x180/0x180

RIP  [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve the above issue -
1. Add the req to the request list only after the request has been sent out.
2. To handle the race where response comes in before adding request to
the request list, send(rdma_nl_multicast) and add to list while holding the
spinlock - request_lock.
3. Use non blocking memory allocation flags for rdma_nl_multicast since it is
called while holding a spinlock.

Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending")

Signed-off-by: Divya Indi 
---
v1:
- Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free.

v2:
- Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
- Rewording and adding comments.

v3:
- Change approach and remove usage of IB_SA_NL_QUERY_SENT.
- Add req to request list only after the request has been sent out.
- Send and add to list while holding the spinlock (request_lock).
- Override gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we
  need non blocking memory allocation while holding spinlock.

v4:
- Formatting changes.
- Use GFP_NOWAIT conditionally - Only when GFP_ATOMIC is not provided by caller.
---
 drivers/infiniband/core/sa_query.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c 
b/drivers/infiniband/core/sa_query.c
index 74e0058..9066d48 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -836,6 +836,10 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t 
gfp_mask)
void *data;
struct ib_sa_mad *mad;
int len;
+   unsigned long flags;
+   unsigned long delay;
+   gfp_t gfp_flag;
+   int ret;
 
mad = query->mad_buf->mad;
len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
@@ -860,36 +864,39 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
gfp_t gfp_mask)
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
 
-   return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
-}
+   gfp_flag = ((gfp_mask & GFP_ATOMIC) == GFP_ATOMIC) ? GFP_ATOMIC :
+   GFP_NOWAIT;
 
-static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
-{
-   unsigned long flags;
-   unsigned long delay;
-   int ret;
+   spin_lock_irqsave(&ib_nl_request_lock, flags);
+   ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_flag);
 
-   INIT_LIST_HEAD(&query->list);
-   query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+   if (ret)
+   goto out;
 
-   /* Put the request on the list first.*/
-   spin_lock_irqsave(&ib_nl_request_lock, flags);
+   /* Put the request on the list.*/
delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
query->timeout = delay + jiffies;
list_add_tail(&query->list, &ib_nl_request_list);
/* Start the timeout if this is the only request */
if (ib_nl_request_list.next == &query->list)
queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+
+out:
spin_unlock_irqrestore(&ib_nl_request_lock, flags);
 
+   return ret;
+}
+
+static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
+{
+   int ret;
+
+   INIT_LIST_HEAD(&query->list);
+   query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+
ret = ib_nl_send_msg(query, gfp_mask);
-   if (ret) {
+   if (ret)
ret = -EIO;
-   /* Remove the request */
-   spin_lock_irqsave(&ib_nl_request_lock, flags);
-   list_del(&query->list);
-   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
-   }
 
return ret;
 }
-- 
1.8.3.1



Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-19 Thread Divya Indi
Hi Jason,

Thanks for taking the time to review!

On 6/17/20 11:24 AM, Jason Gunthorpe wrote:
> On Tue, Jun 16, 2020 at 10:56:53AM -0700, Divya Indi wrote:
>> The other option might be to use GFP_NOWAIT conditionally ie
>> (only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else
>> use GFP_ATOMIC). Eventual goal being to not have a blocking memory 
>> allocation.
> This is probably safest for now, unless you can audit all callers and
> see if they can switch to GFP_NOWAIT as well

At present the callers with GFP_ATOMIC appear to be ipoib. Might not be
feasible to change them all to GFP_NOWAIT. 

Will incorporate the review comments and send out v3 early next week.

Thanks,
Divya

>
> Jason


Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-16 Thread Divya Indi
Hi Leon,

Please find my comments inline -

On 6/13/20 11:41 PM, Leon Romanovsky wrote:
> On Tue, Jun 09, 2020 at 07:45:21AM -0700, Divya Indi wrote:
>> Hi Leon,
>>
>> Thanks for taking the time to review.
>>
>> Please find my comments inline -
>>
>> On 6/9/20 12:00 AM, Leon Romanovsky wrote:
>>> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
>>>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>>>> before sending")'
>>>> -
>>>> 1. Adds the query to the request list before ib_nl_send_msg.
>>>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
>>>> possible to allocate memory with GFP_KERNEL.
>>>>
>>>> However, if there is a delay in sending out the request (For
>>>> eg: Delay due to low memory situation) the timer to handle request timeout
>>>> might kick in before the request is sent out to ibacm via netlink.
>>>> ib_nl_request_timeout may release the query causing a use after free 
>>>> situation
>>>> while accessing the query in ib_nl_send_msg.
>>>>
>>>> Call Trace for the above race:
>>>>
>>>> [] ? ib_pack+0x17b/0x240 [ib_core]
>>>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>>>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>>>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>>>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>>>> [rds_rdma]
>>>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>>>> [rds_rdma]
>>>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>>>> [] addr_handler+0x9e/0x140 [rdma_cm]
>>>> [] process_req+0x134/0x190 [ib_addr]
>>>> [] process_one_work+0x169/0x4a0
>>>> [] worker_thread+0x5b/0x560
>>>> [] ? flush_delayed_work+0x50/0x50
>>>> [] kthread+0xcb/0xf0
>>>> [] ? __schedule+0x24a/0x810
>>>> [] ? __schedule+0x24a/0x810
>>>> [] ? kthread_create_on_node+0x180/0x180
>>>> [] ret_from_fork+0x47/0x90
>>>> [] ? kthread_create_on_node+0x180/0x180
>>>> 
>>>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>>>
>>>> To resolve the above issue -
>>>> 1. Add the req to the request list only after the request has been sent 
>>>> out.
>>>> 2. To handle the race where response comes in before adding request to
>>>> the request list, send(rdma_nl_multicast) and add to list while holding the
>>>> spinlock - request_lock.
>>>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
>>>> a spinlock. In case of memory allocation failure, request will go out to 
>>>> SA.
>>>>
>>>> Signed-off-by: Divya Indi 
>>>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>>>> before sending")
>>> Author SOB should be after "Fixes" line.
>> My bad. Noted.
>>
>>>> ---
>>>>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
>>>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/sa_query.c 
>>>> b/drivers/infiniband/core/sa_query.c
>>>> index 74e0058..042c99b 100644
>>>> --- a/drivers/infiniband/core/sa_query.c
>>>> +++ b/drivers/infiniband/core/sa_query.c
>>>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>>>> gfp_t gfp_mask)
>>>>void *data;
>>>>struct ib_sa_mad *mad;
>>>>int len;
>>>> +  unsigned long flags;
>>>> +  unsigned long delay;
>>>> +  int ret;
>>>>
>>>>mad = query->mad_buf->mad;
>>>>len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
>>>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>>>> gfp_t gfp_mask)
>>>>/* Repair the nlmsg header length */
>>>>nlmsg_end(skb, nlh);
>>>>
>>>> -  return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
>>>> +  spin_lock_irqsave(&ib_nl_request_lock, flags);
>>>> +  ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
>>> It is hard to be convinced that this is correct solution. The mix of
>>> gfp_flags and GFP_NOWAIT at the same time and usage of
>>> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
>>> makes this code unreadable/non-maintainable.

Re: [PATCH v2] sample-trace-array: Fix sleeping function called from invalid context

2020-06-09 Thread Divya Indi
On 6/9/20 6:51 AM, Kefeng Wang wrote:
>  BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:935
>  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
>  1 lock held by swapper/5/0:
>   #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, 
> at: call_timer_fn+0x8/0x3e0
>  CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
>  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>  Call trace:
>   dump_backtrace+0x0/0x1a0
>   show_stack+0x20/0x30
>   dump_stack+0xe4/0x150
>   ___might_sleep+0x160/0x200
>   __might_sleep+0x58/0x90
>   __mutex_lock+0x64/0x948
>   mutex_lock_nested+0x3c/0x58
>   __ftrace_set_clr_event+0x44/0x88
>   trace_array_set_clr_event+0x24/0x38
>   mytimer_handler+0x34/0x40 [sample_trace_array]
>
> mutex_lock() will be called in interrupt context, using workqueue to fix it.
Fixes:
> Signed-off-by: Kefeng Wang 

Reviewed-by: Divya Indi 

> ---
> v2:
> - add include of linux/workqueue.h
> - add missing cancel_work_sync() suggested by Divya Indi
>
>  samples/ftrace/sample-trace-array.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/samples/ftrace/sample-trace-array.c 
> b/samples/ftrace/sample-trace-array.c
> index d523450d73eb..9e437f930280 100644
> --- a/samples/ftrace/sample-trace-array.c
> +++ b/samples/ftrace/sample-trace-array.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/workqueue.h>
>  
>  /*
>   * Any file that uses trace points, must include the header.
> @@ -20,6 +21,16 @@ struct trace_array *tr;
>  static void mytimer_handler(struct timer_list *unused);
>  static struct task_struct *simple_tsk;
>  
> +static void trace_work_fn(struct work_struct *work)
> +{
> + /*
> +  * Disable tracing for event "sample_event".
> +  */
> + trace_array_set_clr_event(tr, "sample-subsystem", "sample_event",
> + false);
> +}
> +static DECLARE_WORK(trace_work, trace_work_fn);
> +
>  /*
>   * mytimer: Timer setup to disable tracing for event "sample_event". This
>   * timer is only for the purposes of the sample module to demonstrate access 
> of
> @@ -29,11 +40,7 @@ static DEFINE_TIMER(mytimer, mytimer_handler);
>  
>  static void mytimer_handler(struct timer_list *unused)
>  {
> - /*
> -  * Disable tracing for event "sample_event".
> -  */
> - trace_array_set_clr_event(tr, "sample-subsystem", "sample_event",
> - false);
> + schedule_work(&trace_work);
>  }
>  
>  static void simple_thread_func(int count)
> @@ -76,6 +83,7 @@ static int simple_thread(void *arg)
>   simple_thread_func(count++);
>  
>   del_timer(&mytimer);
> + cancel_work_sync(&trace_work);
>  
>   /*
>* trace_array_put() decrements the reference counter associated with


Re: [PATCH] sample-trace-array: Remove trace_array 'sample-instance'

2020-06-09 Thread Divya Indi
Reviewed-by: Divya Indi 

On 6/9/20 6:52 AM, Kefeng Wang wrote:
> Remove trace_array 'sample-instance' if kthread_run fails
> in sample_trace_array_init().
>
> Signed-off-by: Kefeng Wang 
> ---
>  samples/ftrace/sample-trace-array.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/samples/ftrace/sample-trace-array.c 
> b/samples/ftrace/sample-trace-array.c
> index 9e437f930280..6aba02a31c96 100644
> --- a/samples/ftrace/sample-trace-array.c
> +++ b/samples/ftrace/sample-trace-array.c
> @@ -115,8 +115,12 @@ static int __init sample_trace_array_init(void)
>   trace_printk_init_buffers();
>  
>   simple_tsk = kthread_run(simple_thread, NULL, "sample-instance");
> - if (IS_ERR(simple_tsk))
> + if (IS_ERR(simple_tsk)) {
> + trace_array_put(tr);
> + trace_array_destroy(tr);
>   return -1;
> + }
> +
>   return 0;
>  }
>  


Re: Review Request

2020-06-09 Thread Divya Indi
Thanks Leon, Noted!

On 6/9/20 12:03 AM, Leon Romanovsky wrote:
> On Mon, Jun 08, 2020 at 07:46:15AM -0700, Divya Indi wrote:
>> [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
>>
>> Hi,
>>
>> Please review the patch that follows.
> Please read Documentation/process/submitting-patches.rst
> 14) The canonical patch format
>
> You don't need an extra email "Review request" and Changelog should be
> put inside the patch itself under "---" marker.
>
> Thanks


Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-09 Thread Divya Indi
Hi Leon,

Thanks for taking the time to review.

Please find my comments inline -

On 6/9/20 12:00 AM, Leon Romanovsky wrote:
> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote:
>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>> before sending")'
>> -
>> 1. Adds the query to the request list before ib_nl_send_msg.
>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it
>> possible to allocate memory with GFP_KERNEL.
>>
>> However, if there is a delay in sending out the request (For
>> eg: Delay due to low memory situation) the timer to handle request timeout
>> might kick in before the request is sent out to ibacm via netlink.
>> ib_nl_request_timeout may release the query causing a use after free 
>> situation
>> while accessing the query in ib_nl_send_msg.
>>
>> Call Trace for the above race:
>>
>> [] ? ib_pack+0x17b/0x240 [ib_core]
>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>> [] addr_handler+0x9e/0x140 [rdma_cm]
>> [] process_req+0x134/0x190 [ib_addr]
>> [] process_one_work+0x169/0x4a0
>> [] worker_thread+0x5b/0x560
>> [] ? flush_delayed_work+0x50/0x50
>> [] kthread+0xcb/0xf0
>> [] ? __schedule+0x24a/0x810
>> [] ? __schedule+0x24a/0x810
>> [] ? kthread_create_on_node+0x180/0x180
>> [] ret_from_fork+0x47/0x90
>> [] ? kthread_create_on_node+0x180/0x180
>> 
>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>
>> To resolve the above issue -
>> 1. Add the req to the request list only after the request has been sent out.
>> 2. To handle the race where response comes in before adding request to
>> the request list, send(rdma_nl_multicast) and add to list while holding the
>> spinlock - request_lock.
>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
>> a spinlock. In case of memory allocation failure, request will go out to SA.
>>
>> Signed-off-by: Divya Indi 
>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
>> before sending")
> Author SOB should be after "Fixes" line.

My bad. Noted.

>
>> ---
>>  drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/sa_query.c 
>> b/drivers/infiniband/core/sa_query.c
>> index 74e0058..042c99b 100644
>> --- a/drivers/infiniband/core/sa_query.c
>> +++ b/drivers/infiniband/core/sa_query.c
>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>> gfp_t gfp_mask)
>>  void *data;
>>  struct ib_sa_mad *mad;
>>  int len;
>> +unsigned long flags;
>> +unsigned long delay;
>> +int ret;
>>
>>  mad = query->mad_buf->mad;
>>  len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, 
>> gfp_t gfp_mask)
>>  /* Repair the nlmsg header length */
>>  nlmsg_end(skb, nlh);
>>
>> -return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
>> +spin_lock_irqsave(&ib_nl_request_lock, flags);
>> +ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
> It is hard to be convinced that this is correct solution. The mix of
> gfp_flags and GFP_NOWAIT at the same time and usage of
> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too
> makes this code unreadable/non-maintainable.

Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock.

ie we had -

1. Get spinlock - ib_nl_request_lock 
2. ib_nl_send_msg
2.a) rdma_nl_multicast
3. Add request to the req list
4. Arm the timer if needed.
5. Release spinlock

However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL.
hence, was moved out of the spinlock. In addition, req was now being 
added prior to ib_nl_send_msg [To handle the race where response can
come in before we get a chance to add the request back to the list].

This introduced another race resulting in use-after-free.[Described in the 
commit.]

To resolve this, sending out the request and adding it to the list need to
happen while holding the request_lock.
To ensure we don't do a blocking allocation while holding the spinlock,
GFP_NOWAIT is used for rdma_nl_multicast.

Re: [PATCH] sample-trace-array: Fix sleeping function called from invalid context

2020-06-08 Thread Divya Indi
Hi Kefeng,

Thanks for catching this issue.

Please find my comments line -

On 6/8/20 12:54 AM, Kefeng Wang wrote:
>  BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:935
>  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
>  1 lock held by swapper/5/0:
>   #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, 
> at: call_timer_fn+0x8/0x3e0
>  CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
>  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>  Call trace:
>   dump_backtrace+0x0/0x1a0
>   show_stack+0x20/0x30
>   dump_stack+0xe4/0x150
>   ___might_sleep+0x160/0x200
>   __might_sleep+0x58/0x90
>   __mutex_lock+0x64/0x948
>   mutex_lock_nested+0x3c/0x58
>   __ftrace_set_clr_event+0x44/0x88
>   trace_array_set_clr_event+0x24/0x38
>   mytimer_handler+0x34/0x40 [sample_trace_array]
>
> mutex_lock() will be called in interrupt context, using workqueueu to fix it.

/s/workqueueu/workqueue

Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to 
Ftrace instances.")

> Signed-off-by: Kefeng Wang 
> ---
>  samples/ftrace/sample-trace-array.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/samples/ftrace/sample-trace-array.c 
> b/samples/ftrace/sample-trace-array.c
> index d523450d73eb..41684c7dbd7b 100644
> --- a/samples/ftrace/sample-trace-array.c
> +++ b/samples/ftrace/sample-trace-array.c
> @@ -20,6 +20,16 @@ struct trace_array *tr;
>  static void mytimer_handler(struct timer_list *unused);
>  static struct task_struct *simple_tsk;
>  
> +static void trace_work_fn(struct work_struct *work)
> +{
> + /*
> +  * Disable tracing for event "sample_event".
> +  */
> + trace_array_set_clr_event(tr, "sample-subsystem", "sample_event",
> + false);
> +}
> +static DECLARE_WORK(trace_work, trace_work_fn);
> +
>  /*
>   * mytimer: Timer setup to disable tracing for event "sample_event". This
>   * timer is only for the purposes of the sample module to demonstrate access 
> of
> @@ -29,11 +39,7 @@ static DEFINE_TIMER(mytimer, mytimer_handler);
>  
>  static void mytimer_handler(struct timer_list *unused)
>  {
> - /*
> -  * Disable tracing for event "sample_event".
> -  */
> - trace_array_set_clr_event(tr, "sample-subsystem", "sample_event",
> - false);
> + schedule_work(&trace_work);
>  }
>  
>  static void simple_thread_func(int count)

I think we also need to use cancel_work_sync() to handle the case -
1. Module unloaded
2. Timer already ran and scheduled work to disable trace event
3. When the work runs we no longer have the relevant trace array.

static int simple_thread(void *arg)
{
.

 del_timer(&mytimer);

  
.
 return 0;
}

Thanks,
Divya



Re: [PATCH] sample-trace-array: Fix sleeping function called from invalid context

2020-06-08 Thread Divya Indi
Hi Steve,

Sure, I am looking into it and reviewing the patch.

Thanks,
Divya

On 6/8/20 7:37 AM, Steven Rostedt wrote:
> On Mon, 8 Jun 2020 07:54:37 +
> Kefeng Wang  wrote:
>
>>  BUG: sleeping function called from invalid context at 
>> kernel/locking/mutex.c:935
>>  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
>>  1 lock held by swapper/5/0:
>>   #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, 
>> at: call_timer_fn+0x8/0x3e0
>>  CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
>>  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>  Call trace:
>>   dump_backtrace+0x0/0x1a0
>>   show_stack+0x20/0x30
>>   dump_stack+0xe4/0x150
>>   ___might_sleep+0x160/0x200
>>   __might_sleep+0x58/0x90
>>   __mutex_lock+0x64/0x948
>>   mutex_lock_nested+0x3c/0x58
>>   __ftrace_set_clr_event+0x44/0x88
>>   trace_array_set_clr_event+0x24/0x38
>>   mytimer_handler+0x34/0x40 [sample_trace_array]
>>
>> mutex_lock() will be called in interrupt context, using workqueueu to fix it.
>>
>> Signed-off-by: Kefeng Wang 
>>
> Divya,
>
> Can you give a Reviewed-by for this?
>
> -- Steve


[PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

2020-06-08 Thread Divya Indi
Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before 
sending")'
-
1. Adds the query to the request list before ib_nl_send_msg.
2. Removes ib_nl_send_msg from within the spinlock which also makes it
possible to allocate memory with GFP_KERNEL.

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query causing a use after free situation
while accessing the query in ib_nl_send_msg.

Call Trace for the above race:

[] ? ib_pack+0x17b/0x240 [ib_core]
[] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[] addr_handler+0x9e/0x140 [rdma_cm]
[] process_req+0x134/0x190 [ib_addr]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] ? flush_delayed_work+0x50/0x50
[] kthread+0xcb/0xf0
[] ? __schedule+0x24a/0x810
[] ? __schedule+0x24a/0x810
[] ? kthread_create_on_node+0x180/0x180
[] ret_from_fork+0x47/0x90
[] ? kthread_create_on_node+0x180/0x180

RIP  [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve the above issue -
1. Add the req to the request list only after the request has been sent out.
2. To handle the race where response comes in before adding request to
the request list, send(rdma_nl_multicast) and add to list while holding the
spinlock - request_lock.
3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding
a spinlock. In case of memory allocation failure, request will go out to SA.

Signed-off-by: Divya Indi 
Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending")
---
 drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 74e0058..042c99b 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
void *data;
struct ib_sa_mad *mad;
int len;
+   unsigned long flags;
+   unsigned long delay;
+   int ret;
 
mad = query->mad_buf->mad;
len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
@@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
/* Repair the nlmsg header length */
nlmsg_end(skb, nlh);
 
-   return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
+   spin_lock_irqsave(&ib_nl_request_lock, flags);
+   ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
+   if (!ret) {
+   /* Put the request on the list.*/
+   delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+   query->timeout = delay + jiffies;
+   list_add_tail(&query->list, &ib_nl_request_list);
+   /* Start the timeout if this is the only request */
+   if (ib_nl_request_list.next == &query->list)
+   queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+   }
+   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+   return ret;
 }
 
 static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 {
-   unsigned long flags;
-   unsigned long delay;
int ret;
 
	INIT_LIST_HEAD(&query->list);
	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
 
-   /* Put the request on the list first.*/
-   spin_lock_irqsave(&ib_nl_request_lock, flags);
-   delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
-   query->timeout = delay + jiffies;
-   list_add_tail(&query->list, &ib_nl_request_list);
-   /* Start the timeout if this is the only request */
-   if (ib_nl_request_list.next == &query->list)
-   queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
-   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
-
	ret = ib_nl_send_msg(query, gfp_mask);
	if (ret) {
	ret = -EIO;
-   /* Remove the request */
-   spin_lock_irqsave(&ib_nl_request_lock, flags);
-   list_del(&query->list);
-   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
}
 
return ret;
-- 
1.8.3.1



Review Request

2020-06-08 Thread Divya Indi
[PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

Hi,

Please review the patch that follows. 

v3 addresses the previously raised concerns. Changes include -

1. To resolve the race where the timer can kick in before request
has been sent out, we now add the request to the list after sending out
the request.

2. To handle the race where the response can come in before we get a
chance to add the request to the list, sending the request and adding it to
the request list are done under the spinlock - request_lock.

3. To make sure there is no blocking op/delay while holding the spinlock,
using GFP_NOWAIT for memory allocation.

Thanks Jason for providing your valuable feedback. 

Let me know if you have any suggestions or concerns.

Thanks,
Divya


Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-19 Thread Divya Indi
Hi Jason,

I wanted to follow up to see if you got a chance to review the following reply?

Let me know if it addresses your concern and if you have any questions!

Thanks,
Divya

On 5/13/20 2:02 PM, Divya Indi wrote:
> Hi Jason,
>
> Please find my comments inline - 
>
> On 5/13/20 8:00 AM, Jason Gunthorpe wrote:
>> On Mon, May 11, 2020 at 02:26:30PM -0700, Divya Indi wrote:
>>>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
>>>>>  
>>>>>   send_buf = query->mad_buf;
>>>>>  
>>>>> + /*
>>>>> +  * Make sure the IB_SA_NL_QUERY_SENT flag is set before
>>>>> +  * processing this query. If flag is not set, query can be accessed in
>>>>> +  * another context while setting the flag and processing the query will
>>>>> +  * eventually release it causing a possible use-after-free.
>>>>> +  */
>>>> This comment doesn't really make sense, flags inside the memory being
>>>> freed inherently can't prevent use after free.
>>> I can definitely re-phrase here to make things clearer. But, the idea here 
>>> is
>>> in the unlikely/rare case where a response for a query comes in before the 
>>> flag has been
>>> set in ib_nl_make_request, we want to wait for the flag to be set before 
>>> proceeding. 
>>> The response handler will eventually release the query so this wait avoids 
>>> that if the flag has not been set
>>> else 
>>> "query->flags |= IB_SA_NL_QUERY_SENT;" 
>>> will be accessing a query which was freed due to the above mentioned race.
>>>
>>> It is unlikely since getting a response => We have actually sent out the 
>>> query to ibacm.
>>>
>>> How about this -
>>>
>>> "Getting a response is indicative of having sent out the query, but in an 
>>> unlikely race when 
>>> the response comes in before setting IB_SA_NL_QUERY_SENT, we need to wait 
>>> till the flag is set to
>>> avoid accessing a query that has been released."
>> It still makes no sense, a flag that is set before freeing the memory
>> is fundamentally useless to prevent races.
> Here the race is -
>
> 1. ib_nl_send_msg()
> 2. query->flags |= IB_SA_NL_QUERY_SENT
> 3. return;
>
> -
>
> response_handler() {
> wait till flag is set.
> 
> kfree(query);
>
> }
>
> Here, if the response handler was called => Query was sent
> and flag should have been set. However if response handler kicks in
> before line 2, we want to wait and make sure the flag is set and
> then free the query.
>
> Ideally after ib_nl_send_msg, we should not be accessing the query
> because we can be getting a response/timeout asynchronously and can't be 
> sure when. 
>
> The only places we access a query after it is successfully sent [response 
> handler getting called
> => sending was successful] -
> 1. ib_nl_request_timeout
> 2. While setting the flag.
>
> 1. is taken care of because the request list access is protected by a lock
> and whoever gets the lock first deletes it from the request list and
> hence we can only have the response handler OR the timeout handler operate on 
> the
> query.
>
> 2. To handle this is why we wait in the response handler. Once the flag is
> set we are no longer accessing the query and hence safe to release it.
>
> I have modified the comment for v2 as follows -
>
>   /*
> +* In case of a quick response ie when a response comes in before
> +* setting IB_SA_NL_QUERY_SENT, we can have an unlikely race where the
> +* response handler will release the query, but we can still access 
> the
> +* freed query while setting the flag.
> +* Hence, before proceeding and eventually releasing the query -
> +* wait till the flag is set. The flag should be set soon since 
> getting
> +* a response is indicative of having successfully sent the query.
> +*/
>
>
> Thanks,
> Divya
>
>> Jason
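The ownership rule described in the exchange above - whichever of the response and timeout handlers takes the request lock first deletes the query from the list, and only that path may go on to process/free it - can be sketched in userspace C, with a pthread mutex in place of the spinlock (names are made up for illustration):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* "First lock holder owns the query": both the response handler and the
 * timeout handler try to remove the same entry, but only the caller that
 * finds it still listed wins ownership and may process/free it. */

struct entry {
	bool listed;  /* stand-in for membership in ib_nl_request_list */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true iff the caller won ownership (performed the list_del). */
static bool try_claim(struct entry *e)
{
	bool won = false;

	pthread_mutex_lock(&list_lock);
	if (e->listed) {
		e->listed = false;  /* list_del(&query->list) */
		won = true;
	}
	pthread_mutex_unlock(&list_lock);
	return won;
}
```

Whichever path loses the race simply backs off, so the query is processed exactly once.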


Re: [PATCH] IB/sa: Fix use-after-free in ib_nl_send_msg()

2020-05-19 Thread Divya Indi
Hi Markus,

Thanks for taking the time to review.

On 5/15/20 9:25 AM, Markus Elfring wrote:
>> This patch fixes commit -
>> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>> before sending")'
>>
>> Above commit adds the query to the request list before ib_nl_snd_msg.
> I suggest to improve also this change description.
>
>   The query to the request list was added before ib_nl_snd_msg()
>   by the commit 3ebd2fd0d011 ("…").
Noted, will make this change.
>
>> This flag Indicates …
> … indicates …
Noted.
>
>> To handle the case where a response is received before we could set this
>> flag, the response handler waits for the flag to be
>> set before proceeding with the query.
> Please reconsider the word wrapping.
>
>   To handle the case where a response is received before we could set
>   this flag, the response handler waits for the flag to be set
>   before proceeding with the query.
>
>
> Would you like to add the tag “Fixes” to the commit message?

Noted!

Thanks,

Divya

>
> Regards,
> Markus


[PATCH] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-14 Thread Divya Indi
This patch fixes commit -
commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before 
sending")'

Above commit adds the query to the request list before ib_nl_snd_msg.

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query if it fails to send it to SA
as well causing a use after free situation.

Call Trace for the above race:

[] ? ib_pack+0x17b/0x240 [ib_core]
[] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[] addr_handler+0x9e/0x140 [rdma_cm]
[] process_req+0x134/0x190 [ib_addr]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] ? flush_delayed_work+0x50/0x50
[] kthread+0xcb/0xf0
[] ? __schedule+0x24a/0x810
[] ? __schedule+0x24a/0x810
[] ? kthread_create_on_node+0x180/0x180
[] ret_from_fork+0x47/0x90
[] ? kthread_create_on_node+0x180/0x180

RIP  [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT.
This flag Indicates if the request has been sent out to ibacm yet.

If this flag is not set for a query and the query times out, we add it
back to the list with the original delay.

To handle the case where a response is received before we could set this
flag, the response handler waits for the flag to be
set before proceeding with the query.

Signed-off-by: Divya Indi 
---
 drivers/infiniband/core/sa_query.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30d4c12..91486e7 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -59,6 +60,9 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX	20
 #define IB_SA_CPI_MAX_RETRY_CNT	3
 #define IB_SA_CPI_RETRY_WAIT	1000 /*msecs */
+
+DECLARE_WAIT_QUEUE_HEAD(wait_queue);
+
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
 
 struct ib_sa_sm_ah {
@@ -123,6 +127,12 @@ struct ib_sa_query {
 #define IB_SA_CANCEL   0x0002
 #define IB_SA_QUERY_OPA	0x0004
 
+/*
+ * This bit in the SA query flags indicates whether the query
+ * has been sent out to ibacm via netlink
+ */
+#define IB_SA_NL_QUERY_SENT	3
+
 struct ib_sa_service_query {
void (*callback)(int, struct ib_sa_service_rec *, void *);
void *context;
@@ -746,6 +756,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
return (query->flags & IB_SA_CANCEL);
 }
 
+static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
+{
+   return test_bit(IB_SA_NL_QUERY_SENT, (unsigned long *)&query->flags);
+}
+
 static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
 struct ib_sa_query *query)
 {
@@ -889,6 +904,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
	spin_lock_irqsave(&ib_nl_request_lock, flags);
	list_del(&query->list);
	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+   } else {
+   set_bit(IB_SA_NL_QUERY_SENT, (unsigned long *)&query->flags);
+
+   /*
+* If response is received before this flag was set
+* someone is waiting to process the response and release the
+* query.
+*/
+   wake_up(&wait_queue);
}
 
return ret;
@@ -994,6 +1018,21 @@ static void ib_nl_request_timeout(struct work_struct *work)
}
 
	list_del(&query->list);
+
+   /*
+* If IB_SA_NL_QUERY_SENT is not set, this query has not been
+* sent to ibacm yet. Reset the timer.
+*/
+   if (!ib_sa_nl_query_sent(query)) {
+   delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+   query->timeout = delay + jiffies;
+   list_add_tail(&query->list, &ib_nl_request_list);
+   /* Start the timeout if this is the only request */
+   if (ib_nl_request_list.next == &query->list)
+   queue_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+   delay);
+   break;
+   }
ib_sa_disable_local_svc(query);
/* Hold the lock to protect against query cancellation */
	if (ib_sa_query_cancelled(query))

Review Request

2020-05-14 Thread Divya Indi
[PATCH v2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

Hi,

This is the v2 of the patch that addresses the comments received for v1 -

- Using atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
- Rewording and adding comments.


Thanks,
Divya


Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-13 Thread Divya Indi
Hi Jason,

Please find my comments inline - 

On 5/13/20 8:00 AM, Jason Gunthorpe wrote:
> On Mon, May 11, 2020 at 02:26:30PM -0700, Divya Indi wrote:
>>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
>>>>  
>>>>send_buf = query->mad_buf;
>>>>  
>>>> +  /*
>>>> +   * Make sure the IB_SA_NL_QUERY_SENT flag is set before
>>>> +   * processing this query. If flag is not set, query can be accessed in
>>>> +   * another context while setting the flag and processing the query will
>>>> +   * eventually release it causing a possible use-after-free.
>>>> +   */
>>> This comment doesn't really make sense, flags inside the memory being
>>> freed inherently can't prevent use after free.
>> I can definitely re-phrase here to make things clearer. But, the idea here is
>> in the unlikely/rare case where a response for a query comes in before the 
>> flag has been
>> set in ib_nl_make_request, we want to wait for the flag to be set before 
>> proceeding. 
>> The response handler will eventually release the query so this wait avoids 
>> that if the flag has not been set
>> else 
>>  "query->flags |= IB_SA_NL_QUERY_SENT;" 
>> will be accessing a query which was freed due to the above mentioned race.
>>
>> It is unlikely since getting a response => We have actually sent out the 
>> query to ibacm.
>>
>> How about this -
>>
>> "Getting a response is indicative of having sent out the query, but in an 
>> unlikely race when 
>> the response comes in before setting IB_SA_NL_QUERY_SENT, we need to wait 
>> till the flag is set to
>> avoid accessing a query that has been released."
> It still makes no sense, a flag that is set before freeing the memory
> is fundamentally useless to prevent races.

Here the race is -

1. ib_nl_send_msg()
2. query->flags |= IB_SA_NL_QUERY_SENT
3. return;

-

response_handler() {
wait till flag is set.

kfree(query);

}

Here, if the response handler was called => Query was sent
and flag should have been set. However if response handler kicks in
before line 2, we want to wait and make sure the flag is set and
then free the query.

Ideally after ib_nl_send_msg, we should not be accessing the query
because we can be getting a response/timeout asynchronously and can't be
sure when. 

The only places we access a query after it is successfully sent [response 
handler getting called
=> sending was successful] -
1. ib_nl_request_timeout
2. While setting the flag.

1. is taken care of because the request list access is protected by a lock
and whoever gets the lock first deletes it from the request list and
hence we can only have the response handler OR the timeout handler operate on 
the
query.

2. To handle this is why we wait in the response handler. Once the flag is
set we are no longer accessing the query and hence safe to release it.

I have modified the comment for v2 as follows -

  /*
+* In case of a quick response ie when a response comes in before
+* setting IB_SA_NL_QUERY_SENT, we can have an unlikely race where the
+* response handler will release the query, but we can still access the
+* freed query while setting the flag.
+* Hence, before proceeding and eventually releasing the query -
+* wait till the flag is set. The flag should be set soon since getting
+* a response is indicative of having successfully sent the query.
+*/
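The handshake that comment describes - the response handler waiting for IB_SA_NL_QUERY_SENT before it may release the query - can be emulated in userspace C, with a pthread condition variable standing in for the kernel wait queue (wait_event()/wake_up()); all names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace emulation of the wait_event()/wake_up() handshake: the handler
 * blocks until the sender has set the "sent" flag, so it never frees the
 * query while the sender may still touch it. */

struct nl_query {
	bool sent;  /* analogue of the IB_SA_NL_QUERY_SENT bit */
	pthread_mutex_t mtx;
	pthread_cond_t cond;
};

static void query_init(struct nl_query *q)
{
	q->sent = false;
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cond, NULL);
}

/* Sender side: set the flag, then wake any waiter - the analogue of
 * "query->flags |= IB_SA_NL_QUERY_SENT; wake_up(&wait_queue);". */
static void *mark_sent(void *arg)
{
	struct nl_query *q = arg;

	pthread_mutex_lock(&q->mtx);
	q->sent = true;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->mtx);
	return NULL;
}

/* Handler side: wait_event(wait_queue, ib_sa_nl_query_sent(query)). */
static void wait_until_sent(struct nl_query *q)
{
	pthread_mutex_lock(&q->mtx);
	while (!q->sent)
		pthread_cond_wait(&q->cond, &q->mtx);
	pthread_mutex_unlock(&q->mtx);
}
```

Unlike a raw flag check, the condvar loop cannot miss a wakeup that lands between the test and the sleep - the lost-wakeup concern Mark raised earlier in the thread.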


Thanks,
Divya

>
> Jason


Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-11 Thread Divya Indi
Hi Hillf,

Please find my comments inline -

On 5/8/20 4:03 AM, Hillf Danton wrote:
> On Thu, 7 May 2020 12:36:29 Mark Bloch wrote:
>> On 5/7/2020 11:34, Divya Indi wrote:
>>> This patch fixes commit -
>>> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>>> before sending")'
>>>
>>> Above commit adds the query to the request list before ib_nl_snd_msg.
>>>
>>> However, if there is a delay in sending out the request (For
>>> eg: Delay due to low memory situation) the timer to handle request timeout
>>> might kick in before the request is sent out to ibacm via netlink.
>>> ib_nl_request_timeout may release the query if it fails to send it to SA
>>> as well causing a use after free situation.
>>>
>>> Call Trace for the above race:
>>>
>>> [] ? ib_pack+0x17b/0x240 [ib_core]
>>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>>> [rds_rdma]
>>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>>> [rds_rdma]
>>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>>> [] addr_handler+0x9e/0x140 [rdma_cm]
>>> [] process_req+0x134/0x190 [ib_addr]
>>> [] process_one_work+0x169/0x4a0
>>> [] worker_thread+0x5b/0x560
>>> [] ? flush_delayed_work+0x50/0x50
>>> [] kthread+0xcb/0xf0
>>> [] ? __schedule+0x24a/0x810
>>> [] ? __schedule+0x24a/0x810
>>> [] ? kthread_create_on_node+0x180/0x180
>>> [] ret_from_fork+0x47/0x90
>>> [] ? kthread_create_on_node+0x180/0x180
>>> 
>>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>>
>>> To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT.
>>> This flag Indicates if the request has been sent out to ibacm yet.
>>>
>>> If this flag is not set for a query and the query times out, we add it
>>> back to the list with the original delay.
>>>
>>> To handle the case where a response is received before we could set this
>>> flag, the response handler waits for the flag to be
>>> set before proceeding with the query.
>>>
>>> Signed-off-by: Divya Indi 
>>> ---
>>>  drivers/infiniband/core/sa_query.c | 45 
>>> ++
>>>  1 file changed, 45 insertions(+)
>>>
>>> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
>>> index 30d4c12..ffbae2f 100644
>>> --- a/drivers/infiniband/core/sa_query.c
>>> +++ b/drivers/infiniband/core/sa_query.c
>>> @@ -59,6 +59,9 @@
>>>  #define IB_SA_LOCAL_SVC_TIMEOUT_MAX20
>>>  #define IB_SA_CPI_MAX_RETRY_CNT3
>>>  #define IB_SA_CPI_RETRY_WAIT   1000 /*msecs */
>>> +
>>> +DECLARE_WAIT_QUEUE_HEAD(wait_queue);
>>> +
>>>  static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
>>>  
>>>  struct ib_sa_sm_ah {
>>> @@ -122,6 +125,7 @@ struct ib_sa_query {
>>>  #define IB_SA_ENABLE_LOCAL_SERVICE 0x0001
>>>  #define IB_SA_CANCEL   0x0002
>>>  #define IB_SA_QUERY_OPA	0x0004
>>> +#define IB_SA_NL_QUERY_SENT	0x0008
>>>  
>>>  struct ib_sa_service_query {
>>> void (*callback)(int, struct ib_sa_service_rec *, void *);
>>> @@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
>>> return (query->flags & IB_SA_CANCEL);
>>>  }
>>>  
>>> +static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
>>> +{
>>> +   return (query->flags & IB_SA_NL_QUERY_SENT);
>>> +}
>>> +
>>>  static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
>>>  struct ib_sa_query *query)
>>>  {
>>> @@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
>>> spin_lock_irqsave(&ib_nl_request_lock, flags);
>>> list_del(&query->list);
>>> spin_unlock_irqrestore(&ib_nl_request_lock, flags);
> Here it also needs to do a wakeup, as a sleeper might come while sending, no
> matter the failure to send.

We only sleep in the response handler. If we could not send a query, then the
response handler is not triggered and hence no one would be waiting.

Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-11 Thread Divya Indi
Hi Jason,

On 5/7/20 5:08 PM, Jason Gunthorpe wrote:
> On Thu, May 07, 2020 at 11:34:47AM -0700, Divya Indi wrote:
>> This patch fixes commit -
>> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list 
>> before sending")'
>>
>> Above commit adds the query to the request list before ib_nl_snd_msg.
>>
>> However, if there is a delay in sending out the request (For
>> eg: Delay due to low memory situation) the timer to handle request timeout
>> might kick in before the request is sent out to ibacm via netlink.
>> ib_nl_request_timeout may release the query if it fails to send it to SA
>> as well causing a use after free situation.
>>
>> Call Trace for the above race:
>>
>> [] ? ib_pack+0x17b/0x240 [ib_core]
>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850
>> [rds_rdma]
>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
>> [] addr_handler+0x9e/0x140 [rdma_cm]
>> [] process_req+0x134/0x190 [ib_addr]
>> [] process_one_work+0x169/0x4a0
>> [] worker_thread+0x5b/0x560
>> [] ? flush_delayed_work+0x50/0x50
>> [] kthread+0xcb/0xf0
>> [] ? __schedule+0x24a/0x810
>> [] ? __schedule+0x24a/0x810
>> [] ? kthread_create_on_node+0x180/0x180
>> [] ret_from_fork+0x47/0x90
>> [] ? kthread_create_on_node+0x180/0x180
>> 
>> RIP  [] send_mad+0x33d/0x5d0 [ib_sa]
>>
>> To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT.
>> This flag Indicates if the request has been sent out to ibacm yet.
>>
>> If this flag is not set for a query and the query times out, we add it
>> back to the list with the original delay.
>>
>> To handle the case where a response is received before we could set this
>> flag, the response handler waits for the flag to be
>> set before proceeding with the query.
>>
>> Signed-off-by: Divya Indi 
>>  drivers/infiniband/core/sa_query.c | 45 
>> ++
>>  1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
>> index 30d4c12..ffbae2f 100644
>> +++ b/drivers/infiniband/core/sa_query.c
>> @@ -59,6 +59,9 @@
>>  #define IB_SA_LOCAL_SVC_TIMEOUT_MAX 20
>>  #define IB_SA_CPI_MAX_RETRY_CNT 3
>>  #define IB_SA_CPI_RETRY_WAIT	1000 /*msecs */
>> +
>> +DECLARE_WAIT_QUEUE_HEAD(wait_queue);
>> +
>>  static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
>>  
>>  struct ib_sa_sm_ah {
>> @@ -122,6 +125,7 @@ struct ib_sa_query {
>>  #define IB_SA_ENABLE_LOCAL_SERVICE  0x0001
>>  #define IB_SA_CANCEL0x0002
>>  #define IB_SA_QUERY_OPA 0x0004
>> +#define IB_SA_NL_QUERY_SENT 0x0008
>>  
>>  struct ib_sa_service_query {
>>  void (*callback)(int, struct ib_sa_service_rec *, void *);
>> @@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
>>  return (query->flags & IB_SA_CANCEL);
>>  }
>>  
>> +static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
>> +{
>> +return (query->flags & IB_SA_NL_QUERY_SENT);
>> +}
>> +
>>  static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
>>   struct ib_sa_query *query)
>>  {
>> @@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
>>  spin_lock_irqsave(&ib_nl_request_lock, flags);
>>  list_del(&query->list);
>>  spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>> +} else {
>> +query->flags |= IB_SA_NL_QUERY_SENT;
>> +
>> +/*
>> + * If response is received before this flag was set
>> + * someone is waiting to process the response and release the
>> + * query.
>> + */
>> +wake_up(&wait_queue);
>>  }
>>  
>>  return ret;
>> @@ -994,6 +1012,21 @@ static void ib_nl_request_timeout(struct work_struct *work)
>>  }
>>  
>>  list_del(&query->list);
>> +
>> +/*
>> + * If IB_SA_NL_QUERY_SENT is not set, this query has not been
>> + * s

Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-11 Thread Divya Indi
Hi Mark,

Please find my comments inline -

On 5/7/20 2:40 PM, Mark Bloch wrote:
>
> On 5/7/2020 13:16, Wan, Kaike wrote:
>>
>>> -Original Message-
>>> From: Mark Bloch 
>>> Sent: Thursday, May 07, 2020 3:36 PM
>>> To: Divya Indi ; linux-kernel@vger.kernel.org; linux-
>>> r...@vger.kernel.org; Jason Gunthorpe ; Wan, Kaike
>>> 
>>> Cc: Gerd Rausch ; Håkon Bugge
>>> ; Srinivas Eeda ;
>>> Rama Nichanamatlu ; Doug Ledford
>>> 
>>> Subject: Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
>>>
>>>
>>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
>>>>
>>>>send_buf = query->mad_buf;
>>>>
>>>> +  /*
>>>> +   * Make sure the IB_SA_NL_QUERY_SENT flag is set before
>>>> +   * processing this query. If flag is not set, query can be accessed in
>>>> +   * another context while setting the flag and processing the query
>>> will
>>>> +   * eventually release it causing a possible use-after-free.
>>>> +   */
>>>> +  if (unlikely(!ib_sa_nl_query_sent(query))) {
>>> Can't there be a race here where you check the flag (it isn't set) and 
>>> before
>>> you call wait_event() the flag is set and wake_up() is called which means 
>>> you
>>> will wait here forever?
>> Should wait_event() catch that? That is,  if the flag is not set, 
>> wait_event() will sleep until the flag is set.
>>
>  or worse, a timeout will happen, the query will be
>> freed and then some other query will call wake_up() and we have again a
>>> use-after-free.
>> The request has been deleted from the request list by this time and 
>> therefore the timeout should have no impact here.
>>
>>
>>>> +  spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>>> +  wait_event(wait_queue, ib_sa_nl_query_sent(query));
>>> What if there are two queries sent to userspace, shouldn't you check and
>>> make sure you got woken up by the right one setting the flag?
>> The wait_event() is conditioned on the specific query 
>> (ib_sa_nl_query_sent(query)), not on the wait_queue itself.
> Right, missed that this macro is expends into some inline code.
>
> Looking at the code a little more, I think this also fixes another issue.
> Lets say ib_nl_send_msg() returns an error but before we do the list_del() in
> ib_nl_make_request() there is also a timeout, so in ib_nl_request_timeout()
> we will do list_del() and then another one list_del() will be done in 
> ib_nl_make_request().
>
>>> Other than that, the entire solution makes it very complicated to reason 
>>> with
>>> (flags set/checked without locking etc) maybe we should just revert and fix 
>>> it
>>> the other way?
>> The flag could certainly be set under the lock, which may reduce 
>> complications.
> Anything that can help here with this.
> For me in ib_nl_make_request() the comment should also explain that not only 
> ib_nl_handle_resolve_resp()
> is waiting for the flag to be set but also ib_nl_request_timeout() and that a 
> timeout can't happen
> before the flag is set.

ib_nl_request_timeout() would re-queue the query to the request list if the 
flag is not set. 
However, that makes sense! Noted, I'll add the comment in ib_nl_make_request to
make things clearer.

Thanks,
Divya

> Mark
>  
>> Kaike
>> i


Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-11 Thread Divya Indi
Hi,

Thanks for taking the time to review. Please find my comments inline -

On 5/7/20 1:16 PM, Wan, Kaike wrote:
>
>> -Original Message-
>> From: Mark Bloch 
>> Sent: Thursday, May 07, 2020 3:36 PM
>> To: Divya Indi ; linux-kernel@vger.kernel.org; linux-
>> r...@vger.kernel.org; Jason Gunthorpe ; Wan, Kaike
>> 
>> Cc: Gerd Rausch ; Håkon Bugge
>> ; Srinivas Eeda ;
>> Rama Nichanamatlu ; Doug Ledford
>> 
>> Subject: Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
>>
>>
>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
>>>
>>> send_buf = query->mad_buf;
>>>
>>> +   /*
>>> +* Make sure the IB_SA_NL_QUERY_SENT flag is set before
>>> +* processing this query. If flag is not set, query can be accessed in
>>> +* another context while setting the flag and processing the query
>> will
>>> +* eventually release it causing a possible use-after-free.
>>> +*/
>>> +   if (unlikely(!ib_sa_nl_query_sent(query))) {
>> Can't there be a race here where you check the flag (it isn't set) and before
>> you call wait_event() the flag is set and wake_up() is called which means you
>> will wait here forever?
> Should wait_event() catch that? That is,  if the flag is not set, 
> wait_event() will sleep until the flag is set.
>
>  or worse, a timeout will happen, the query will be
>> freed and then some other query will call wake_up() and we have again a
>> use-after-free.
> The request has been deleted from the request list by this time and therefore 
> the timeout should have no impact here.
>
>
>>> +   spin_unlock_irqrestore(&ib_nl_request_lock, flags);
>>> +   wait_event(wait_queue, ib_sa_nl_query_sent(query));
>> What if there are two queries sent to userspace, shouldn't you check and
>> make sure you got woken up by the right one setting the flag?
> The wait_event() is conditioned on the specific query 
> (ib_sa_nl_query_sent(query)), not on the wait_queue itself.
>
>> Other than that, the entire solution makes it very complicated to reason with
>> (flags set/checked without locking etc) maybe we should just revert and fix 
>> it
>> the other way?
> The flag could certainly be set under the lock, which may reduce 
> complications. 

We could use a lock or use atomic operations. However, the reason for not doing
so was that we have one writer and multiple readers of the IB_SA_NL_QUERY_SENT
flag, and the readers wouldn't mind reading a stale value.

Would it still be better to have a lock for this flag? 

Thanks,
Divya

>
> Kaike
>
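For reference, the lockless single-writer/multiple-reader variant weighed above (what v2 ended up doing with set_bit()/test_bit()) can be sketched with C11 atomics; the bit number and names are illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One writer sets the flag, any number of readers poll it, no lock needed.
 * C11 atomics stand in for the kernel's set_bit()/test_bit(). */

#define NL_QUERY_SENT_BIT 3UL

/* test_bit(IB_SA_NL_QUERY_SENT, &query->flags) analogue */
static bool nl_query_sent(atomic_ulong *flags)
{
	return atomic_load(flags) & (1UL << NL_QUERY_SENT_BIT);
}

/* set_bit(IB_SA_NL_QUERY_SENT, &query->flags) analogue */
static void nl_query_mark_sent(atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << NL_QUERY_SENT_BIT);
}
```

A reader may briefly observe a stale zero, as noted above, but it can never see a torn or partially written value for this single bit.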


[PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

2020-05-07 Thread Divya Indi
This patch fixes commit -
commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before 
sending")'

Above commit adds the query to the request list before ib_nl_snd_msg.

However, if there is a delay in sending out the request (For
eg: Delay due to low memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query if it fails to send it to SA
as well causing a use after free situation.

Call Trace for the above race:

[] ? ib_pack+0x17b/0x240 [ib_core]
[] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
[] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
[] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
[] ? rds_rdma_cm_event_handler_cmn+0x850/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler_cmn+0x22c/0x850
[rds_rdma]
[] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
[] addr_handler+0x9e/0x140 [rdma_cm]
[] process_req+0x134/0x190 [ib_addr]
[] process_one_work+0x169/0x4a0
[] worker_thread+0x5b/0x560
[] ? flush_delayed_work+0x50/0x50
[] kthread+0xcb/0xf0
[] ? __schedule+0x24a/0x810
[] ? __schedule+0x24a/0x810
[] ? kthread_create_on_node+0x180/0x180
[] ret_from_fork+0x47/0x90
[] ? kthread_create_on_node+0x180/0x180

RIP  [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT.
This flag Indicates if the request has been sent out to ibacm yet.

If this flag is not set for a query and the query times out, we add it
back to the list with the original delay.

To handle the case where a response is received before we could set this
flag, the response handler waits for the flag to be
set before proceeding with the query.

Signed-off-by: Divya Indi 
---
 drivers/infiniband/core/sa_query.c | 45 ++
 1 file changed, 45 insertions(+)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30d4c12..ffbae2f 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -59,6 +59,9 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX	20
 #define IB_SA_CPI_MAX_RETRY_CNT	3
 #define IB_SA_CPI_RETRY_WAIT   1000 /*msecs */
+
+DECLARE_WAIT_QUEUE_HEAD(wait_queue);
+
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;
 
 struct ib_sa_sm_ah {
@@ -122,6 +125,7 @@ struct ib_sa_query {
 #define IB_SA_ENABLE_LOCAL_SERVICE 0x0001
 #define IB_SA_CANCEL   0x0002
 #define IB_SA_QUERY_OPA0x0004
+#define IB_SA_NL_QUERY_SENT	0x0008
 
 struct ib_sa_service_query {
void (*callback)(int, struct ib_sa_service_rec *, void *);
@@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
return (query->flags & IB_SA_CANCEL);
 }
 
+static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
+{
+   return (query->flags & IB_SA_NL_QUERY_SENT);
+}
+
 static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
 struct ib_sa_query *query)
 {
@@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
	spin_lock_irqsave(&ib_nl_request_lock, flags);
	list_del(&query->list);
	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+   } else {
+   query->flags |= IB_SA_NL_QUERY_SENT;
+
+   /*
+* If response is received before this flag was set
+* someone is waiting to process the response and release the
+* query.
+*/
+   wake_up(&wait_queue);
}
 
return ret;
@@ -994,6 +1012,21 @@ static void ib_nl_request_timeout(struct work_struct *work)
}
 
	list_del(&query->list);
+
+   /*
+* If IB_SA_NL_QUERY_SENT is not set, this query has not been
+* sent to ibacm yet. Reset the timer.
+*/
+   if (!ib_sa_nl_query_sent(query)) {
+   delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+   query->timeout = delay + jiffies;
+   list_add_tail(&query->list, &ib_nl_request_list);
+   /* Start the timeout if this is the only request */
+   if (ib_nl_request_list.next == &query->list)
+   queue_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+   delay);
+   break;
+   }
ib_sa_disable_local_svc(query);
/* Hold the lock to protect against query cancellation */
if (ib_sa_query_cancelled(query))
@@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,
 
send_buf = query->mad_buf;
 
+   /*
+* Make sure the IB_SA_NL_QUERY_SENT flag is set before
+* process

Resolving use-after-free in ib_nl_send_msg

2020-05-07 Thread Divya Indi
[PATCH] IB/sa: Resolving use-after-free in ib_nl_send_msg.

Hi,

This patch is in reply to -

https://lkml.org/lkml/2020/4/24/1076

We have a use-after-free possibility in the ibacm code path - 
when the timer(ib_nl_request_timeout) kicks in before ib_nl_snd_msg
has completed sending the query out to ibacm via netlink. The timeout 
handler ie ib_nl_request_timeout may result in releasing the query while 
ib_nl_snd_msg is still accessing query.

Since the issue appears to be specific to the ibacm code path, we are
trying to resolve it for the life cycle of sa_query in the ibacm code path.

Please review the proposed fix ie the patch that follows.

Would appreciate your thoughts and feedback on the same.

Let me know if you have any questions!

Thanks,
Divya


Re: Request for feedback : Possible use-after-free in routing SA query via netlink

2020-04-30 Thread Divya Indi
Hi Jason,

Thanks for taking the time to review. 

On 4/24/20 11:22 AM, Jason Gunthorpe wrote:
> On Fri, Apr 24, 2020 at 08:28:09AM -0700, Divya Indi wrote:
>
>> If we look at the query, it does not appear to be a valid ib_sa_query. 
>> Instead
>> looks like a pid struct for a process -> Use-after-free situation.
>>
>> We could simulate the crash by explicitly introducing a delay in 
>> ib_nl_snd_msg with
>> a sleep. The timer kicks in before ib_nl_send_msg has even sent out the 
>> request 
>> and releases the query. We could reproduce the crash with a similar stack 
>> trace.
>>
>> To summarize - We have a use-after-free possibility here when the 
>> timer(ib_nl_request_timeout)
>> kicks in before ib_nl_snd_msg has completed sending the query out to ibacm 
>> via netlink. The 
>> timeout handler ie ib_nl_request_timeout may result in releasing the query 
>> while ib_nl_snd_msg 
>> is still accessing query.
>>
>> Appreciate your thoughts on the above issue.
> Your assesment looks right to me.
>
> Fixing it will require being able to clearly explain what the lifetime
> cycle is for ib_sa_query - and what is there today looks like a mess,
> so I'm not sure what it should be changed into.

The issue reported here appears to be restricted to the ibacm code path. 
Hence, we tried to resolve it for the life cycle of sa_query in the ibacm code 
path.
I will follow up this email with a proposed fix(patch), it would be very 
helpful if
you can provide your feedback on the same.

Thanks,
Divya

>
> There is lots of other stuff there that looks really weird, like
> ib_sa_cancel_query() keeps going even though there are still timers
> running..
>
> Jason


Re: [PATCH 5/5] tracing: New functions for kernel access to Ftrace instances

2019-10-16 Thread Divya Indi

Hi Steven,

Please find my comments inline -

On 10/15/19 4:19 PM, Steven Rostedt wrote:

On Wed, 14 Aug 2019 10:55:27 -0700
Divya Indi  wrote:


Adding 2 new functions -
1) trace_array_lookup : Look up and return a trace array, given its
name.
2) trace_array_set_clr_event : Enable/disable event recording to the
given trace array.

NOTE: trace_array_lookup returns a trace array and also increments the
reference counter associated with the returned trace array. Make sure to
call trace_array_put() once the use is done so that the instance can be
removed at a later time.

Example use:

tr = trace_array_lookup("foo-bar");

Should probably be renamed to: trace_array_get_by_name("foo-bar")

As the name should show that it adds a ref count.


Makes sense! Will make this change.



But if we make the change I suggested before, where we do not need to
do the put before calling the destroy, we should have:



if (!tr)

{

tr = trace_array_create("foo-bar");

trace_array_get(tr);


This would need a corresponding trace_array_put(?).
Additionally, this would again traverse the list of instances, So I
went with incrementing the ref ctr inside the function.


}



// Log to tr
// Enable/disable events to tr
trace_array_set_clr_event(tr, _THIS_IP,"system","event",1);

What's with the __THIS_IP?


My bad! Maybe remnants from a trace_array_printk copy-paste.
Will correct this.





// Done using tr
trace_array_put(tr);
..

Also, use trace_array_set_clr_event to enable/disable events to a trace array.
So now we no longer need to have ftrace_set_clr_event as an exported
API.

Signed-off-by: Divya Indi 
---
  include/linux/trace.h|  2 ++
  include/linux/trace_events.h |  3 ++-
  kernel/trace/trace.c | 28 
  kernel/trace/trace_events.c  | 23 ++-
  4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 2c782d5..05164bb 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
  struct trace_array *trace_array_create(const char *name);
  int trace_array_destroy(struct trace_array *tr);
  void trace_array_put(struct trace_array *tr);
+struct trace_array *trace_array_lookup(const char *name);
+
  #endif	/* CONFIG_TRACING */
  
  #endif	/* _LINUX_TRACE_H */

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..05a7514 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,7 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
  #define is_signed_type(type)  (((type)(-1)) < (type)1)
  
  int trace_set_clr_event(const char *system, const char *event, int set);

-
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
  /*
   * The double __builtin_constant_p is because gcc will give us an error
   * if we try to allocate the static variable to fmt if it is not a
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b6a37a..e394d55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8514,6 +8514,34 @@ static int instance_rmdir(const char *name)
return ret;
  }
  
+/**

+ * trace_array_lookup - Lookup the trace array, given its name.
+ * @name: The name of the trace array to be looked up.
+ *
+ * Lookup and return the trace array associated with @name.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+
+   mutex_lock(&trace_types_lock);
+
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   tr->ref++;
+   mutex_unlock(&trace_types_lock);
+   return tr;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
  static __init void create_trace_instances(struct dentry *d_tracer)
  {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2621995..96dd997 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -834,7 +834,6 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
  
  	return ret;

  }
-EXPORT_SYMBOL_GPL(ftrace_set_clr_event);
  
  /**

   * trace_set_clr_event - enable or disable an event
@@ -859,6 +858,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
  }
  EXPORT_SYMBOL_GPL

Re: [PATCH 4/5] tracing: Handle the trace array ref counter in new functions

2019-10-16 Thread Divya Indi

Hi Steve,

Thanks again for taking the time to review and providing feedback. Please find 
my comments inline.

On 10/15/19 4:04 PM, Steven Rostedt wrote:

Sorry for taking so long to getting to these patches.

On Wed, 14 Aug 2019 10:55:26 -0700
Divya Indi  wrote:


For functions returning a trace array Eg: trace_array_create(), we need to
increment the reference counter associated with the trace array to ensure it
does not get freed when in use.

Once we are done using the trace array, we need to call
trace_array_put() to make sure we are not holding a reference to it
anymore and the instance/trace array can be removed when required.

I think it would be more in line with other parts of the kernel if we
don't need to do the trace_array_put() before calling
trace_array_destroy().


The reason we went with this approach is

instance_mkdir -  ref_ctr = 0  // Does not return a trace array ptr.
trace_array_create -  ref_ctr = 1  // Since this returns a trace array ptr.
trace_array_lookup -  ref_ctr = 1  // Since this returns a trace array ptr.

if we make trace_array_destroy to expect ref_ctr to be 1, we risk destroying 
the trace array while in use.

We could make it -

instance_mkdir -ref_ctr = 1
trace_array_create -ref_ctr = 2
trace_array_lookup -ref_ctr = 2+  // depending on no of lookups

but, we'd still need the trace_array_put() (?)

We can also have one function doing create (if does not exist) or lookup (if 
exists), but that would require
some redundant code, since instance_mkdir needs to return -EEXIST when a trace
array already exists.

Let me know your thoughts on this.

Thanks,
Divya


That is, if we make the following change:

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5ff206ce259e..ae12aac21c6c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8527,9 +8527,11 @@ static int __remove_instance(struct trace_array *tr)
  {
int i;
  
-	if (tr->ref || (tr->current_trace && tr->current_trace->ref))

+   if (tr->ref > 1 || (tr->current_trace && tr->current_trace->ref))
return -EBUSY;
  
+	WARN_ON_ONCE(tr->ref != 1);

+
	list_del(&tr->list);
  
  	/* Disable all the flags that were enabled coming in */



Hence, additionally exporting trace_array_put().

Example use:

tr = trace_array_create("foo-bar");
// Use this trace array
// Log to this trace array or enable/disable events to this trace array.


// tr no longer required
trace_array_put();

Signed-off-by: Divya Indi 
---
  include/linux/trace.h |  1 +
  kernel/trace/trace.c  | 41 -
  2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 24fcf07..2c782d5 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
const char *fmt, ...);
  struct trace_array *trace_array_create(const char *name);
  int trace_array_destroy(struct trace_array *tr);
+void trace_array_put(struct trace_array *tr);
  #endif	/* CONFIG_TRACING */
  
  #endif	/* _LINUX_TRACE_H */

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 22bf166..7b6a37a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr)
this_tr->ref--;
  }
  
+/**

+ * trace_array_put - Decrement reference counter for the given trace array.
+ * @tr: Trace array for which reference counter needs to decrement.
+ *
+ * NOTE: Functions like trace_array_create increment the reference counter for
+ * the trace array to ensure they do not get freed while in use. Make sure to
+ * call trace_array_put() when the use is done. This will ensure that the
+ * instance can be later removed.
+ */
  void trace_array_put(struct trace_array *this_tr)
  {
	mutex_lock(&trace_types_lock);
	__trace_array_put(this_tr);
	mutex_unlock(&trace_types_lock);
  }
+EXPORT_SYMBOL_GPL(trace_array_put);
  
  int call_filter_check_discard(struct trace_event_call *call, void *rec,

  struct ring_buffer *buffer,
@@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr)
	mutex_unlock(&trace_types_lock);
  }
  
+/**

+ * trace_array_create - Create a new trace array with the given name.
+ * @name: The name of the trace array to be created.
+ *
+ * Create and return a trace array with given name if it does not exist.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
  struct trace_array *trace_array_create(const char *name)
  {
struct trace_array *tr;
@@ -8364,6 +8384,8 @@ struct trace_array *trace_array_cre

Re: [PATCH] tracing: Sample module to demonstrate kernel access to Ftrace instances.

2019-10-15 Thread Divya Indi

[Apologies for multiple resends, formatting issue]


Hi Steven,


Thanks for taking a look.

This sample module is dependent on -

- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review: 
https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

I mentioned it in the cover page, but will add to this patch as well to avoid 
confusion.

So, I think the messages you are seeing is due to the absence of

Patches pending review: 
https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/


Let me know if you prefer to have the sample module and this patch-set (pending 
review) as part of one patch set.

For reference, adding the cover page details here -

--
This patch is for a sample module to demonstrate the use of APIs that
were introduced/exported in order to access Ftrace instances from within the 
kernel.

Please Note: This module is dependent on -
- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review: 
https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

The sample module creates/lookup a trace array called sample-instance on module 
load time.
We then start a kernel thread(simple-thread) to -
1) Enable tracing for event "sample_event" to buffer associated with the trace array - 
"sample-instance".
2) Start a timer that will disable tracing to this buffer after 5 sec. (Tracing 
disabled after 5 sec ie at count=4)
3) Write to the buffer using trace_array_printk()
4) Stop the kernel thread and destroy the buffer during module unload.

A sample output for the same -

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:4
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
 sample-instance-26797 [003]  955180.489833: simple_thread: 
trace_array_printk: count=0
 sample-instance-26797 [003]  955180.489836: sample_event: count value=0 at 
jiffies=5249940864
 sample-instance-26797 [003]  955181.513722: simple_thread: 
trace_array_printk: count=1
 sample-instance-26797 [003]  955181.513724: sample_event: count value=1 at 
jiffies=5249941888
 sample-instance-26797 [003]  955182.537629: simple_thread: 
trace_array_printk: count=2
 sample-instance-26797 [003]  955182.537631: sample_event: count value=2 at 
jiffies=5249942912
 sample-instance-26797 [003]  955183.561516: simple_thread: 
trace_array_printk: count=3
 sample-instance-26797 [003]  955183.561518: sample_event: count value=3 at 
jiffies=5249943936
 sample-instance-26797 [003]  955184.585423: simple_thread: 
trace_array_printk: count=4
 sample-instance-26797 [003]  955184.585427: sample_event: count value=4 at 
jiffies=5249944960
 sample-instance-26797 [003]  955185.609344: simple_thread: 
trace_array_printk: count=5
 sample-instance-26797 [003]  955186.633241: simple_thread: 
trace_array_printk: count=6
 sample-instance-26797 [003]  955187.657157: simple_thread: 
trace_array_printk: count=7
 sample-instance-26797 [003]  955188.681039: simple_thread: 
trace_array_printk: count=8
 sample-instance-26797 [003]  955189.704937: simple_thread: 
trace_array_printk: count=9
 sample-instance-26797 [003]  955190.728840: simple_thread: 
trace_array_printk: count=10




Thanks,
Divya

On 10/15/19 10:05 AM, Steven Rostedt wrote:

On Fri, 20 Sep 2019 16:59:26 -0700
Divya Indi  wrote:


This is a sample module to demonstrate the use of the newly introduced and
exported APIs to access Ftrace instances from within the kernel.

Newly introduced APIs used here -

1. Create a new trace array if it does not exist.
struct trace_array *trace_array_create(const char *name)

2. Destroy/Remove a trace array.
int trace_array_destroy(struct trace_array *tr)

3. Lookup a trace array, given its name.
struct trace_array *trace_array_lookup(const char *name)

4. Enable/Disable trace events:
int trace_array_set_clr_event(struct trace_array *tr, const char *system,
 const char *event, int set);

Exported APIs -
1. trace_printk equivalent for instances.
int trace_array_printk(struct trace_array *tr,
unsigned long ip, const char *fmt, ...);

2. Helper function.
void trace_printk_init_buffers(void);

3. To decrement the reference counter.
void trace_array_put(struct trace_array *tr)

Signed-off-by: Divya Indi 
Reviewed-by: Manjunath Patil 
Reviewed-by: Joe Jin 
---
  samples/Kconfig   

[RFC]Sample module for Kernel access to Ftrace instances.

2019-09-20 Thread Divya Indi
[PATCH] tracing: Sample module to demonstrate kernel access to Ftrace

Hi,

This patch is for a sample module to demonstrate the use of APIs that 
were introduced/exported in order to access Ftrace instances from within the 
kernel.

Please Note: This module is dependent on -
- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review: 
https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

The sample module creates/lookup a trace array called sample-instance on module 
load time. 
We then start a kernel thread(simple-thread) to -
1) Enable tracing for event "sample_event" to buffer associated with the trace 
array - "sample-instance".
2) Start a timer that will disable tracing to this buffer after 5 sec. (Tracing 
disabled after 5 sec ie at count=4)
3) Write to the buffer using trace_array_printk()
4) Stop the kernel thread and destroy the buffer during module unload.

A sample output for the same -

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:4
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
 sample-instance-26797 [003]  955180.489833: simple_thread: 
trace_array_printk: count=0
 sample-instance-26797 [003]  955180.489836: sample_event: count value=0 at 
jiffies=5249940864
 sample-instance-26797 [003]  955181.513722: simple_thread: 
trace_array_printk: count=1
 sample-instance-26797 [003]  955181.513724: sample_event: count value=1 at 
jiffies=5249941888
 sample-instance-26797 [003]  955182.537629: simple_thread: 
trace_array_printk: count=2
 sample-instance-26797 [003]  955182.537631: sample_event: count value=2 at 
jiffies=5249942912
 sample-instance-26797 [003]  955183.561516: simple_thread: 
trace_array_printk: count=3
 sample-instance-26797 [003]  955183.561518: sample_event: count value=3 at 
jiffies=5249943936
 sample-instance-26797 [003]  955184.585423: simple_thread: 
trace_array_printk: count=4
 sample-instance-26797 [003]  955184.585427: sample_event: count value=4 at 
jiffies=5249944960
 sample-instance-26797 [003]  955185.609344: simple_thread: 
trace_array_printk: count=5
 sample-instance-26797 [003]  955186.633241: simple_thread: 
trace_array_printk: count=6
 sample-instance-26797 [003]  955187.657157: simple_thread: 
trace_array_printk: count=7
 sample-instance-26797 [003]  955188.681039: simple_thread: 
trace_array_printk: count=8
 sample-instance-26797 [003]  955189.704937: simple_thread: 
trace_array_printk: count=9
 sample-instance-26797 [003]  955190.728840: simple_thread: 
trace_array_printk: count=10

Let me know if you have any questions.

Thanks,
Divya


[PATCH] tracing: Sample module to demonstrate kernel access to Ftrace instances.

2019-09-20 Thread Divya Indi
This is a sample module to demonstrate the use of the newly introduced and
exported APIs to access Ftrace instances from within the kernel.

Newly introduced APIs used here -

1. Create a new trace array if it does not exist.
struct trace_array *trace_array_create(const char *name)

2. Destroy/Remove a trace array.
int trace_array_destroy(struct trace_array *tr)

3. Lookup a trace array, given its name.
struct trace_array *trace_array_lookup(const char *name)

4. Enable/Disable trace events:
int trace_array_set_clr_event(struct trace_array *tr, const char *system,
const char *event, int set);

Exported APIs -
1. trace_printk equivalent for instances.
int trace_array_printk(struct trace_array *tr,
   unsigned long ip, const char *fmt, ...);

2. Helper function.
void trace_printk_init_buffers(void);

3. To decrement the reference counter.
void trace_array_put(struct trace_array *tr)

Signed-off-by: Divya Indi 
Reviewed-by: Manjunath Patil 
Reviewed-by: Joe Jin 
---
 samples/Kconfig  |   7 ++
 samples/Makefile |   1 +
 samples/ftrace_instance/Makefile |   6 ++
 samples/ftrace_instance/sample-trace-array.c | 134 +++
 samples/ftrace_instance/sample-trace-array.h |  84 +
 5 files changed, 232 insertions(+)
 create mode 100644 samples/ftrace_instance/Makefile
 create mode 100644 samples/ftrace_instance/sample-trace-array.c
 create mode 100644 samples/ftrace_instance/sample-trace-array.h

diff --git a/samples/Kconfig b/samples/Kconfig
index d63cc8a..1c7864b 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -20,6 +20,13 @@ config SAMPLE_TRACE_PRINTK
 This builds a module that calls trace_printk() and can be used to
 test various trace_printk() calls from a module.
 
+config SAMPLE_TRACE_ARRAY
+	tristate "Build sample module for kernel access to Ftrace instances"
+	depends on EVENT_TRACING && m
+	help
+	  This builds a module that demonstrates the use of various APIs to
+	  access Ftrace instances from within the kernel.
+
 config SAMPLE_KOBJECT
tristate "Build kobject examples"
help
diff --git a/samples/Makefile b/samples/Makefile
index debf892..02c444e 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_SAMPLE_RPMSG_CLIENT) += rpmsg/
 subdir-$(CONFIG_SAMPLE_SECCOMP)	+= seccomp
 obj-$(CONFIG_SAMPLE_TRACE_EVENTS)  += trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_PRINTK)  += trace_printk/
+obj-$(CONFIG_SAMPLE_TRACE_ARRAY)   += ftrace_instance/
 obj-$(CONFIG_VIDEO_PCI_SKELETON)   += v4l/
 obj-y  += vfio-mdev/
 subdir-$(CONFIG_SAMPLE_VFS)	+= vfs
diff --git a/samples/ftrace_instance/Makefile b/samples/ftrace_instance/Makefile
new file mode 100644
index 000..3603b13
--- /dev/null
+++ b/samples/ftrace_instance/Makefile
@@ -0,0 +1,6 @@
+# Builds a module that calls various routines to access Ftrace instances.
+# To use(as root):  insmod sample-trace-array.ko
+
+CFLAGS_sample-trace-array.o := -I$(src)
+
+obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o
diff --git a/samples/ftrace_instance/sample-trace-array.c b/samples/ftrace_instance/sample-trace-array.c
new file mode 100644
index 000..0595bc7
--- /dev/null
+++ b/samples/ftrace_instance/sample-trace-array.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/trace.h>
+#include <linux/trace_events.h>
+#include <linux/timer.h>
+#include <linux/err.h>
+#include <linux/jiffies.h>
+
+/*
+ * Any file that uses trace points, must include the header.
+ * But only one file, must include the header by defining
+ * CREATE_TRACE_POINTS first.  This will make the C code that
+ * creates the handles for the trace points.
+ */
+#define CREATE_TRACE_POINTS
+#include "sample-trace-array.h"
+
+struct trace_array *tr;
+static void mytimer_handler(struct timer_list *unused);
+static struct task_struct *simple_tsk;
+
+/*
+ * mytimer: Timer setup to disable tracing for event "sample_event". This
+ * timer is only for the purposes of the sample module to demonstrate access of
+ * Ftrace instances from within kernel.
+ */
+static DEFINE_TIMER(mytimer, mytimer_handler);
+
+static void mytimer_handler(struct timer_list *unused)
+{
+   /*
+* Disable tracing for event "sample_event".
+*/
+   trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", 0);
+}
+
+static void simple_thread_func(int count)
+{
+   set_current_state(TASK_INTERRUPTIBLE);
+   schedule_timeout(HZ);
+
+   /*
+* Printing count value using trace_array_printk() - trace_printk()
+* equivalent for the instance buffers.
+*/
+   trace_array_printk(tr, _THIS_IP_, "trace_array_printk: count=%d\n",
+   count);
+   /*
+* Tracepoint for event &

[PATCH 4/5] tracing: Handle the trace array ref counter in new functions

2019-08-14 Thread Divya Indi
For functions returning a trace array Eg: trace_array_create(), we need to
increment the reference counter associated with the trace array to ensure it
does not get freed when in use.

Once we are done using the trace array, we need to call
trace_array_put() to make sure we are not holding a reference to it
anymore and the instance/trace array can be removed when required.

Hence, additionally exporting trace_array_put().

Example use:

tr = trace_array_create("foo-bar");
// Use this trace array
// Log to this trace array or enable/disable events to this trace array.


// tr no longer required
trace_array_put();

Signed-off-by: Divya Indi 
---
 include/linux/trace.h |  1 +
 kernel/trace/trace.c  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 24fcf07..2c782d5 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
const char *fmt, ...);
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
+void trace_array_put(struct trace_array *tr);
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 22bf166..7b6a37a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr)
this_tr->ref--;
 }
 
+/**
+ * trace_array_put - Decrement reference counter for the given trace array.
+ * @tr: Trace array for which reference counter needs to decrement.
+ *
+ * NOTE: Functions like trace_array_create increment the reference counter for
+ * the trace array to ensure they do not get freed while in use. Make sure to
+ * call trace_array_put() when the use is done. This will ensure that the
+ * instance can be later removed.
+ */
 void trace_array_put(struct trace_array *this_tr)
 {
	mutex_lock(&trace_types_lock);
	__trace_array_put(this_tr);
	mutex_unlock(&trace_types_lock);
 }
+EXPORT_SYMBOL_GPL(trace_array_put);
 
 int call_filter_check_discard(struct trace_event_call *call, void *rec,
  struct ring_buffer *buffer,
@@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr)
	mutex_unlock(&trace_types_lock);
 }
 
+/**
+ * trace_array_create - Create a new trace array with the given name.
+ * @name: The name of the trace array to be created.
+ *
+ * Create and return a trace array with given name if it does not exist.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
 struct trace_array *trace_array_create(const char *name)
 {
struct trace_array *tr;
@@ -8364,6 +8384,8 @@ struct trace_array *trace_array_create(const char *name)
 
	list_add(&tr->list, &ftrace_trace_arrays);
 
+   tr->ref++;
+
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);
 
@@ -8385,7 +8407,19 @@ struct trace_array *trace_array_create(const char *name)
 
 static int instance_mkdir(const char *name)
 {
-   return PTR_ERR_OR_ZERO(trace_array_create(name));
+   struct trace_array *tr;
+
+   tr = trace_array_create(name);
+   if (IS_ERR(tr))
+   return PTR_ERR(tr);
+
+   /* This function does not return a reference to the created trace array,
+    * so the reference counter should be 0 when it returns.
+    * trace_array_create increments the ref counter; decrement it here
+    * by calling trace_array_put(). */
+   trace_array_put(tr);
+
+   return 0;
 }
 
 static int __remove_instance(struct trace_array *tr)
@@ -8424,6 +8458,11 @@ static int __remove_instance(struct trace_array *tr)
return 0;
 }
 
+/*
+ * NOTE: An instance cannot be removed if there is still a reference to it.
+ * Make sure to call "trace_array_put" for a trace array returned by functions
+ * like trace_array_create(), otherwise trace_array_destroy will not succeed.
+ */
 int trace_array_destroy(struct trace_array *this_tr)
 {
struct trace_array *tr;
-- 
1.8.3.1



[PATCH 1/5] tracing: Declare newly exported APIs in include/linux/trace.h

2019-08-14 Thread Divya Indi
Declare the newly introduced and exported APIs in the header file -
include/linux/trace.h. Moving previous declarations from
kernel/trace/trace.h to include/linux/trace.h.

Signed-off-by: Divya Indi 
---
 include/linux/trace.h | 7 +++
 kernel/trace/trace.h  | 4 +---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index b95ffb2..24fcf07 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -24,6 +24,13 @@ struct trace_export {
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);
 
+struct trace_array;
+
+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+   const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 005f086..66ff63e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/trace.h>
 #include 
 #include 
 #include 
@@ -852,8 +853,6 @@ extern int trace_selftest_startup_branch(struct tracer *trace,
 extern int
 trace_array_vprintk(struct trace_array *tr,
unsigned long ip, const char *fmt, va_list args);
-int trace_array_printk(struct trace_array *tr,
-  unsigned long ip, const char *fmt, ...);
 int trace_array_printk_buf(struct ring_buffer *buffer,
   unsigned long ip, const char *fmt, ...);
 void trace_printk_seq(struct trace_seq *s);
@@ -1869,7 +1868,6 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 extern const char *__stop___tracepoint_str[];
 
 void trace_printk_control(bool enabled);
-void trace_printk_init_buffers(void);
 void trace_printk_start_comm(void);
 int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
-- 
1.8.3.1



[PATCH 0/5 v4] Kernel Access to Ftrace instances.

2019-08-14 Thread Divya Indi
In addition to the patches introduced by commit f45d1225adb0 ("tracing: Kernel
access to Ftrace instances"), we also need the following patches to reliably
access ftrace instances from other kernel modules or components.

This version addresses the review comments/suggestions received for v3.

Please review the patches that follow.

Divya Indi (5):
  tracing: Declare newly exported APIs in include/linux/trace.h
  tracing: Verify if trace array exists before destroying it.
  tracing: Adding NULL checks
  tracing: Handle the trace array ref counter in new functions
  tracing: New functions for kernel access to Ftrace instances

 include/linux/trace.h| 10 +
 include/linux/trace_events.h |  3 +-
 kernel/trace/trace.c | 88 ++--
 kernel/trace/trace.h |  4 +-
 kernel/trace/trace_events.c  | 25 -
 5 files changed, 121 insertions(+), 9 deletions(-)

-- 
1.8.3.1



[PATCH 5/5] tracing: New functions for kernel access to Ftrace instances

2019-08-14 Thread Divya Indi
Adding 2 new functions -
1) trace_array_lookup: Look up and return a trace array, given its
name.
2) trace_array_set_clr_event: Enable/disable event recording to the
given trace array.

NOTE: trace_array_lookup returns a trace array and also increments the
reference counter associated with the returned trace array. Make sure to
call trace_array_put() once the use is done so that the instance can be
removed at a later time.

Example use:

tr = trace_array_lookup("foo-bar");
if (!tr)
tr = trace_array_create("foo-bar");
// Log to tr
// Enable/disable events to tr
trace_array_set_clr_event(tr, "system", "event", 1);

// Done using tr
trace_array_put(tr);
..

Also, use trace_array_set_clr_event to enable/disable events for a trace
array. With that in place, ftrace_set_clr_event no longer needs to be an
exported API.

Signed-off-by: Divya Indi 
---
 include/linux/trace.h|  2 ++
 include/linux/trace_events.h |  3 ++-
 kernel/trace/trace.c | 28 
 kernel/trace/trace_events.c  | 23 ++-
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 2c782d5..05164bb 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 void trace_array_put(struct trace_array *tr);
+struct trace_array *trace_array_lookup(const char *name);
+
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..05a7514 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,7 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 #define is_signed_type(type)   (((type)(-1)) < (type)1)
 
 int trace_set_clr_event(const char *system, const char *event, int set);
-
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
 /*
  * The double __builtin_constant_p is because gcc will give us an error
  * if we try to allocate the static variable to fmt if it is not a
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b6a37a..e394d55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8514,6 +8514,34 @@ static int instance_rmdir(const char *name)
return ret;
 }
 
+/**
+ * trace_array_lookup - Lookup the trace array, given its name.
+ * @name: The name of the trace array to be looked up.
+ *
+ * Lookup and return the trace array associated with @name.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+
+   mutex_lock(&trace_types_lock);
+
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   tr->ref++;
+   mutex_unlock(&trace_types_lock);
+   return tr;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2621995..96dd997 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -834,7 +834,6 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 
return ret;
 }
-EXPORT_SYMBOL_GPL(ftrace_set_clr_event);
 
 /**
  * trace_set_clr_event - enable or disable an event
@@ -859,6 +858,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);
 
+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENOENT;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE 127
 
-- 
1.8.3.1



[PATCH 2/5] tracing: Verify if trace array exists before destroying it.

2019-08-14 Thread Divya Indi
A trace array can be destroyed from userspace or kernel. Verify that the
trace array exists before proceeding to destroy/remove it.

Signed-off-by: Divya Indi 
---
 kernel/trace/trace.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..468ecc5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8421,17 +8421,27 @@ static int __remove_instance(struct trace_array *tr)
return 0;
 }
 
-int trace_array_destroy(struct trace_array *tr)
+int trace_array_destroy(struct trace_array *this_tr)
 {
+   struct trace_array *tr;
int ret;
 
-   if (!tr)
+   if (!this_tr) {
return -EINVAL;
+   }
 
mutex_lock(&event_mutex);
mutex_lock(&trace_types_lock);
 
-   ret = __remove_instance(tr);
+   ret = -ENODEV;
+
+   /* Making sure trace array exists before destroying it. */
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr == this_tr) {
+   ret = __remove_instance(tr);
+   break;
+   }
+   }
 
mutex_unlock(&trace_types_lock);
mutex_unlock(&event_mutex);
-- 
1.8.3.1



[PATCH 3/5] tracing: Adding NULL checks

2019-08-14 Thread Divya Indi
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace
instances") we exported certain functions. Here, we are adding some additional
NULL checks to ensure safe usage by users of these APIs.

Signed-off-by: Divya Indi 
---
 kernel/trace/trace.c| 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 468ecc5..22bf166 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
return 0;
 
+   if (!tr)
+   return -ENOENT;
+
va_start(ap, fmt);
ret = trace_array_vprintk(tr, ip, fmt, ap);
va_end(ap);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..2621995 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
char *event = NULL, *sub = NULL, *match;
int ret;
 
+   if (!tr)
+   return -ENOENT;
/*
 * The buf format can be :
 *  *: means any event by that name.
-- 
1.8.3.1



Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event

2019-08-07 Thread Divya Indi

Hi Steven,

On 8/2/19 2:25 PM, Steven Rostedt wrote:

On Fri, 2 Aug 2019 14:12:54 -0700
Divya Indi  wrote:


Hi Steve,

The first patch would be a temporary fix in case the patches that add the
new function - trace_array_set_clr_event() - and un-export
ftrace_set_clr_event() need more changes and might take some time. In that
case I think it would be good to have this in place (but not as part of
this series).

If they all are to go in together as part of the same release, i.e. if all
is good with the concerned patches (Patch 6 & Patch 7), then I think
having this patch would be meaningless.

Can you just do this part of patch 6 instead?


Yes, will merge the two, i.e. -

1)  Add 2 new functions - trace_array_set_clr_event(), trace_array_lookup()

2)  Unexport ftrace_set_clr_event.

into a single patch.


Thanks,

Divya



+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENOENT;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+

-- Steve


Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event

2019-08-02 Thread Divya Indi

Hi Steve,

The first patch would be a temporary fix in case the patches that add the
new function - trace_array_set_clr_event() - and un-export
ftrace_set_clr_event() need more changes and might take some time. In that
case I think it would be good to have this in place (but not as part of
this series).


If they all are to go in together as part of the same release, i.e. if all
is good with the concerned patches (Patch 6 & Patch 7), then I think
having this patch would be meaningless.



Thanks,

Divya

On 8/2/19 1:46 PM, Steven Rostedt wrote:

On Fri, 2 Aug 2019 13:41:20 -0700
Divya Indi  wrote:


As a stand alone patch, the first one may be fine. But as part of a
series, it doesn't make sense to add it.

I see. Will separate this out from the series.

Is that really needed? Do you need to have that patch in the kernel?

Do you plan on marking it for stable?

-- Steve


Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event

2019-07-30 Thread Divya Indi

Hi Steven,

On 7/29/19 5:51 PM, Steven Rostedt wrote:

On Mon, 29 Jul 2019 17:02:34 -0700
Divya Indi  wrote:


Use "trace_array_set_clr_event" to enable/disable events to a trace
array from other kernel modules/components.
Hence, we no longer need to have ftrace_set_clr_event as an exported API.

Hmm, this simply reverts patch 1. Why have patch 1 and this patch in
the first place?


Right! The first patch fixes issues in a previous commit, and then this
patch reverts the part of that commit which required the fix.


We can eliminate the first patch if it seems counter intuitive.


Thanks,

Divya



-- Steve


Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 


[PATCH 3/7] tracing: Verify if trace array exists before destroying it.

2019-07-29 Thread Divya Indi
A trace array can be destroyed from userspace or kernel. Verify if the
trace array exists before proceeding to destroy/remove it.

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 kernel/trace/trace.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..468ecc5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8421,17 +8421,27 @@ static int __remove_instance(struct trace_array *tr)
return 0;
 }
 
-int trace_array_destroy(struct trace_array *tr)
+int trace_array_destroy(struct trace_array *this_tr)
 {
+   struct trace_array *tr;
int ret;
 
-   if (!tr)
+   if (!this_tr) {
return -EINVAL;
+   }
 
mutex_lock(&event_mutex);
mutex_lock(&trace_types_lock);
 
-   ret = __remove_instance(tr);
+   ret = -ENODEV;
+
+   /* Making sure trace array exists before destroying it. */
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr == this_tr) {
+   ret = __remove_instance(tr);
+   break;
+   }
+   }
 
mutex_unlock(&trace_types_lock);
mutex_unlock(&event_mutex);
-- 
1.8.3.1



[PATCH 1/7] tracing: Required changes to use ftrace_set_clr_event.

2019-07-29 Thread Divya Indi
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace
instances") ftrace_set_clr_event was exported, but for other kernel
modules to use the function, we require the following additional changes

1. Removing the static keyword for this newly exported function.
2. Declaring it in the header file - include/linux/trace_events.h

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 include/linux/trace_events.h | 1 +
 kernel/trace/trace_events.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..0f874fb 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,6 +540,7 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 #define is_signed_type(type)   (((type)(-1)) < (type)1)
 
 int trace_set_clr_event(const char *system, const char *event, int set);
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 
 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..b6b4618 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
return ret;
 }
 
-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
int ret;
-- 
1.8.3.1



[PATCH 7/7] tracing: Un-export ftrace_set_clr_event

2019-07-29 Thread Divya Indi
Use "trace_array_set_clr_event" to enable/disable events to a trace
array from other kernel modules/components.
Hence, we no longer need to have ftrace_set_clr_event as an exported API.

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 include/linux/trace_events.h | 1 -
 kernel/trace/trace_events.c  | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 6da3600..025ae2b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -542,7 +542,6 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 int trace_set_clr_event(const char *system, const char *event, int set);
 int trace_array_set_clr_event(struct trace_array *tr, const char *system,
const char *event, int set);
-int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 
 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9ee6b52..96dd997 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
return ret;
 }
 
-int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
int ret;
@@ -834,7 +834,6 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 
return ret;
 }
-EXPORT_SYMBOL_GPL(ftrace_set_clr_event);
 
 /**
  * trace_set_clr_event - enable or disable an event
-- 
1.8.3.1



[PATCH 2/7] tracing: Declare newly exported APIs in include/linux/trace.h

2019-07-29 Thread Divya Indi
Declare the newly introduced and exported APIs in the header file -
include/linux/trace.h. Moving previous declarations from
kernel/trace/trace.h to include/linux/trace.h.

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 include/linux/trace.h | 7 +++
 kernel/trace/trace.h  | 4 +---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index b95ffb2..24fcf07 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -24,6 +24,13 @@ struct trace_export {
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);
 
+struct trace_array;
+
+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+   const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 005f086..66ff63e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/trace.h>
 #include 
 #include 
 #include 
@@ -852,8 +853,6 @@ extern int trace_selftest_startup_branch(struct tracer *trace,
 extern int
 trace_array_vprintk(struct trace_array *tr,
unsigned long ip, const char *fmt, va_list args);
-int trace_array_printk(struct trace_array *tr,
-  unsigned long ip, const char *fmt, ...);
 int trace_array_printk_buf(struct ring_buffer *buffer,
   unsigned long ip, const char *fmt, ...);
 void trace_printk_seq(struct trace_seq *s);
@@ -1869,7 +1868,6 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 extern const char *__stop___tracepoint_str[];
 
 void trace_printk_control(bool enabled);
-void trace_printk_init_buffers(void);
 void trace_printk_start_comm(void);
 int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
-- 
1.8.3.1



[PATCH 5/7] tracing: Handle the trace array ref counter in new functions

2019-07-29 Thread Divya Indi
For functions returning a trace array, e.g. trace_array_create(), we need to
increment the reference counter associated with the trace array to ensure it
does not get freed while in use.

Once we are done using the trace array, we need to call
trace_array_put() to make sure we are not holding a reference to it
anymore and the instance/trace array can be removed when required.

Hence, additionally exporting trace_array_put().

Example use:

tr = trace_array_create("foo-bar");
// Use this trace array
// Log to this trace array or enable/disable events to this trace array.


// tr no longer required
trace_array_put(tr);

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 include/linux/trace.h |  1 +
 kernel/trace/trace.c  | 43 ++-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 24fcf07..2c782d5 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
const char *fmt, ...);
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
+void trace_array_put(struct trace_array *tr);
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 22bf166..130579b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr)
this_tr->ref--;
 }
 
+/**
+ * trace_array_put - Decrement reference counter for the given trace array.
+ * @tr: Trace array for which reference counter needs to decrement.
+ *
+ * NOTE: Functions like trace_array_create increment the reference counter for
+ * the trace array to ensure they do not get freed while in use. Make sure to
+ * call trace_array_put() when the use is done. This will ensure that the
+ * instance can be later removed.
+ */
 void trace_array_put(struct trace_array *this_tr)
 {
mutex_lock(&trace_types_lock);
__trace_array_put(this_tr);
mutex_unlock(&trace_types_lock);
 }
+EXPORT_SYMBOL_GPL(trace_array_put);
 
 int call_filter_check_discard(struct trace_event_call *call, void *rec,
  struct ring_buffer *buffer,
@@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr)
mutex_unlock(&trace_types_lock);
 }
 
+/**
+ * trace_array_create - Create a new trace array with the given name.
+ * @name: The name of the trace array to be created.
+ *
+ * Create and return a trace array with given name if it does not exist.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
 struct trace_array *trace_array_create(const char *name)
 {
struct trace_array *tr;
@@ -8364,6 +8384,8 @@ struct trace_array *trace_array_create(const char *name)
 
list_add(&tr->list, &ftrace_trace_arrays);
 
+   tr->ref++;
+
mutex_unlock(&trace_types_lock);
mutex_unlock(&event_mutex);
 
@@ -8385,7 +8407,21 @@ struct trace_array *trace_array_create(const char *name)
 
 static int instance_mkdir(const char *name)
 {
-   return PTR_ERR_OR_ZERO(trace_array_create(name));
+   struct trace_array *tr;
+
+   tr = trace_array_create(name);
+   if (IS_ERR(tr))
+   return PTR_ERR(tr);
+
+   /*
+* This function does not return a reference to the created trace array,
+* so the reference counter is to be 0 when it returns.
+* trace_array_create increments the ref counter, decrement it here
+* by calling trace_array_put()
+*/
+   trace_array_put(tr);
+
+   return 0;
 }
 
 static int __remove_instance(struct trace_array *tr)
@@ -8424,6 +8460,11 @@ static int __remove_instance(struct trace_array *tr)
return 0;
 }
 
+/*
+ * NOTE: An instance cannot be removed if there is still a reference to it.
+ * Make sure to call "trace_array_put" for a trace array returned by functions
+ * like trace_array_create(), otherwise trace_array_destroy will not succeed.
+ */
 int trace_array_destroy(struct trace_array *this_tr)
 {
struct trace_array *tr;
-- 
1.8.3.1



[PATCH 4/7] tracing: Adding NULL checks

2019-07-29 Thread Divya Indi
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace
instances") we exported certain functions. Here, we are adding some additional
NULL checks to ensure safe usage by users of these APIs.

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 kernel/trace/trace.c| 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 468ecc5..22bf166 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
return 0;
 
+   if (!tr)
+   return -ENOENT;
+
va_start(ap, fmt);
ret = trace_array_vprintk(tr, ip, fmt, ap);
va_end(ap);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b6b4618..99eb5f8 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
char *event = NULL, *sub = NULL, *match;
int ret;
 
+   if (!tr)
+   return -ENOENT;
/*
 * The buf format can be :
 *  *: means any event by that name.
-- 
1.8.3.1



[PATCH 6/7] tracing: New functions for kernel access to Ftrace instances

2019-07-29 Thread Divya Indi
Adding 2 new functions -
1) trace_array_lookup: Look up and return a trace array, given its
name.
2) trace_array_set_clr_event: Enable/disable event recording to the
given trace array.

NOTE: trace_array_lookup returns a trace array and also increments the
reference counter associated with the returned trace array. Make sure to
call trace_array_put() once the use is done so that the instance can be
removed at a later time.

Example use:

tr = trace_array_lookup("foo-bar");
if (!tr)
tr = trace_array_create("foo-bar");
// Log to tr
// Enable/disable events to tr
trace_array_set_clr_event(tr, "system", "event", 1);

// Done using tr
trace_array_put(tr);
..

Signed-off-by: Divya Indi 
Reviewed-By: Aruna Ramakrishna 
---
 include/linux/trace.h|  2 ++
 include/linux/trace_events.h |  2 ++
 kernel/trace/trace.c | 28 
 kernel/trace/trace_events.c  | 22 ++
 4 files changed, 54 insertions(+)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 2c782d5..05164bb 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 void trace_array_put(struct trace_array *tr);
+struct trace_array *trace_array_lookup(const char *name);
+
 #endif /* CONFIG_TRACING */
 
 #endif /* _LINUX_TRACE_H */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 0f874fb..6da3600 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,6 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 #define is_signed_type(type)   (((type)(-1)) < (type)1)
 
 int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 
 /*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 130579b..fcf42bb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8516,6 +8516,34 @@ static int instance_rmdir(const char *name)
return ret;
 }
 
+/**
+ * trace_array_lookup - Lookup the trace array, given its name.
+ * @name: The name of the trace array to be looked up.
+ *
+ * Lookup and return the trace array associated with @name.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+
+   mutex_lock(&trace_types_lock);
+
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   tr->ref++;
+   mutex_unlock(&trace_types_lock);
+   return tr;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 99eb5f8..9ee6b52 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);
 
+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENOENT;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE 127
 
-- 
1.8.3.1



[PATCH 0/7][v3] Kernel access to Ftrace instances.

2019-07-29 Thread Divya Indi
In addition to the patches introduced by commit f45d1225adb0 ("tracing: Kernel
access to Ftrace instances"), we also need the following patches to reliably
access ftrace instances from other kernel modules or components.

Please review the patches that follow.

Divya Indi (7):
  tracing: Required changes to use ftrace_set_clr_event.
  tracing: Declare newly exported APIs in include/linux/trace.h
  tracing: Verify if trace array exists before destroying it.
  tracing: Adding NULL checks
  tracing: Handle the trace array ref counter in new functions
  tracing: New functions for kernel access to Ftrace instances
  tracing: Un-export ftrace_set_clr_event

 include/linux/trace.h| 10 +
 include/linux/trace_events.h |  2 +
 kernel/trace/trace.c | 90 ++--
 kernel/trace/trace.h |  4 +-
 kernel/trace/trace_events.c  | 25 +++-
 5 files changed, 123 insertions(+), 8 deletions(-)

-- 
1.8.3.1



Re: [PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

2019-06-19 Thread Divya Indi

Hi Steven,

Thanks for reviewing the patch. I have made a note of all the suggested
changes and comment requirements.


Will send them out soon as v2.

(comments inline)

On 6/17/19 4:45 PM, Steven Rostedt wrote:

On Wed, 12 Jun 2019 09:34:19 -0700
Divya Indi  wrote:


Adding 2 new functions -
1) trace_array_lookup: Look up and return a trace array, given its
name.
2) trace_array_set_clr_event: Enable/disable event recording to the
given trace array.

Newly added functions trace_array_lookup and trace_array_create also
need to increment the reference counter associated with the trace array
they return. This is to ensure the trace array does not get freed
while in use by the newly introduced APIs.
The reference counter is decremented in trace_array_destroy.

Signed-off-by: Divya Indi 
---
  include/linux/trace_events.h |  3 +++
  kernel/trace/trace.c | 30 +-
  kernel/trace/trace_events.c  | 22 ++
  3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
  struct trace_array *trace_array_create(const char *name);
  int trace_array_destroy(struct trace_array *tr);
  int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
  int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
  
  /*

   * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..fb70ccc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8364,6 +8364,8 @@ struct trace_array *trace_array_create(const char *name)
  
list_add(&tr->list, &ftrace_trace_arrays);
  
+	tr->ref++;

+

Needs a comment for why the ref is incremented.



mutex_unlock(&trace_types_lock);
mutex_unlock(&event_mutex);
  
@@ -8385,7 +8387,14 @@ struct trace_array *trace_array_create(const char *name)
  
  static int instance_mkdir(const char *name)

  {
-   return PTR_ERR_OR_ZERO(trace_array_create(name));
+   struct trace_array *tr;
+
+   tr = trace_array_create(name);
+   if (IS_ERR(tr))
+   return PTR_ERR(tr);
+   trace_array_put(tr);

Need a comment to why we need to do a put here.


+
+   return 0;
  }
  
  static int __remove_instance(struct trace_array *tr)

@@ -8434,6 +8443,7 @@ int trace_array_destroy(struct trace_array *tr)
mutex_lock(&event_mutex);
mutex_lock(&trace_types_lock);
  
+	tr->ref--;

ret = __remove_instance(tr);
  
  	mutex_unlock(&trace_types_lock);

@@ -8465,6 +8475,24 @@ static int instance_rmdir(const char *name)
return ret;
  }
  

Need a kerneldoc heading here, that also states that if a trace_array
is returned, it requires a call to trace_array_put().



+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+
+   mutex_lock(&trace_types_lock);
+
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   tr->ref++;
+   mutex_unlock(&trace_types_lock);
+   return tr;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
  static __init void create_trace_instances(struct dentry *d_tracer)
  {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
  instance_mkdir,
  instance_rmdir);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char 
*event, int set)
  }
  EXPORT_SYMBOL_GPL(trace_set_clr_event);
  
+/**

+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @tr: the trace array to enable/disable the event in
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -ENODEV if @tr is NULL, or -EINVAL if the
+ * parameters do not match any registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENODEV;

If we have this, should we get rid of the external use of
ftrace_set_clr_event()?


I think so too. External use of trace_array_set_clr_event is more 
convenient and intuitive. Will make the change.



Thanks,

Divya



-- Steve


+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);

[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

2019-06-12 Thread Divya Indi
Adding 2 new functions -
1) trace_array_lookup : Look up and return a trace array, given its
name.
2) trace_array_set_clr_event : Enable/disable event recording to the
given trace array.

The newly added functions trace_array_lookup() and trace_array_create() also
need to increment the reference counter associated with the trace array
they return. This ensures the trace array does not get freed
while in use via the newly introduced APIs.
The reference counter is decremented in trace_array_destroy().

Signed-off-by: Divya Indi 
---
 include/linux/trace_events.h |  3 +++
 kernel/trace/trace.c | 30 +-
 kernel/trace/trace_events.c  | 22 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned 
long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
 int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
 
 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..fb70ccc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8364,6 +8364,8 @@ struct trace_array *trace_array_create(const char *name)
 
	list_add(&tr->list, &ftrace_trace_arrays);
 
+   tr->ref++;
+
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);
 
@@ -8385,7 +8387,14 @@ struct trace_array *trace_array_create(const char *name)
 
 static int instance_mkdir(const char *name)
 {
-   return PTR_ERR_OR_ZERO(trace_array_create(name));
+   struct trace_array *tr;
+
+   tr = trace_array_create(name);
+   if (IS_ERR(tr))
+   return PTR_ERR(tr);
+   trace_array_put(tr);
+
+   return 0;
 }
 
 static int __remove_instance(struct trace_array *tr)
@@ -8434,6 +8443,7 @@ int trace_array_destroy(struct trace_array *tr)
	mutex_lock(&event_mutex);
	mutex_lock(&trace_types_lock);
 
+   tr->ref--;
ret = __remove_instance(tr);
 
	mutex_unlock(&trace_types_lock);
@@ -8465,6 +8475,24 @@ static int instance_rmdir(const char *name)
return ret;
 }
 
+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+
+   mutex_lock(&trace_types_lock);
+
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   tr->ref++;
+   mutex_unlock(&trace_types_lock);
+   return tr;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
  instance_mkdir,
  instance_rmdir);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char 
*event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);
 
+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @tr: the trace array to enable/disable the event in
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -ENODEV if @tr is NULL, or -EINVAL if the
+ * parameters do not match any registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENODEV;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE 127
 
-- 
1.8.3.1



[PATCH 2/3] tracing: Adding additional NULL checks.

2019-06-12 Thread Divya Indi
commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances")
exported certain functions providing access to Ftrace instances to
other kernel components.
Add some additional NULL checks to ensure safe usage by callers.

Signed-off-by: Divya Indi 
---
 kernel/trace/trace.c| 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..a60dc13 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
return 0;
 
+   if (!tr)
+   return -EINVAL;
+
va_start(ap, fmt);
ret = trace_array_vprintk(tr, ip, fmt, ap);
va_end(ap);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b6b4618..445b059 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, 
int set)
char *event = NULL, *sub = NULL, *match;
int ret;
 
+   if (!tr)
+   return -ENODEV;
/*
 * The buf format can be :
 *  *: means any event by that name.
-- 
1.8.3.1



[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.

2019-06-12 Thread Divya Indi
For commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances")
Adding the following changes to ensure other kernel components can
use these functions -
1) Remove the static keyword from the newly exported function ftrace_set_clr_event().
2) Add the required function declarations to the header file include/linux/trace_events.h.

Signed-off-by: Divya Indi 
---
 include/linux/trace_events.h | 6 ++
 kernel/trace/trace_events.c  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..d7b7d85 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -539,6 +539,12 @@ extern int trace_define_field(struct trace_event_call 
*call, const char *type,
 
 #define is_signed_type(type)   (((type)(-1)) < (type)1)
 
+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+   const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 int trace_set_clr_event(const char *system, const char *event, int set);
 
 /*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..b6b4618 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, 
const char *match,
return ret;
 }
 
-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
int ret;
-- 
1.8.3.1



[RFC][v2] Kernel Access to Ftrace Instances.

2019-06-12 Thread Divya Indi
Hi,
Please review the patches that follow -

[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
[PATCH 2/3] tracing: Adding additional NULL checks.
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

This is v2 for the series with changes made to Patch 3/3 to address review 
comments. 

The new changes ensure that a trace array created by trace_array_create or
accessed via trace_array_lookup cannot be freed if still in use.

Let me know if you have any questions/concerns/suggestions. 

Thanks,
Divya 



Re: [PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

2019-06-12 Thread Divya Indi

Hi Steven,

Thanks for taking the time to review. Please find my comments inline.

On 6/8/19 2:51 PM, Steven Rostedt wrote:

On Tue,  4 Jun 2019 17:42:05 -0700
Divya Indi  wrote:


Adding 2 new functions -
1) trace_array_lookup : Look up and return a trace array, given its
name.
2) trace_array_set_clr_event : Enable/disable event recording to the
given trace array.

Signed-off-by: Divya Indi 
---
  include/linux/trace_events.h |  3 +++
  kernel/trace/trace.c | 11 +++
  kernel/trace/trace_events.c  | 22 ++
  3 files changed, 36 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned 
long ip,
  struct trace_array *trace_array_create(const char *name);
  int trace_array_destroy(struct trace_array *tr);
  int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
  int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
  
  /*

   * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..1d171fd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8465,6 +8465,17 @@ static int instance_rmdir(const char *name)
return ret;
  }
  
+struct trace_array *trace_array_lookup(const char *name)

+{
+   struct trace_array *tr;
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {

Accessing the ftrace_trace_arrays requires taking the trace_types_lock.
It should also probably increment the ref counter too, and then
trace_array_put() needs to be called.

This prevents the trace array from being freed while something has
access to it.

-- Steve


Agree - Noted!

Also, adding a similar change for trace_array_create which also returns 
a ptr to a newly created trace_array so will face the same issue.


Since trace_array_lookup and trace_array_create will be accompanied by a 
trace_array_destroy once the use of the trace_array is done, 
decrementing the reference ctr here.


Sending a v2 to address this.


Thanks,

Divya




+   if (tr->name && strcmp(tr->name, name) == 0)
+   return tr;
+   }
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
  static __init void create_trace_instances(struct dentry *d_tracer)
  {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
  instance_mkdir,
  instance_rmdir);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char 
*event, int set)
  }
  EXPORT_SYMBOL_GPL(trace_set_clr_event);
  
+/**

+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENODEV;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
  /* 128 should be much more than enough */
  #define EVENT_BUF_SIZE 127
  


[RFC] Kernel Access to Ftrace instances.

2019-06-04 Thread Divya Indi
Hi, 
Please Review the patches that follow. These include -

[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
[PATCH 2/3] tracing: Adding additional NULL checks.
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

Let me know if you have any concerns or questions. 

A sample module demonstrating the use of the above functions will follow soon. 

Thanks,
Divya


[PATCH 2/3] tracing: Adding additional NULL checks.

2019-06-04 Thread Divya Indi
Now that certain functions providing access to Ftrace instances have
been exported to other kernel components, add some additional
NULL checks to ensure safe usage by callers.

Signed-off-by: Divya Indi 
---
 kernel/trace/trace.c| 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..a60dc13 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
return 0;
 
+   if (!tr)
+   return -EINVAL;
+
va_start(ap, fmt);
ret = trace_array_vprintk(tr, ip, fmt, ap);
va_end(ap);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b6b4618..445b059 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, 
int set)
char *event = NULL, *sub = NULL, *match;
int ret;
 
+   if (!tr)
+   return -ENODEV;
/*
 * The buf format can be :
 *  *: means any event by that name.
-- 
1.8.3.1



[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.

2019-06-04 Thread Divya Indi
For commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances"),
we need the following additional changes to ensure other kernel components can
use these functions -
1) Remove the static keyword from the newly exported function ftrace_set_clr_event().
2) Add the required function declarations to the header file include/linux/trace_events.h.

Signed-off-by: Divya Indi 
---
 include/linux/trace_events.h | 6 ++
 kernel/trace/trace_events.c  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..d7b7d85 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -539,6 +539,12 @@ extern int trace_define_field(struct trace_event_call 
*call, const char *type,
 
 #define is_signed_type(type)   (((type)(-1)) < (type)1)
 
+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+   const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 int trace_set_clr_event(const char *system, const char *event, int set);
 
 /*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..b6b4618 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, 
const char *match,
return ret;
 }
 
-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
int ret;
-- 
1.8.3.1



[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

2019-06-04 Thread Divya Indi
Adding 2 new functions -
1) trace_array_lookup : Look up and return a trace array, given its
name.
2) trace_array_set_clr_event : Enable/disable event recording to the
given trace array.

Signed-off-by: Divya Indi 
---
 include/linux/trace_events.h |  3 +++
 kernel/trace/trace.c | 11 +++
 kernel/trace/trace_events.c  | 22 ++
 3 files changed, 36 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned 
long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
 int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set);
 
 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..1d171fd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8465,6 +8465,17 @@ static int instance_rmdir(const char *name)
return ret;
 }
 
+struct trace_array *trace_array_lookup(const char *name)
+{
+   struct trace_array *tr;
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0)
+   return tr;
+   }
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
  instance_mkdir,
  instance_rmdir);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char 
*event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);
 
+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+   const char *event, int set)
+{
+   if (!tr)
+   return -ENODEV;
+
+   return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE 127
 
-- 
1.8.3.1



Re: [PATCH] tracing: Kernel access to Ftrace instances

2019-05-20 Thread divya.indi




On 05/20/2019 08:23 AM, Steven Rostedt wrote:

On Wed, 15 May 2019 20:24:43 -0700
Steven Rostedt  wrote:

It seems your's already in Steve's ftrace/core branch, so I think you
can make
additional patch to fix it. Steve, is that OK?
  

Yes. In fact I already sent a pull request to Linus.  Please send a patch on 
top of my ftrace/core branch.


And now it is in mainline (v5.2-rc1). Please send a patch with a sample
module (as Masami requested). Also, that function not only needs to be
changed to not being static, you need to add it to a header
(include/linux/trace_events.h?)

Thanks!

Working on it. Will send it out soon.

Thanks,
Divya


-- Steve




Re: [PATCH] tracing: Kernel access to Ftrace instances

2019-05-15 Thread Divya Indi

Hi Masami,

Thanks for pointing it out.

Yes, it should not be static.

On 5/15/19 5:09 PM, Masami Hiramatsu wrote:

HI Divya,

On Wed, 20 Mar 2019 11:28:51 -0700
Divya Indi  wrote:


Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components.

This patch aims to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.

Signed-off-by: Divya Indi 
Reviewed-by: Joe Jin 

Would you tested these APIs with your module? Since,
[...]

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, 
char *buf, int set)
  
  	return ret;

  }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

I found this exports a static function to module. Did it work?


I had tested the changes with my module. This change to static was added 
in the test patch, but somehow missed it in the final patch that was 
sent out.


Will send a new patch along with a few additional ones to add some NULL 
checks to ensure safe usage by modules and add the APIs to a header file 
that can be used by the modules.


Thanks,

Divya



Thank you,



[PATCH] tracing: Kernel access to Ftrace instances

2019-03-20 Thread Divya Indi
Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components.

This patch aims to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.

Signed-off-by: Divya Indi 
Reviewed-by: Joe Jin 
---
 kernel/trace/trace.c| 74 ++---
 kernel/trace/trace_events.c |  1 +
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4238b4..eaf163a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2878,6 +2878,7 @@ void trace_printk_init_buffers(void)
if (global_trace.trace_buffer.buffer)
tracing_start_cmdline_record();
 }
+EXPORT_SYMBOL_GPL(trace_printk_init_buffers);
 
 void trace_printk_start_comm(void)
 {
@@ -3038,6 +3039,7 @@ int trace_array_printk(struct trace_array *tr,
va_end(ap);
return ret;
 }
+EXPORT_SYMBOL_GPL(trace_array_printk);
 
 __printf(3, 4)
 int trace_array_printk_buf(struct ring_buffer *buffer,
@@ -7832,7 +7834,7 @@ static void update_tracer_options(struct trace_array *tr)
	mutex_unlock(&trace_types_lock);
 }
 
-static int instance_mkdir(const char *name)
+struct trace_array *trace_array_create(const char *name)
 {
struct trace_array *tr;
int ret;
@@ -7896,7 +7898,7 @@ static int instance_mkdir(const char *name)
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);
 
-   return 0;
+   return tr;
 
  out_free_tr:
free_trace_buffers(tr);
@@ -7908,33 +7910,21 @@ static int instance_mkdir(const char *name)
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);
 
-   return ret;
+   return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(trace_array_create);
 
+static int instance_mkdir(const char *name)
+{
+   return PTR_ERR_OR_ZERO(trace_array_create(name));
 }
 
-static int instance_rmdir(const char *name)
+static int __remove_instance(struct trace_array *tr)
 {
-   struct trace_array *tr;
-   int found = 0;
-   int ret;
int i;
 
-   mutex_lock(&event_mutex);
-   mutex_lock(&trace_types_lock);
-
-   ret = -ENODEV;
-   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
-   if (tr->name && strcmp(tr->name, name) == 0) {
-   found = 1;
-   break;
-   }
-   }
-   if (!found)
-   goto out_unlock;
-
-   ret = -EBUSY;
if (tr->ref || (tr->current_trace && tr->current_trace->ref))
-   goto out_unlock;
+   return -EBUSY;
 
	list_del(&tr->list);
 
@@ -7960,10 +7950,46 @@ static int instance_rmdir(const char *name)
free_cpumask_var(tr->tracing_cpumask);
kfree(tr->name);
kfree(tr);
+   tr = NULL;
 
-   ret = 0;
+   return 0;
+}
+
+int trace_array_destroy(struct trace_array *tr)
+{
+   int ret;
+
+   if (!tr)
+   return -EINVAL;
+
+   mutex_lock(&event_mutex);
+   mutex_lock(&trace_types_lock);
+
+   ret = __remove_instance(tr);
+
+   mutex_unlock(&trace_types_lock);
+   mutex_unlock(&event_mutex);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(trace_array_destroy);
+
+static int instance_rmdir(const char *name)
+{
+   struct trace_array *tr;
+   int ret;
+
+   mutex_lock(&event_mutex);
+   mutex_lock(&trace_types_lock);
+
+   ret = -ENODEV;
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   ret = __remove_instance(tr);
+   break;
+   }
+   }
 
- out_unlock:
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, 
char *buf, int set)
 
return ret;
 }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);
 
 /**
  * trace_set_clr_event - enable or disable an event
-- 
1.8.3.1



[RFC] Kernel access to Ftrace instances.

2019-03-20 Thread Divya Indi
[PATCH] tracing: Kernel access to Ftrace instances.

Please review the patch that follows. Below are some details providing
the goal and justification for the patch.

===

Goal:

Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components. We want to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.


Justification:

 1. We need module-specific/use-case specific ring buffers (apart
from the global trace buffer) to avoid overwrite by other components.
Hence, the need to use Ftrace "instances".

 2. Flexibility to add additional logging to these module-specific
buffers via ksplice/live patch - Having a trace_printk counterpart for
these additional ring buffers.

 3. Most often, certain issues and events are best monitored from
within the kernel.

 4. Time sensitivity - We need the capability to dynamically enable
and disable tracing from within kernel to extract relevant debugging
info for the right time-window.


Example:

When the kernel detects an unexpected event such as a connection drop (Eg:
RDS/NFS connection drops), we need the capability to enable specific
event tracing to capture relevant info during reconnect. This event
tracing will help us diagnose issues that occur during reconnect, like
root-causing longer reconnect times. In such cases we also want to disable the
tracing at the right moment and capture a snapshot from within the kernel
to make sure we have the relevant diagnostics data and nothing is
overwritten or lost.


Note: The additional logging is not part of the kernel. We intend to
only provide the flexibility to add the logging as part of diagnostics
via ksplice/live-patch on need-basis.


Please find below the compilation of APIs to be introduced or exported as is.


We propose adding two new functions:

1. struct trace_array *trace_array_create(const char *name);
2. int trace_array_destroy(struct trace_array *tr);


In addition, we need to export functions:

3. int trace_array_printk(struct trace_array *tr, unsigned long ip,
 const char *fmt, ...);
4. int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
5. void trace_printk_init_buffers(void);


To workaround the redundancy due to the newly introduced APIs, we propose the
following restructuring -

1. Move the contents of instance_mkdir to the new API.
static int instance_mkdir(const char *name)
{
 return PTR_ERR_OR_ZERO(trace_array_create(name));
}

2. Introduce internal static function: __remove_instance(struct trace_array *tr)
This will be largely similar to the old instance_rmdir, which
identified the trace_array to be removed based on its name.

Modify existing API to use the internal function:
static int instance_rmdir(const char *name)
{
 struct trace_array *tr;
 int err = -ENODEV;

 mutex_lock(&event_mutex);
 mutex_lock(&trace_types_lock);

 list_for_each_entry(tr, &ftrace_trace_arrays, list) {
 if (tr->name && strcmp(tr->name, name) == 0) {
 err = __remove_instance(tr);
 break;
 }
 }

 mutex_unlock(&trace_types_lock);
 mutex_unlock(&event_mutex);

 return err;
}

New API to be exported:
int trace_array_destroy(struct trace_array *tr)
{
 int err;

 mutex_lock(&event_mutex);
 mutex_lock(&trace_types_lock);
 err = __remove_instance(tr);
 mutex_unlock(&trace_types_lock);
 mutex_unlock(&event_mutex);

 return err;
}



Thanks,
Divya 


Re: [PATCH] tracing: Kernel access to ftrace instances

2019-03-20 Thread divya.indi

Sure, thanks for taking the time to review.

There are a few issues with the patch styling and some minor modifications. I 
will shortly send the v2 for the same.


Regards,
Divya


On 03/19/2019 01:16 PM, Steven Rostedt wrote:

On Fri, 15 Mar 2019 16:35:25 -0700
Divya Indi  wrote:


Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components.

This patch aims to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.



Thanks for sending this. I'm currently working on some other changes,
but will look in this when I'm finished with the other work. Should be
within this week.

-- Steve




[PATCH] tracing: Kernel access to ftrace instances

2019-03-15 Thread Divya Indi
Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components.

This patch aims to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.

Signed-off-by: Divya Indi 
---
 kernel/trace/trace.c| 72 +
 kernel/trace/trace_events.c |  1 +
 2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4238b4..a5e7e51 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2878,6 +2878,7 @@ void trace_printk_init_buffers(void)
if (global_trace.trace_buffer.buffer)
tracing_start_cmdline_record();
 }
+EXPORT_SYMBOL_GPL(trace_printk_init_buffers);
 
 void trace_printk_start_comm(void)
 {
@@ -3038,6 +3039,7 @@ int trace_array_printk(struct trace_array *tr,
va_end(ap);
return ret;
 }
+EXPORT_SYMBOL_GPL(trace_array_printk);
 
 __printf(3, 4)
 int trace_array_printk_buf(struct ring_buffer *buffer,
@@ -7832,7 +7834,7 @@ static void update_tracer_options(struct trace_array *tr)
 mutex_unlock(&trace_types_lock);
 }
 
-static int instance_mkdir(const char *name)
+struct trace_array *trace_array_create(const char *name)
 {
struct trace_array *tr;
int ret;
@@ -7896,7 +7898,7 @@ static int instance_mkdir(const char *name)
mutex_unlock(_types_lock);
mutex_unlock(_mutex);
 
-   return 0;
+   return tr;
 
  out_free_tr:
free_trace_buffers(tr);
@@ -7908,33 +7910,21 @@ static int instance_mkdir(const char *name)
mutex_unlock(_types_lock);
mutex_unlock(_mutex);
 
-   return ret;
+   return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(trace_array_create);
 
+static int instance_mkdir(const char *name)
+{
+   return PTR_ERR_OR_ZERO(trace_array_create(name));
 }
 
-static int instance_rmdir(const char *name)
+static int __remove_instance(struct trace_array *tr)
 {
-   struct trace_array *tr;
-   int found = 0;
-   int ret;
int i;
-
-   mutex_lock(&event_mutex);
-   mutex_lock(&trace_types_lock);
-
-   ret = -ENODEV;
-   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
-   if (tr->name && strcmp(tr->name, name) == 0) {
-   found = 1;
-   break;
-   }
-   }
-   if (!found)
-   goto out_unlock;
-
-   ret = -EBUSY;
+   
if (tr->ref || (tr->current_trace && tr->current_trace->ref))
-   goto out_unlock;
+   return -EBUSY;
 
 list_del(&tr->list);
 
@@ -7961,9 +7951,41 @@ static int instance_rmdir(const char *name)
kfree(tr->name);
kfree(tr);
 
-   ret = 0;
+   return 0;
+}
 
- out_unlock:
+int trace_array_destroy(struct trace_array *tr)
+{
+   int ret;
+
+   mutex_lock(&event_mutex);
+   mutex_lock(&trace_types_lock);
+
+   ret = __remove_instance(tr);
+
+   mutex_unlock(&trace_types_lock);
+   mutex_unlock(&event_mutex);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(trace_array_destroy);
+   
+static int instance_rmdir(const char *name)
+{
+   struct trace_array *tr;
+   int ret;
+
+   mutex_lock(&event_mutex);
+   mutex_lock(&trace_types_lock);
+
+   ret = -ENODEV;
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr->name && strcmp(tr->name, name) == 0) {
+   ret = __remove_instance(tr);
+   break;
+   }
+   }
+   
 mutex_unlock(&trace_types_lock);
 mutex_unlock(&event_mutex);
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, 
char *buf, int set)
 
return ret;
 }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);
 
 /**
  * trace_set_clr_event - enable or disable an event
-- 
1.8.3.1



[RFC] Kernel access to Ftrace instances.

2019-03-15 Thread Divya Indi
[PATCH] tracing: Kernel access to ftrace instances.

Please review the patch that follows. Below are some details providing
the goal and justification for the patch.

===

Goal:

Ftrace provides an “instances” feature that allows creating multiple
Ftrace ring buffers. However, these buffers are currently created and
accessed via userspace only; the kernel APIs providing these features
are not exported and hence cannot be used by other kernel components. We
want to extend this infrastructure so that, from within the kernel,
these buffers can be created and removed, logged to, and have existing
trace events enabled or disabled on them.


Justification:

 1. We need module-specific/use-case-specific ring buffers (apart
from the global trace buffer) so their contents are not overwritten by
other components; hence the need for Ftrace "instances".

 2. Flexibility to add extra logging to these module-specific
buffers via ksplice/live patch, i.e. a trace_printk counterpart for
these additional ring buffers.

 3. Often, certain issues and events are best monitored from
within the kernel.

 4. Time sensitivity - We need the capability to dynamically enable
and disable tracing from within the kernel to capture the relevant
debugging info during the right time window.


Example:

When the kernel detects an unexpected event such as a connection drop
(e.g. RDS/NFS connection drops), we need the capability to enable
specific event tracing to capture relevant info during reconnect. This
event tracing will help us diagnose issues that occur during reconnect,
for example root-causing longer reconnect times. In such cases we also
want to disable the tracing at the right moment and capture a snapshot
from within the kernel, to make sure we have the relevant diagnostics
data and nothing is overwritten or lost.


Note: The additional logging is not part of the kernel. We intend only
to provide the flexibility to add such logging for diagnostics via
ksplice/live patch on an as-needed basis.


Please find below a compilation of the APIs to be newly introduced or exported as-is.


We propose adding two new functions:

1. struct trace_array *trace_array_create(const char *name);
2. int trace_array_destroy(struct trace_array *tr);


In addition, we need to export functions:

3. int trace_array_printk(struct trace_array *tr, unsigned long ip,
 const char *fmt, ...);
4. int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
5. void trace_printk_init_buffers(void);


To avoid redundancy between the existing code and the newly introduced
APIs, we propose the following restructuring -

1. Move the contents of instance_mkdir to the new API.
static int instance_mkdir(const char *name)
{
 return PTR_ERR_OR_ZERO(trace_array_create(name));
}

2. Introduce an internal static function: __remove_instance(struct trace_array *tr)
This carries over most of the old instance_rmdir, except that the
trace_array to remove is passed in directly instead of being looked up by name.

Modify existing API to use the internal function:
static int instance_rmdir(const char *name)
{
 struct trace_array *tr;
 int err = -ENODEV;

 mutex_lock(&event_mutex);
 mutex_lock(&trace_types_lock);

 list_for_each_entry(tr, &ftrace_trace_arrays, list) {
 if (tr->name && strcmp(tr->name, name) == 0) {
 err = __remove_instance(tr);
 break;
 }
 }

 mutex_unlock(&trace_types_lock);
 mutex_unlock(&event_mutex);

 return err;
}

New API to be exported:
int trace_array_destroy(struct trace_array *tr)
{
 int err;

 mutex_lock(&event_mutex);
 mutex_lock(&trace_types_lock);
 err = __remove_instance(tr);
 mutex_unlock(&trace_types_lock);
 mutex_unlock(&event_mutex);

 return err;
}



Thanks,
Divya