Re: [PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg
Thanks Jason. Appreciate your help and feedback for fixing this issue. Would it be possible to access the edited version of the patch? If yes, please share a pointer to the same. Thanks, Divya On 7/2/20 12:07 PM, Jason Gunthorpe wrote: > On Tue, Jun 23, 2020 at 07:13:09PM -0700, Divya Indi wrote: >> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending")' >> - >> 1. Adds the query to the request list before ib_nl_snd_msg. >> 2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is. >> >> However, if there is a delay in sending out the request (For >> eg: Delay due to low memory situation) the timer to handle request timeout >> might kick in before the request is sent out to ibacm via netlink. >> ib_nl_request_timeout may release the query causing a use after free >> situation >> while accessing the query in ib_nl_send_msg. >> >> Call Trace for the above race: >> >> [] ? ib_pack+0x17b/0x240 [ib_core] >> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >> [] addr_handler+0x9e/0x140 [rdma_cm] >> [] process_req+0x134/0x190 [ib_addr] >> [] process_one_work+0x169/0x4a0 >> [] worker_thread+0x5b/0x560 >> [] ? flush_delayed_work+0x50/0x50 >> [] kthread+0xcb/0xf0 >> [] ? __schedule+0x24a/0x810 >> [] ? __schedule+0x24a/0x810 >> [] ? kthread_create_on_node+0x180/0x180 >> [] ret_from_fork+0x47/0x90 >> [] ? kthread_create_on_node+0x180/0x180 >> >> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >> >> To resolve the above issue - >> 1. Add the req to the request list only after the request has been sent out. >> 2. 
To handle the race where response comes in before adding request to >> the request list, send(rdma_nl_multicast) and add to list while holding the >> spinlock - request_lock. >> 3. Use non blocking memory allocation flags for rdma_nl_multicast since it is >> called while holding a spinlock. >> >> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending") >> >> Signed-off-by: Divya Indi >> --- >> v1: >> - Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free. >> >> v2: >> - Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT. >> - Rewording and adding comments. >> >> v3: >> - Change approach and remove usage of IB_SA_NL_QUERY_SENT. >> - Add req to request list only after the request has been sent out. >> - Send and add to list while holding the spinlock (request_lock). >> - Overide gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we >> need non blocking memory allocation while holding spinlock. >> >> v4: >> - Formatting changes. >> - Use GFP_NOWAIT conditionally - Only when GFP_ATOMIC is not provided by >> caller. >> --- >> drivers/infiniband/core/sa_query.c | 41 >> ++ >> 1 file changed, 24 insertions(+), 17 deletions(-) > I made a few edits, and applied to for-rc > > Thanks, > Jason
Re: [PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg
Hi Leon, Please find my comments inline - On 6/25/20 3:09 AM, Leon Romanovsky wrote: > On Tue, Jun 23, 2020 at 07:13:09PM -0700, Divya Indi wrote: >> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending")' >> - >> 1. Adds the query to the request list before ib_nl_snd_msg. >> 2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is. >> >> However, if there is a delay in sending out the request (For >> eg: Delay due to low memory situation) the timer to handle request timeout >> might kick in before the request is sent out to ibacm via netlink. >> ib_nl_request_timeout may release the query causing a use after free >> situation >> while accessing the query in ib_nl_send_msg. >> >> Call Trace for the above race: >> >> [] ? ib_pack+0x17b/0x240 [ib_core] >> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >> [] addr_handler+0x9e/0x140 [rdma_cm] >> [] process_req+0x134/0x190 [ib_addr] >> [] process_one_work+0x169/0x4a0 >> [] worker_thread+0x5b/0x560 >> [] ? flush_delayed_work+0x50/0x50 >> [] kthread+0xcb/0xf0 >> [] ? __schedule+0x24a/0x810 >> [] ? __schedule+0x24a/0x810 >> [] ? kthread_create_on_node+0x180/0x180 >> [] ret_from_fork+0x47/0x90 >> [] ? kthread_create_on_node+0x180/0x180 >> >> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >> >> To resolve the above issue - >> 1. Add the req to the request list only after the request has been sent out. >> 2. To handle the race where response comes in before adding request to >> the request list, send(rdma_nl_multicast) and add to list while holding the >> spinlock - request_lock. >> 3. Use non blocking memory allocation flags for rdma_nl_multicast since it is >> called while holding a spinlock. 
>> >> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending") >> >> Signed-off-by: Divya Indi >> --- >> v1: >> - Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free. >> >> v2: >> - Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT. >> - Rewording and adding comments. >> >> v3: >> - Change approach and remove usage of IB_SA_NL_QUERY_SENT. >> - Add req to request list only after the request has been sent out. >> - Send and add to list while holding the spinlock (request_lock). >> - Override gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we >> need non blocking memory allocation while holding spinlock. >> >> v4: >> - Formatting changes. >> - Use GFP_NOWAIT conditionally - Only when GFP_ATOMIC is not provided by >> caller. >> --- >> drivers/infiniband/core/sa_query.c | 41 >> ++++++++++++++++++++++++-----------------​ >> 1 file changed, 24 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/infiniband/core/sa_query.c >> b/drivers/infiniband/core/sa_query.c >> index 74e0058..9066d48 100644 >> --- a/drivers/infiniband/core/sa_query.c >> +++ b/drivers/infiniband/core/sa_query.c >> @@ -836,6 +836,10 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >> gfp_t gfp_mask) >> void *data; >> struct ib_sa_mad *mad; >> int len; >> +unsigned long flags; >> +unsigned long delay; >> +gfp_t gfp_flag; >> +int ret; >> >> mad = query->mad_buf->mad; >> len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask); >> @@ -860,36 +864,39 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >> gfp_t gfp_mask) >> /* Repair the nlmsg header length */ >> nlmsg_end(skb, nlh); >> >> -return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask); >> -} >> +gfp_flag = ((gfp_mask & GFP_ATOMIC) == GFP_ATOMIC) ? GFP_ATOMIC : >> +GFP_NOWAIT; > I would say that the better way will be to write something like this: > gfp_flag |= GFP_NOWAIT; You mean gfp_flag = gfp_mask|GFP_NOWAIT?
[We don't want to modify the gfp_mask sent by the caller] #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) If a caller passes GFP_KERNEL, "gfp_mask|GFP_NOWAIT" will still have __GFP_RECLAIM, __GFP_IO and __GFP_FS set, which is not suitable for use under a spinlock. Thanks, Divya > > Thanks
[PATCH v4] IB/sa: Resolving use-after-free in ib_nl_send_msg
Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending") -

1. Adds the query to the request list before ib_nl_snd_msg.
2. Moves ib_nl_send_msg out of spinlock, hence safe to use gfp_mask as is.

However, if there is a delay in sending out the request (for example, a
delay due to a low-memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query, causing a use-after-free
while accessing the query in ib_nl_send_msg.

Call trace for the above race:

 [] ? ib_pack+0x17b/0x240 [ib_core]
 [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
 [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
 [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
 [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
 [] addr_handler+0x9e/0x140 [rdma_cm]
 [] process_req+0x134/0x190 [ib_addr]
 [] process_one_work+0x169/0x4a0
 [] worker_thread+0x5b/0x560
 [] ? flush_delayed_work+0x50/0x50
 [] kthread+0xcb/0xf0
 [] ? __schedule+0x24a/0x810
 [] ? __schedule+0x24a/0x810
 [] ? kthread_create_on_node+0x180/0x180
 [] ret_from_fork+0x47/0x90
 [] ? kthread_create_on_node+0x180/0x180

RIP [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve the above issue -
1. Add the req to the request list only after the request has been sent out.
2. To handle the race where the response comes in before the request is
   added to the request list, send (rdma_nl_multicast) and add to the list
   while holding the spinlock - request_lock.
3. Use non-blocking memory allocation flags for rdma_nl_multicast since it
   is called while holding a spinlock.

Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")
Signed-off-by: Divya Indi
---
v1:
- Use flag IB_SA_NL_QUERY_SENT to prevent the use-after-free.

v2:
- Use atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
- Rewording and adding comments.

v3:
- Change approach and remove usage of IB_SA_NL_QUERY_SENT.
- Add req to request list only after the request has been sent out.
- Send and add to list while holding the spinlock (request_lock).
- Override gfp_mask and use GFP_NOWAIT for rdma_nl_multicast since we
  need non-blocking memory allocation while holding the spinlock.

v4:
- Formatting changes.
- Use GFP_NOWAIT conditionally - only when GFP_ATOMIC is not provided by
  the caller.
---
 drivers/infiniband/core/sa_query.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 74e0058..9066d48 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -836,6 +836,10 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	void *data;
 	struct ib_sa_mad *mad;
 	int len;
+	unsigned long flags;
+	unsigned long delay;
+	gfp_t gfp_flag;
+	int ret;
 
 	mad = query->mad_buf->mad;
 	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
@@ -860,36 +864,39 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	/* Repair the nlmsg header length */
 	nlmsg_end(skb, nlh);
 
-	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
-}
+	gfp_flag = ((gfp_mask & GFP_ATOMIC) == GFP_ATOMIC) ? GFP_ATOMIC :
+		GFP_NOWAIT;
 
-static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
-{
-	unsigned long flags;
-	unsigned long delay;
-	int ret;
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_flag);
 
-	INIT_LIST_HEAD(&query->list);
-	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+	if (ret)
+		goto out;
 
-	/* Put the request on the list first.*/
-	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	/* Put the request on the list.*/
 	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
 	query->timeout = delay + jiffies;
 	list_add_tail(&query->list, &ib_nl_request_list);
 	/* Start the timeout if this is the only request */
 	if (ib_nl_request_list.next == &query->list)
 		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+
+out:
 	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+	return ret;
+}
+
+static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
+{
+	int ret;
+
+	INIT_LIST_HEAD(&query->list);
+	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
+
 	ret = ib_nl_send_msg(query, gfp_mask);
-	if (ret) {
+	if (ret)
 		ret = -EIO;
-		/* Remove the request */
-		spin_lock_irqsave(&ib_nl_request_lock, flags);
-		list_del(&query->list);
-		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
-	}
 
 	return ret;
 }
--
1.8.3.1
Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
Hi Jason, Thanks for taking the time to review! On 6/17/20 11:24 AM, Jason Gunthorpe wrote: > On Tue, Jun 16, 2020 at 10:56:53AM -0700, Divya Indi wrote: >> The other option might be to use GFP_NOWAIT conditionally ie >> (only use GFP_NOWAIT when GFP_ATOMIC is not specified in gfp_mask else >> use GFP_ATOMIC). Eventual goal being to not have a blocking memory >> allocation. > This is probably safest for now, unless you can audit all callers and > see if they can switch to GFP_NOWAIT as well At present the callers with GFP_ATOMIC appear to be ipoib. Might not be feasible to change them all to GFP_NOWAIT. Will incorporate the review comments and send out v3 early next week. Thanks, Divya > > Jason
Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
Hi Leon, Please find my comments inline - On 6/13/20 11:41 PM, Leon Romanovsky wrote: > On Tue, Jun 09, 2020 at 07:45:21AM -0700, Divya Indi wrote: >> Hi Leon, >> >> Thanks for taking the time to review. >> >> Please find my comments inline - >> >> On 6/9/20 12:00 AM, Leon Romanovsky wrote: >>> On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote: >>>> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >>>> before sending")' >>>> - >>>> 1. Adds the query to the request list before ib_nl_snd_msg. >>>> 2. Removes ib_nl_send_msg from within the spinlock which also makes it >>>> possible to allocate memory with GFP_KERNEL. >>>> >>>> However, if there is a delay in sending out the request (For >>>> eg: Delay due to low memory situation) the timer to handle request timeout >>>> might kick in before the request is sent out to ibacm via netlink. >>>> ib_nl_request_timeout may release the query causing a use after free >>>> situation >>>> while accessing the query in ib_nl_send_msg. >>>> >>>> Call Trace for the above race: >>>> >>>> [] ? ib_pack+0x17b/0x240 [ib_core] >>>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >>>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >>>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >>>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >>>> [rds_rdma] >>>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >>>> [rds_rdma] >>>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >>>> [] addr_handler+0x9e/0x140 [rdma_cm] >>>> [] process_req+0x134/0x190 [ib_addr] >>>> [] process_one_work+0x169/0x4a0 >>>> [] worker_thread+0x5b/0x560 >>>> [] ? flush_delayed_work+0x50/0x50 >>>> [] kthread+0xcb/0xf0 >>>> [] ? __schedule+0x24a/0x810 >>>> [] ? __schedule+0x24a/0x810 >>>> [] ? kthread_create_on_node+0x180/0x180 >>>> [] ret_from_fork+0x47/0x90 >>>> [] ? kthread_create_on_node+0x180/0x180 >>>> >>>> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >>>> >>>> To resolve the above issue - >>>> 1. 
Add the req to the request list only after the request has been sent >>>> out. >>>> 2. To handle the race where response comes in before adding request to >>>> the request list, send(rdma_nl_multicast) and add to list while holding the >>>> spinlock - request_lock. >>>> 3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding >>>> a spinlock. In case of memory allocation failure, request will go out to >>>> SA. >>>> >>>> Signed-off-by: Divya Indi >>>> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >>>> before sending") >>> Author SOB should be after "Fixes" line. >> My bad. Noted. >> >>>> --- >>>> drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++----------------- >>>> 1 file changed, 17 insertions(+), 17 deletions(-) >>>> >>>> diff --git a/drivers/infiniband/core/sa_query.c >>>> b/drivers/infiniband/core/sa_query.c >>>> index 74e0058..042c99b 100644 >>>> --- a/drivers/infiniband/core/sa_query.c >>>> +++ b/drivers/infiniband/core/sa_query.c >>>> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >>>> gfp_t gfp_mask) >>>> void *data; >>>> struct ib_sa_mad *mad; >>>> int len; >>>> + unsigned long flags; >>>> + unsigned long delay; >>>> + int ret; >>>> >>>> mad = query->mad_buf->mad; >>>> len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask); >>>> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >>>> gfp_t gfp_mask) >>>> /* Repair the nlmsg header length */ >>>> nlmsg_end(skb, nlh); >>>> >>>> - return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask); >>>> + spin_lock_irqsave(&ib_nl_request_lock, flags); >>>> + ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT); >>> It is hard to be convinced that this is correct solution. The mix of >>> gfp_flags and GFP_NOWAIT at the same time and usage of >>> ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too >>> makes this code unreadable/non-maintainable.
Re: [PATCH v2] sample-trace-array: Fix sleeping function called from invalid context
On 6/9/20 6:51 AM, Kefeng Wang wrote: > BUG: sleeping function called from invalid context at > kernel/locking/mutex.c:935 > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5 > 1 lock held by swapper/5/0: > #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, > at: call_timer_fn+0x8/0x3e0 > CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8 > Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > Call trace: > dump_backtrace+0x0/0x1a0 > show_stack+0x20/0x30 > dump_stack+0xe4/0x150 > ___might_sleep+0x160/0x200 > __might_sleep+0x58/0x90 > __mutex_lock+0x64/0x948 > mutex_lock_nested+0x3c/0x58 > __ftrace_set_clr_event+0x44/0x88 > trace_array_set_clr_event+0x24/0x38 > mytimer_handler+0x34/0x40 [sample_trace_array] > > mutex_lock() will be called in interrupt context, using workqueue to fix it. Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") > Signed-off-by: Kefeng Wang Reviewed-by: Divya Indi > --- > v2: > - add include of linux/workqueue.h > - add missing cancel_work_sync() suggested by Divya Indi > > samples/ftrace/sample-trace-array.c | 18 +++++++++++++----- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/samples/ftrace/sample-trace-array.c > b/samples/ftrace/sample-trace-array.c > index d523450d73eb..9e437f930280 100644 > --- a/samples/ftrace/sample-trace-array.c > +++ b/samples/ftrace/sample-trace-array.c > @@ -6,6 +6,7 @@ > #include > #include > #include > +#include <linux/workqueue.h> > > /* > * Any file that uses trace points, must include the header. > @@ -20,6 +21,16 @@ struct trace_array *tr; > static void mytimer_handler(struct timer_list *unused); > static struct task_struct *simple_tsk; > > +static void trace_work_fn(struct work_struct *work) > +{ > + /* > + * Disable tracing for event "sample_event". > + */ > + trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", > + false); > +} > +static DECLARE_WORK(trace_work, trace_work_fn); > + > /* > * mytimer: Timer setup to disable tracing for event "sample_event". This > * timer is only for the purposes of the sample module to demonstrate access > of @@ -29,11 +40,7 @@ static DEFINE_TIMER(mytimer, mytimer_handler); > > static void mytimer_handler(struct timer_list *unused) > { > - /* > - * Disable tracing for event "sample_event". > - */ > - trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", > - false); > + schedule_work(&trace_work); > } > > static void simple_thread_func(int count) > @@ -76,6 +83,7 @@ static int simple_thread(void *arg) > simple_thread_func(count++); > > del_timer(&mytimer); > + cancel_work_sync(&trace_work); > > /* >* trace_array_put() decrements the reference counter associated with
Re: [PATCH] sample-trace-array: Remove trace_array 'sample-instance'
Reviewed-by: Divya Indi On 6/9/20 6:52 AM, Kefeng Wang wrote: > Remove trace_array 'sample-instance' if kthread_run fails > in sample_trace_array_init(). > > Signed-off-by: Kefeng Wang > --- > samples/ftrace/sample-trace-array.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/samples/ftrace/sample-trace-array.c > b/samples/ftrace/sample-trace-array.c > index 9e437f930280..6aba02a31c96 100644 > --- a/samples/ftrace/sample-trace-array.c > +++ b/samples/ftrace/sample-trace-array.c > @@ -115,8 +115,12 @@ static int __init sample_trace_array_init(void) > trace_printk_init_buffers(); > > simple_tsk = kthread_run(simple_thread, NULL, "sample-instance"); > - if (IS_ERR(simple_tsk)) > + if (IS_ERR(simple_tsk)) { > + trace_array_put(tr); > + trace_array_destroy(tr); > return -1; > + } > + > return 0; > } >
Re: Review Request
Thanks Leon, Noted! On 6/9/20 12:03 AM, Leon Romanovsky wrote: > On Mon, Jun 08, 2020 at 07:46:15AM -0700, Divya Indi wrote: >> [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg >> >> Hi, >> >> Please review the patch that follows. > Please read Documentation/process/submitting-patches.rst > 14) The canonical patch format > > You don't need an extra email "Review request" and Changelog should be > put inside the patch itself under "---" marker. > > Thanks
Re: [PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
Hi Leon, Thanks for taking the time to review. Please find my comments inline - On 6/9/20 12:00 AM, Leon Romanovsky wrote: > On Mon, Jun 08, 2020 at 07:46:16AM -0700, Divya Indi wrote: >> Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending")' >> - >> 1. Adds the query to the request list before ib_nl_snd_msg. >> 2. Removes ib_nl_send_msg from within the spinlock which also makes it >> possible to allocate memory with GFP_KERNEL. >> >> However, if there is a delay in sending out the request (For >> eg: Delay due to low memory situation) the timer to handle request timeout >> might kick in before the request is sent out to ibacm via netlink. >> ib_nl_request_timeout may release the query causing a use after free >> situation >> while accessing the query in ib_nl_send_msg. >> >> Call Trace for the above race: >> >> [] ? ib_pack+0x17b/0x240 [ib_core] >> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >> [] addr_handler+0x9e/0x140 [rdma_cm] >> [] process_req+0x134/0x190 [ib_addr] >> [] process_one_work+0x169/0x4a0 >> [] worker_thread+0x5b/0x560 >> [] ? flush_delayed_work+0x50/0x50 >> [] kthread+0xcb/0xf0 >> [] ? __schedule+0x24a/0x810 >> [] ? __schedule+0x24a/0x810 >> [] ? kthread_create_on_node+0x180/0x180 >> [] ret_from_fork+0x47/0x90 >> [] ? kthread_create_on_node+0x180/0x180 >> >> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >> >> To resolve the above issue - >> 1. Add the req to the request list only after the request has been sent out. >> 2. To handle the race where response comes in before adding request to >> the request list, send(rdma_nl_multicast) and add to list while holding the >> spinlock - request_lock. >> 3. 
Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding >> a spinlock. In case of memory allocation failure, request will go out to SA. >> >> Signed-off-by: Divya Indi >> Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending") > Author SOB should be after "Fixes" line. My bad. Noted. > >> --- >> drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++----------------- >> 1 file changed, 17 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/infiniband/core/sa_query.c >> b/drivers/infiniband/core/sa_query.c >> index 74e0058..042c99b 100644 >> --- a/drivers/infiniband/core/sa_query.c >> +++ b/drivers/infiniband/core/sa_query.c >> @@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >> gfp_t gfp_mask) >> void *data; >> struct ib_sa_mad *mad; >> int len; >> +unsigned long flags; >> +unsigned long delay; >> +int ret; >> >> mad = query->mad_buf->mad; >> len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask); >> @@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, >> gfp_t gfp_mask) >> /* Repair the nlmsg header length */ >> nlmsg_end(skb, nlh); >> >> -return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask); >> +spin_lock_irqsave(&ib_nl_request_lock, flags); >> +ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT); > It is hard to be convinced that this is correct solution. The mix of > gfp_flags and GFP_NOWAIT at the same time and usage of > ib_nl_request_lock to protect lists and suddenly rdma_nl_multicast() too > makes this code unreadable/non-maintainable. Prior to 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending"), we had ib_nl_send_msg under the spinlock ib_nl_request_lock. i.e. we had - 1. Get spinlock - ib_nl_request_lock 2. ib_nl_send_msg 2.a) rdma_nl_multicast 3. Add request to the req list 4. Arm the timer if needed. 5. Release spinlock However, ib_nl_send_msg involved a memory allocation using GFP_KERNEL; hence, it was moved out of the spinlock. In addition, the req was now being added prior to ib_nl_send_msg [To handle the race where a response can come in before we get a chance to add the request back to the list]. This introduced another race resulting in use-after-free. [Described in the commit.] To resolve this, sending out the request and adding it to the list need to happen while holding the request_lock. To ensure minimum allocations while holding the spinlock, GFP_NOWAIT is used for the rdma_nl_multicast call.
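[Editorial note] For reference, the three orderings discussed in this thread can be laid out side by side (a schematic summary of the discussion, not literal kernel code):

```
Before 3ebd2fd0d011 (correct, but allocates memory under the lock):
    lock(ib_nl_request_lock)
      ib_nl_send_msg()          // rdma_nl_multicast allocates under lock
      add query to request list
      arm timeout work if list was empty
    unlock(ib_nl_request_lock)

After 3ebd2fd0d011 (use-after-free window):
    lock; add query to list; arm timeout; unlock
    ib_nl_send_msg()            // may be delayed (e.g. low memory);
                                // the timeout can fire first and free query

This fix (v3/v4):
    lock(ib_nl_request_lock)
      ret = ib_nl_send_msg()    // non-blocking allocation (GFP_NOWAIT/GFP_ATOMIC)
      if ret == 0:
          add query to list; arm timeout if list was empty
    unlock(ib_nl_request_lock)
```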
Re: [PATCH] sample-trace-array: Fix sleeping function called from invalid context
Hi Kefeng, Thanks for catching this issue. Please find my comments inline - On 6/8/20 12:54 AM, Kefeng Wang wrote: > BUG: sleeping function called from invalid context at > kernel/locking/mutex.c:935 > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5 > 1 lock held by swapper/5/0: > #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, > at: call_timer_fn+0x8/0x3e0 > CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8 > Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > Call trace: > dump_backtrace+0x0/0x1a0 > show_stack+0x20/0x30 > dump_stack+0xe4/0x150 > ___might_sleep+0x160/0x200 > __might_sleep+0x58/0x90 > __mutex_lock+0x64/0x948 > mutex_lock_nested+0x3c/0x58 > __ftrace_set_clr_event+0x44/0x88 > trace_array_set_clr_event+0x24/0x38 > mytimer_handler+0x34/0x40 [sample_trace_array] > > mutex_lock() will be called in interrupt context, using workqueueu to fix it. /s/workqueueu/workqueue Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") > Signed-off-by: Kefeng Wang > --- > samples/ftrace/sample-trace-array.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/samples/ftrace/sample-trace-array.c > b/samples/ftrace/sample-trace-array.c > index d523450d73eb..41684c7dbd7b 100644 > --- a/samples/ftrace/sample-trace-array.c > +++ b/samples/ftrace/sample-trace-array.c > @@ -20,6 +20,16 @@ struct trace_array *tr; > static void mytimer_handler(struct timer_list *unused); > static struct task_struct *simple_tsk; > > +static void trace_work_fn(struct work_struct *work) > +{ > + /* > + * Disable tracing for event "sample_event". > + */ > + trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", > + false); > +} > +static DECLARE_WORK(trace_work, trace_work_fn); > + > /* > * mytimer: Timer setup to disable tracing for event "sample_event". This > * timer is only for the purposes of the sample module to demonstrate access > of @@ -29,11 +39,7 @@ static DEFINE_TIMER(mytimer, mytimer_handler); > > static void mytimer_handler(struct timer_list *unused) > { > - /* > - * Disable tracing for event "sample_event". > - */ > - trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", > - false); > + schedule_work(&trace_work); > } > > static void simple_thread_func(int count) I think we also need to use cancel_work_sync() to handle the case - 1. Module unloaded 2. Timer already ran and scheduled work to disable trace event 3. When the work runs we no longer have the relevant trace array. static int simple_thread(void *arg) { . del_timer(&mytimer); . return 0; } Thanks, Divya
Re: [PATCH] sample-trace-array: Fix sleeping function called from invalid context
Hi Steve, Sure, I am looking into it and reviewing the patch. Thanks, Divya On 6/8/20 7:37 AM, Steven Rostedt wrote: > On Mon, 8 Jun 2020 07:54:37 + > Kefeng Wang wrote: > >> BUG: sleeping function called from invalid context at >> kernel/locking/mutex.c:935 >> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5 >> 1 lock held by swapper/5/0: >> #0: 80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, >> at: call_timer_fn+0x8/0x3e0 >> CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8 >> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 >> Call trace: >> dump_backtrace+0x0/0x1a0 >> show_stack+0x20/0x30 >> dump_stack+0xe4/0x150 >> ___might_sleep+0x160/0x200 >> __might_sleep+0x58/0x90 >> __mutex_lock+0x64/0x948 >> mutex_lock_nested+0x3c/0x58 >> __ftrace_set_clr_event+0x44/0x88 >> trace_array_set_clr_event+0x24/0x38 >> mytimer_handler+0x34/0x40 [sample_trace_array] >> >> mutex_lock() will be called in interrupt context, using workqueueu to fix it. >> >> Signed-off-by: Kefeng Wang >> > Divya, > > Can you give a Reviewed-by for this? > > -- Steve
[PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg
Commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list
before sending") -

1. Adds the query to the request list before ib_nl_snd_msg.
2. Removes ib_nl_send_msg from within the spinlock, which also makes it
   possible to allocate memory with GFP_KERNEL.

However, if there is a delay in sending out the request (for example, a
delay due to a low-memory situation) the timer to handle request timeout
might kick in before the request is sent out to ibacm via netlink.
ib_nl_request_timeout may release the query, causing a use-after-free
while accessing the query in ib_nl_send_msg.

Call trace for the above race:

 [] ? ib_pack+0x17b/0x240 [ib_core]
 [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
 [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
 [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
 [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
 [] addr_handler+0x9e/0x140 [rdma_cm]
 [] process_req+0x134/0x190 [ib_addr]
 [] process_one_work+0x169/0x4a0
 [] worker_thread+0x5b/0x560
 [] ? flush_delayed_work+0x50/0x50
 [] kthread+0xcb/0xf0
 [] ? __schedule+0x24a/0x810
 [] ? __schedule+0x24a/0x810
 [] ? kthread_create_on_node+0x180/0x180
 [] ret_from_fork+0x47/0x90
 [] ? kthread_create_on_node+0x180/0x180

RIP [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve the above issue -
1. Add the req to the request list only after the request has been sent out.
2. To handle the race where the response comes in before the request is
   added to the request list, send (rdma_nl_multicast) and add to the list
   while holding the spinlock - request_lock.
3. Use GFP_NOWAIT for rdma_nl_multicast since it is called while holding a
   spinlock. In case of memory allocation failure, the request will go out
   to the SA.

Signed-off-by: Divya Indi
Fixes: 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending")
---
 drivers/infiniband/core/sa_query.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 74e0058..042c99b 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -836,6 +836,9 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	void *data;
 	struct ib_sa_mad *mad;
 	int len;
+	unsigned long flags;
+	unsigned long delay;
+	int ret;
 
 	mad = query->mad_buf->mad;
 	len = ib_nl_get_path_rec_attrs_len(mad->sa_hdr.comp_mask);
@@ -860,35 +863,32 @@ static int ib_nl_send_msg(struct ib_sa_query *query, gfp_t gfp_mask)
 	/* Repair the nlmsg header length */
 	nlmsg_end(skb, nlh);
 
-	return rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, gfp_mask);
+	spin_lock_irqsave(&ib_nl_request_lock, flags);
+	ret = rdma_nl_multicast(&init_net, skb, RDMA_NL_GROUP_LS, GFP_NOWAIT);
+	if (!ret) {
+		/* Put the request on the list.*/
+		delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+		query->timeout = delay + jiffies;
+		list_add_tail(&query->list, &ib_nl_request_list);
+		/* Start the timeout if this is the only request */
+		if (ib_nl_request_list.next == &query->list)
+			queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
+	}
+	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+
+	return ret;
 }
 
 static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 {
-	unsigned long flags;
-	unsigned long delay;
 	int ret;
 
 	INIT_LIST_HEAD(&query->list);
 	query->seq = (u32)atomic_inc_return(&ib_nl_sa_request_seq);
 
-	/* Put the request on the list first.*/
-	spin_lock_irqsave(&ib_nl_request_lock, flags);
-	delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
-	query->timeout = delay + jiffies;
-	list_add_tail(&query->list, &ib_nl_request_list);
-	/* Start the timeout if this is the only request */
-	if (ib_nl_request_list.next == &query->list)
-		queue_delayed_work(ib_nl_wq, &ib_nl_timed_work, delay);
-	spin_unlock_irqrestore(&ib_nl_request_lock, flags);
-
 	ret = ib_nl_send_msg(query, gfp_mask);
 	if (ret) {
 		ret = -EIO;
-		/* Remove the request */
-		spin_lock_irqsave(&ib_nl_request_lock, flags);
-		list_del(&query->list);
-		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
 	}
 
 	return ret;
--
1.8.3.1
Review Request
[PATCH v3] IB/sa: Resolving use-after-free in ib_nl_send_msg

Hi,

Please review the patch that follows. v3 addresses the previously raised concerns. Changes include -

1. To resolve the race where the timer can kick in before the request has been sent out, we now add the request to the list after sending out the request.
2. To handle the race where the response can come in before we get a chance to add the request to the list, sending and adding the request to the request list is done under a spinlock - request_lock.
3. To make sure there is no blocking op/delay while holding the spinlock, we use GFP_NOWAIT for memory allocation.

Thanks Jason for providing your valuable feedback. Let me know if you have any suggestions or concerns.

Thanks,
Divya
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi Jason, I wanted to follow up to see if you got a chance to review the following reply? Let me know if it addresses your concern and if you have any questions! Thanks, Divya On 5/13/20 2:02 PM, Divya Indi wrote: > Hi Jason, > > Please find my comments inline - > > On 5/13/20 8:00 AM, Jason Gunthorpe wrote: >> On Mon, May 11, 2020 at 02:26:30PM -0700, Divya Indi wrote: >>>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb, >>>>> >>>>> send_buf = query->mad_buf; >>>>> >>>>> + /* >>>>> + * Make sure the IB_SA_NL_QUERY_SENT flag is set before >>>>> + * processing this query. If flag is not set, query can be accessed in >>>>> + * another context while setting the flag and processing the query will >>>>> + * eventually release it causing a possible use-after-free. >>>>> + */ >>>> This comment doesn't really make sense, flags insige the memory being >>>> freed inherently can't prevent use after free. >>> I can definitely re-phrase here to make things clearer. But, the idea here >>> is >>> in the unlikely/rare case where a response for a query comes in before the >>> flag has been >>> set in ib_nl_make_request, we want to wait for the flag to be sent before >>> proceeding. >>> The response handler will eventually release the query so this wait avoids >>> that if the flag has not been set >>> else >>> "query->flags |= IB_SA_NL_QUERY_SENT;" >>> will be accessing a query which was freed due to the above mentioned race. >>> >>> It is unlikely since getting a response => We have actually sent out the >>> query to ibacm. >>> >>> How about this - >>> >>> "Getting a response is indicative of having sent out the query, but in an >>> unlikely race when >>> the response comes in before setting IB_SA_NL_QUERY_SENT, we need to wait >>> till the flag is set to >>> avoid accessing a query that has been released." >> It still makes no sense, a flag that is set before freeing the memory >> is fundamentally useless to prevent races. > Here the race is - > > 1. 
ib_nl_send_msg() > 2. query->flags |= IB_SA_NL_QUERY_SENT > 3. return; > > - > > response_handler() { > wait till flag is set. > > kfree(query); > > } > > Here, if the response handler was called => Query was sent > and flag should have been set. However if response handler kicks in > before line 2, we want to wait and make sure the flag is set and > then free the query. > > Ideally after ib_nl_send_msg, we should not be accessing the query > because we can be getting a response/timeout asynchronously and cant be > sure when. > > The only places we access a query after it is successfully sent [response > handler getting called > => sending was successful] - > 1. ib_nl_request_timeout > 2. While setting the flag. > > 1. is taken care of because the request list access is protected by a lock > and whoever gets the lock first deletes it from the request list and > hence we can only have the response handler OR the timeout handler operate on > the > query. > > 2. To handle this is why we wait in the response handler. Once the flag is > set we are no longer accessing the query and hence safe to release it. > > I have modified the comment for v2 as follows - > > /* > +* In case of a quick response ie when a response comes in before > +* setting IB_SA_NL_QUERY_SENT, we can have an unlikely race where the > +* response handler will release the query, but we can still access > the > +* freed query while setting the flag. > +* Hence, before proceeding and eventually releasing the query - > +* wait till the flag is set. The flag should be set soon since > getting > +* a response is indicative of having successfully sent the query. > +*/ > > > Thanks, > Divya > >> Jason
Re: [PATCH] IB/sa: Fix use-after-free in ib_nl_send_msg()
Hi Markus, Thanks for taking the time to review. On 5/15/20 9:25 AM, Markus Elfring wrote: >> This patch fixes commit - >> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending")' >> >> Above commit adds the query to the request list before ib_nl_snd_msg. > I suggest to improve also this change description. > > The query to the request list was added before ib_nl_snd_msg() > by the commit 3ebd2fd0d011 ("…"). Noted, will make this change. > >> This flag Indicates … > … indicates … Noted. > >> To handle the case where a response is received before we could set this >> flag, the response handler waits for the flag to be >> set before proceeding with the query. > Please reconsider the word wrapping. > > To handle the case where a response is received before we could set > this flag, the response handler waits for the flag to be set > before proceeding with the query. > > > Would you like to add the tag “Fixes” to the commit message? Noted! Thanks, Divya > > Regards, > Markus
[PATCH] IB/sa: Resolving use-after-free in ib_nl_send_msg.
This patch fixes commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending").

The above commit adds the query to the request list before ib_nl_snd_msg.

However, if there is a delay in sending out the request (for example, a delay due to a low-memory situation), the timer to handle the request timeout might kick in before the request is sent out to ibacm via netlink. ib_nl_request_timeout may release the query if it fails to send it to the SA as well, causing a use-after-free.

Call trace for the above race:

 [] ? ib_pack+0x17b/0x240 [ib_core]
 [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
 [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
 [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
 [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
 [] addr_handler+0x9e/0x140 [rdma_cm]
 [] process_req+0x134/0x190 [ib_addr]
 [] process_one_work+0x169/0x4a0
 [] worker_thread+0x5b/0x560
 [] ? flush_delayed_work+0x50/0x50
 [] kthread+0xcb/0xf0
 [] ? __schedule+0x24a/0x810
 [] ? __schedule+0x24a/0x810
 [] ? kthread_create_on_node+0x180/0x180
 [] ret_from_fork+0x47/0x90
 [] ? kthread_create_on_node+0x180/0x180

RIP [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve this issue, we introduce a new flag, IB_SA_NL_QUERY_SENT, which indicates whether the request has been sent out to ibacm yet.

If this flag is not set for a query and the query times out, we add it back to the list with the original delay.

To handle the case where a response is received before we could set this flag, the response handler waits for the flag to be set before proceeding with the query.
Signed-off-by: Divya Indi
---
 drivers/infiniband/core/sa_query.c | 53 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30d4c12..91486e7 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -59,6 +60,9 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX	20
 #define IB_SA_CPI_MAX_RETRY_CNT	3
 #define IB_SA_CPI_RETRY_WAIT		1000 /*msecs */
+
+DECLARE_WAIT_QUEUE_HEAD(wait_queue);
+
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;

 struct ib_sa_sm_ah {
@@ -123,6 +127,12 @@ struct ib_sa_query {
 #define IB_SA_CANCEL			0x0002
 #define IB_SA_QUERY_OPA		0x0004
+/*
+ * This bit in the SA query flags indicates whether the query
+ * has been sent out to ibacm via netlink
+ */
+#define IB_SA_NL_QUERY_SENT		3
+
 struct ib_sa_service_query {
 	void (*callback)(int, struct ib_sa_service_rec *, void *);
 	void *context;
@@ -746,6 +756,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
 	return (query->flags & IB_SA_CANCEL);
 }

+static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
+{
+	return test_bit(IB_SA_NL_QUERY_SENT, (unsigned long *)&query->flags);
+}
+
 static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
 				     struct ib_sa_query *query)
 {
@@ -889,6 +904,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 		spin_lock_irqsave(&ib_nl_request_lock, flags);
 		list_del(&query->list);
 		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+	} else {
+		set_bit(IB_SA_NL_QUERY_SENT, (unsigned long *)&query->flags);
+
+		/*
+		 * If response is received before this flag was set
+		 * someone is waiting to process the response and release the
+		 * query.
+		 */
+		wake_up(&wait_queue);
 	}

 	return ret;
@@ -994,6 +1018,21 @@ static void ib_nl_request_timeout(struct work_struct *work)
 		}

 		list_del(&query->list);
+
+		/*
+		 * If IB_SA_NL_QUERY_SENT is not set, this query has not been
+		 * sent to ibacm yet. Reset the timer.
+		 */
+		if (!ib_sa_nl_query_sent(query)) {
+			delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+			query->timeout = delay + jiffies;
+			list_add_tail(&query->list, &ib_nl_request_list);
+			/* Start the timeout if this is the only request */
+			if (ib_nl_request_list.next == &query->list)
+				queue_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+						   delay);
+			break;
+		}
 		ib_sa_disable_local_svc(query);
 		/* Hold the lock to protect against query cancellation */
 		if (ib_sa_query_cancelled(query))
Review Request
[PATCH v2] IB/sa: Resolving use-after-free in ib_nl_send_msg.

Hi,

This is the v2 of the patch that addresses the comments received for v1 -

- Using atomic bit ops for setting and testing IB_SA_NL_QUERY_SENT.
- Rewording and adding comments.

Thanks,
Divya
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi Jason, Please find my comments inline - On 5/13/20 8:00 AM, Jason Gunthorpe wrote: > On Mon, May 11, 2020 at 02:26:30PM -0700, Divya Indi wrote: >>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb, >>>> >>>>send_buf = query->mad_buf; >>>> >>>> + /* >>>> + * Make sure the IB_SA_NL_QUERY_SENT flag is set before >>>> + * processing this query. If flag is not set, query can be accessed in >>>> + * another context while setting the flag and processing the query will >>>> + * eventually release it causing a possible use-after-free. >>>> + */ >>> This comment doesn't really make sense, flags insige the memory being >>> freed inherently can't prevent use after free. >> I can definitely re-phrase here to make things clearer. But, the idea here is >> in the unlikely/rare case where a response for a query comes in before the >> flag has been >> set in ib_nl_make_request, we want to wait for the flag to be sent before >> proceeding. >> The response handler will eventually release the query so this wait avoids >> that if the flag has not been set >> else >> "query->flags |= IB_SA_NL_QUERY_SENT;" >> will be accessing a query which was freed due to the above mentioned race. >> >> It is unlikely since getting a response => We have actually sent out the >> query to ibacm. >> >> How about this - >> >> "Getting a response is indicative of having sent out the query, but in an >> unlikely race when >> the response comes in before setting IB_SA_NL_QUERY_SENT, we need to wait >> till the flag is set to >> avoid accessing a query that has been released." > It still makes no sense, a flag that is set before freeing the memory > is fundamentally useless to prevent races. Here the race is - 1. ib_nl_send_msg() 2. query->flags |= IB_SA_NL_QUERY_SENT 3. return; - response_handler() { wait till flag is set. kfree(query); } Here, if the response handler was called => Query was sent and flag should have been set. 
However, if the response handler kicks in before line 2, we want to wait and make sure the flag is set, and only then free the query.

Ideally, after ib_nl_send_msg we should not be accessing the query, because we can get a response/timeout asynchronously and can't be sure when.

The only places we access a query after it is successfully sent [the response handler getting called => sending was successful] are -
1. ib_nl_request_timeout
2. While setting the flag.

1. is taken care of because the request list access is protected by a lock, and whoever gets the lock first deletes the query from the request list; hence only the response handler OR the timeout handler can operate on the query.

2. is why we wait in the response handler. Once the flag is set we are no longer accessing the query, and it is therefore safe to release it.

I have modified the comment for v2 as follows -

/*
+* In case of a quick response ie when a response comes in before
+* setting IB_SA_NL_QUERY_SENT, we can have an unlikely race where the
+* response handler will release the query, but we can still access the
+* freed query while setting the flag.
+* Hence, before proceeding and eventually releasing the query -
+* wait till the flag is set. The flag should be set soon since getting
+* a response is indicative of having successfully sent the query.
+*/

Thanks,
Divya

>
> Jason
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi Hillf, Please find my comments inline - On 5/8/20 4:03 AM, Hillf Danton wrote: > On Thu, 7 May 2020 12:36:29 Mark Bloch wrote: >> On 5/7/2020 11:34, Divya Indi wrote: >>> This patch fixes commit - >>> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >>> before sending")' >>> >>> Above commit adds the query to the request list before ib_nl_snd_msg. >>> >>> However, if there is a delay in sending out the request (For >>> eg: Delay due to low memory situation) the timer to handle request timeout >>> might kick in before the request is sent out to ibacm via netlink. >>> ib_nl_request_timeout may release the query if it fails to send it to SA >>> as well causing a use after free situation. >>> >>> Call Trace for the above race: >>> >>> [] ? ib_pack+0x17b/0x240 [ib_core] >>> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >>> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >>> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >>> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >>> [rds_rdma] >>> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >>> [rds_rdma] >>> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >>> [] addr_handler+0x9e/0x140 [rdma_cm] >>> [] process_req+0x134/0x190 [ib_addr] >>> [] process_one_work+0x169/0x4a0 >>> [] worker_thread+0x5b/0x560 >>> [] ? flush_delayed_work+0x50/0x50 >>> [] kthread+0xcb/0xf0 >>> [] ? __schedule+0x24a/0x810 >>> [] ? __schedule+0x24a/0x810 >>> [] ? kthread_create_on_node+0x180/0x180 >>> [] ret_from_fork+0x47/0x90 >>> [] ? kthread_create_on_node+0x180/0x180 >>> >>> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >>> >>> To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT. >>> This flag Indicates if the request has been sent out to ibacm yet. >>> >>> If this flag is not set for a query and the query times out, we add it >>> back to the list with the original delay. 
>>> >>> To handle the case where a response is received before we could set this >>> flag, the response handler waits for the flag to be >>> set before proceeding with the query. >>> >>> Signed-off-by: Divya Indi >>> --- >>> drivers/infiniband/core/sa_query.c | 45 >>> ++ >>> 1 file changed, 45 insertions(+) >>> >>> diff --git a/drivers/infiniband/core/sa_query.c >>> b/drivers/infiniband/core/sa_query.c >>> index 30d4c12..ffbae2f 100644 >>> --- a/drivers/infiniband/core/sa_query.c >>> +++ b/drivers/infiniband/core/sa_query.c >>> @@ -59,6 +59,9 @@ >>> #define IB_SA_LOCAL_SVC_TIMEOUT_MAX20 >>> #define IB_SA_CPI_MAX_RETRY_CNT3 >>> #define IB_SA_CPI_RETRY_WAIT 1000 /*msecs */ >>> + >>> +DECLARE_WAIT_QUEUE_HEAD(wait_queue); >>> + >>> static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT; >>> >>> struct ib_sa_sm_ah { >>> @@ -122,6 +125,7 @@ struct ib_sa_query { >>> #define IB_SA_ENABLE_LOCAL_SERVICE 0x0001 >>> #define IB_SA_CANCEL 0x0002 >>> #define IB_SA_QUERY_OPA0x0004 >>> +#define IB_SA_NL_QUERY_SENT0x0008 >>> >>> struct ib_sa_service_query { >>> void (*callback)(int, struct ib_sa_service_rec *, void *); >>> @@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct >>> ib_sa_query *query) >>> return (query->flags & IB_SA_CANCEL); >>> } >>> >>> +static inline int ib_sa_nl_query_sent(struct ib_sa_query *query) >>> +{ >>> + return (query->flags & IB_SA_NL_QUERY_SENT); >>> +} >>> + >>> static void ib_nl_set_path_rec_attrs(struct sk_buff *skb, >>> struct ib_sa_query *query) >>> { >>> @@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query >>> *query, gfp_t gfp_mask) >>> spin_lock_irqsave(_nl_request_lock, flags); >>> list_del(>list); >>> spin_unlock_irqrestore(_nl_request_lock, flags); > Here it also needs to do wakeup as sleeper might come while sending, no > matter the failure to send. We only sleep in the response handler. If we could not send a query, then response handler is not triggered and hence no one would be waiting
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi Jason, On 5/7/20 5:08 PM, Jason Gunthorpe wrote: > On Thu, May 07, 2020 at 11:34:47AM -0700, Divya Indi wrote: >> This patch fixes commit - >> commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list >> before sending")' >> >> Above commit adds the query to the request list before ib_nl_snd_msg. >> >> However, if there is a delay in sending out the request (For >> eg: Delay due to low memory situation) the timer to handle request timeout >> might kick in before the request is sent out to ibacm via netlink. >> ib_nl_request_timeout may release the query if it fails to send it to SA >> as well causing a use after free situation. >> >> Call Trace for the above race: >> >> [] ? ib_pack+0x17b/0x240 [ib_core] >> [] ib_sa_path_rec_get+0x181/0x200 [ib_sa] >> [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm] >> [] ? cma_bind_port+0xa0/0xa0 [rdma_cm] >> [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 >> [rds_rdma] >> [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma] >> [] addr_handler+0x9e/0x140 [rdma_cm] >> [] process_req+0x134/0x190 [ib_addr] >> [] process_one_work+0x169/0x4a0 >> [] worker_thread+0x5b/0x560 >> [] ? flush_delayed_work+0x50/0x50 >> [] kthread+0xcb/0xf0 >> [] ? __schedule+0x24a/0x810 >> [] ? __schedule+0x24a/0x810 >> [] ? kthread_create_on_node+0x180/0x180 >> [] ret_from_fork+0x47/0x90 >> [] ? kthread_create_on_node+0x180/0x180 >> >> RIP [] send_mad+0x33d/0x5d0 [ib_sa] >> >> To resolve this issue, we introduce a new flag IB_SA_NL_QUERY_SENT. >> This flag Indicates if the request has been sent out to ibacm yet. >> >> If this flag is not set for a query and the query times out, we add it >> back to the list with the original delay. >> >> To handle the case where a response is received before we could set this >> flag, the response handler waits for the flag to be >> set before proceeding with the query. 
>> >> Signed-off-by: Divya Indi >> drivers/infiniband/core/sa_query.c | 45 >> ++ >> 1 file changed, 45 insertions(+) >> >> diff --git a/drivers/infiniband/core/sa_query.c >> b/drivers/infiniband/core/sa_query.c >> index 30d4c12..ffbae2f 100644 >> +++ b/drivers/infiniband/core/sa_query.c >> @@ -59,6 +59,9 @@ >> #define IB_SA_LOCAL_SVC_TIMEOUT_MAX 20 >> #define IB_SA_CPI_MAX_RETRY_CNT 3 >> #define IB_SA_CPI_RETRY_WAIT1000 /*msecs */ >> + >> +DECLARE_WAIT_QUEUE_HEAD(wait_queue); >> + >> static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT; >> >> struct ib_sa_sm_ah { >> @@ -122,6 +125,7 @@ struct ib_sa_query { >> #define IB_SA_ENABLE_LOCAL_SERVICE 0x0001 >> #define IB_SA_CANCEL0x0002 >> #define IB_SA_QUERY_OPA 0x0004 >> +#define IB_SA_NL_QUERY_SENT 0x0008 >> >> struct ib_sa_service_query { >> void (*callback)(int, struct ib_sa_service_rec *, void *); >> @@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct >> ib_sa_query *query) >> return (query->flags & IB_SA_CANCEL); >> } >> >> +static inline int ib_sa_nl_query_sent(struct ib_sa_query *query) >> +{ >> +return (query->flags & IB_SA_NL_QUERY_SENT); >> +} >> + >> static void ib_nl_set_path_rec_attrs(struct sk_buff *skb, >> struct ib_sa_query *query) >> { >> @@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query >> *query, gfp_t gfp_mask) >> spin_lock_irqsave(_nl_request_lock, flags); >> list_del(>list); >> spin_unlock_irqrestore(_nl_request_lock, flags); >> +} else { >> +query->flags |= IB_SA_NL_QUERY_SENT; >> + >> +/* >> + * If response is received before this flag was set >> + * someone is waiting to process the response and release the >> + * query. >> + */ >> +wake_up(_queue); >> } >> >> return ret; >> @@ -994,6 +1012,21 @@ static void ib_nl_request_timeout(struct work_struct >> *work) >> } >> >> list_del(>list); >> + >> +/* >> + * If IB_SA_NL_QUERY_SENT is not set, this query has not been >> + * s
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi Mark, Please find my comments inline - On 5/7/20 2:40 PM, Mark Bloch wrote: > > On 5/7/2020 13:16, Wan, Kaike wrote: >> >>> -Original Message- >>> From: Mark Bloch >>> Sent: Thursday, May 07, 2020 3:36 PM >>> To: Divya Indi ; linux-kernel@vger.kernel.org; linux- >>> r...@vger.kernel.org; Jason Gunthorpe ; Wan, Kaike >>> >>> Cc: Gerd Rausch ; Håkon Bugge >>> ; Srinivas Eeda ; >>> Rama Nichanamatlu ; Doug Ledford >>> >>> Subject: Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg. >>> >>> >>>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff >>>> *skb, >>>> >>>>send_buf = query->mad_buf; >>>> >>>> + /* >>>> + * Make sure the IB_SA_NL_QUERY_SENT flag is set before >>>> + * processing this query. If flag is not set, query can be accessed in >>>> + * another context while setting the flag and processing the query >>> will >>>> + * eventually release it causing a possible use-after-free. >>>> + */ >>>> + if (unlikely(!ib_sa_nl_query_sent(query))) { >>> Can't there be a race here where you check the flag (it isn't set) and >>> before >>> you call wait_event() the flag is set and wake_up() is called which means >>> you >>> will wait here forever? >> Should wait_event() catch that? That is, if the flag is not set, >> wait_event() will sleep until the flag is set. >> >> or worse, a timeout will happen the query will be >>> freed and them some other query will call wake_up() and we have again a >>> use-after-free. >> The request has been deleted from the request list by this time and >> therefore the timeout should have no impact here. >> >> >>>> + spin_unlock_irqrestore(_nl_request_lock, flags); >>>> + wait_event(wait_queue, ib_sa_nl_query_sent(query)); >>> What if there are two queries sent to userspace, shouldn't you check and >>> make sure you got woken up by the right one setting the flag? >> The wait_event() is conditioned on the specific query >> (ib_sa_nl_query_sent(query)), not on the wait_queue itself. 
> Right, missed that this macro is expends into some inline code. > > Looking at the code a little more, I think this also fixes another issue. > Lets say ib_nl_send_msg() returns an error but before we do the list_del() in > ib_nl_make_request() there is also a timeout, so in ib_nl_request_timeout() > we will do list_del() and then another one list_del() will be done in > ib_nl_make_request(). > >>> Other than that, the entire solution makes it very complicated to reason >>> with >>> (flags set/checked without locking etc) maybe we should just revert and fix >>> it >>> the other way? >> The flag could certainly be set under the lock, which may reduce >> complications. > Anything that can help here with this. > For me in ib_nl_make_request() the comment should also explain that not only > ib_nl_handle_resolve_resp() > is waiting for the flag to be set but also ib_nl_request_timeout() and that a > timeout can't happen > before the flag is set. ib_nl_request_timeout() would re-queue the query to the request list if the flag is not set. However, makes sense! Noted, il add the comment in ib_nl_make_request to make things more clear. Thanks, Divya > Mark > >> Kaike >> i
Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
Hi, Thanks for taking the time to review. Please find my comments inline - On 5/7/20 1:16 PM, Wan, Kaike wrote: > >> -Original Message- >> From: Mark Bloch >> Sent: Thursday, May 07, 2020 3:36 PM >> To: Divya Indi ; linux-kernel@vger.kernel.org; linux- >> r...@vger.kernel.org; Jason Gunthorpe ; Wan, Kaike >> >> Cc: Gerd Rausch ; Håkon Bugge >> ; Srinivas Eeda ; >> Rama Nichanamatlu ; Doug Ledford >> >> Subject: Re: [PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg. >> >> >>> @@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff >>> *skb, >>> >>> send_buf = query->mad_buf; >>> >>> + /* >>> +* Make sure the IB_SA_NL_QUERY_SENT flag is set before >>> +* processing this query. If flag is not set, query can be accessed in >>> +* another context while setting the flag and processing the query >> will >>> +* eventually release it causing a possible use-after-free. >>> +*/ >>> + if (unlikely(!ib_sa_nl_query_sent(query))) { >> Can't there be a race here where you check the flag (it isn't set) and before >> you call wait_event() the flag is set and wake_up() is called which means you >> will wait here forever? > Should wait_event() catch that? That is, if the flag is not set, > wait_event() will sleep until the flag is set. > > or worse, a timeout will happen the query will be >> freed and them some other query will call wake_up() and we have again a >> use-after-free. > The request has been deleted from the request list by this time and therefore > the timeout should have no impact here. > > >>> + spin_unlock_irqrestore(_nl_request_lock, flags); >>> + wait_event(wait_queue, ib_sa_nl_query_sent(query)); >> What if there are two queries sent to userspace, shouldn't you check and >> make sure you got woken up by the right one setting the flag? > The wait_event() is conditioned on the specific query > (ib_sa_nl_query_sent(query)), not on the wait_queue itself. 
> >> Other than that, the entire solution makes it very complicated to reason with >> (flags set/checked without locking etc) maybe we should just revert and fix >> it >> the other way? > The flag could certainly be set under the lock, which may reduce > complications. We could use a lock or use atomic operations. However, the reason for not doing so was that we have 1 writer and multiple readers of the IB_SA_NL_QUERY_SENT flag and the readers wouldnt mind reading a stale value. Would it still be better to have a lock for this flag? Thanks, Divya > > Kaike >
[PATCH 1/2] IB/sa: Resolving use-after-free in ib_nl_send_msg.
This patch fixes commit 3ebd2fd0d011 ("IB/sa: Put netlink request into the request list before sending").

The above commit adds the query to the request list before ib_nl_snd_msg.

However, if there is a delay in sending out the request (for example, a delay due to a low-memory situation), the timer to handle the request timeout might kick in before the request is sent out to ibacm via netlink. ib_nl_request_timeout may release the query if it fails to send it to the SA as well, causing a use-after-free.

Call trace for the above race:

 [] ? ib_pack+0x17b/0x240 [ib_core]
 [] ib_sa_path_rec_get+0x181/0x200 [ib_sa]
 [] rdma_resolve_route+0x3c0/0x8d0 [rdma_cm]
 [] ? cma_bind_port+0xa0/0xa0 [rdma_cm]
 [] ? rds_rdma_cm_event_handler_cmn+0x850/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler_cmn+0x22c/0x850 [rds_rdma]
 [] rds_rdma_cm_event_handler+0x10/0x20 [rds_rdma]
 [] addr_handler+0x9e/0x140 [rdma_cm]
 [] process_req+0x134/0x190 [ib_addr]
 [] process_one_work+0x169/0x4a0
 [] worker_thread+0x5b/0x560
 [] ? flush_delayed_work+0x50/0x50
 [] kthread+0xcb/0xf0
 [] ? __schedule+0x24a/0x810
 [] ? __schedule+0x24a/0x810
 [] ? kthread_create_on_node+0x180/0x180
 [] ret_from_fork+0x47/0x90
 [] ? kthread_create_on_node+0x180/0x180

RIP [] send_mad+0x33d/0x5d0 [ib_sa]

To resolve this issue, we introduce a new flag, IB_SA_NL_QUERY_SENT, which indicates whether the request has been sent out to ibacm yet.

If this flag is not set for a query and the query times out, we add it back to the list with the original delay.

To handle the case where a response is received before we could set this flag, the response handler waits for the flag to be set before proceeding with the query.
Signed-off-by: Divya Indi
---
 drivers/infiniband/core/sa_query.c | 45 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 30d4c12..ffbae2f 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -59,6 +59,9 @@
 #define IB_SA_LOCAL_SVC_TIMEOUT_MAX	20
 #define IB_SA_CPI_MAX_RETRY_CNT	3
 #define IB_SA_CPI_RETRY_WAIT		1000 /*msecs */
+
+DECLARE_WAIT_QUEUE_HEAD(wait_queue);
+
 static int sa_local_svc_timeout_ms = IB_SA_LOCAL_SVC_TIMEOUT_DEFAULT;

 struct ib_sa_sm_ah {
@@ -122,6 +125,7 @@ struct ib_sa_query {
 #define IB_SA_ENABLE_LOCAL_SERVICE	0x0001
 #define IB_SA_CANCEL			0x0002
 #define IB_SA_QUERY_OPA		0x0004
+#define IB_SA_NL_QUERY_SENT		0x0008

 struct ib_sa_service_query {
 	void (*callback)(int, struct ib_sa_service_rec *, void *);
@@ -746,6 +750,11 @@ static inline int ib_sa_query_cancelled(struct ib_sa_query *query)
 	return (query->flags & IB_SA_CANCEL);
 }

+static inline int ib_sa_nl_query_sent(struct ib_sa_query *query)
+{
+	return (query->flags & IB_SA_NL_QUERY_SENT);
+}
+
 static void ib_nl_set_path_rec_attrs(struct sk_buff *skb,
 				     struct ib_sa_query *query)
 {
@@ -889,6 +898,15 @@ static int ib_nl_make_request(struct ib_sa_query *query, gfp_t gfp_mask)
 		spin_lock_irqsave(&ib_nl_request_lock, flags);
 		list_del(&query->list);
 		spin_unlock_irqrestore(&ib_nl_request_lock, flags);
+	} else {
+		query->flags |= IB_SA_NL_QUERY_SENT;
+
+		/*
+		 * If response is received before this flag was set
+		 * someone is waiting to process the response and release the
+		 * query.
+		 */
+		wake_up(&wait_queue);
 	}

 	return ret;
@@ -994,6 +1012,21 @@ static void ib_nl_request_timeout(struct work_struct *work)
 		}

 		list_del(&query->list);
+
+		/*
+		 * If IB_SA_NL_QUERY_SENT is not set, this query has not been
+		 * sent to ibacm yet. Reset the timer.
+		 */
+		if (!ib_sa_nl_query_sent(query)) {
+			delay = msecs_to_jiffies(sa_local_svc_timeout_ms);
+			query->timeout = delay + jiffies;
+			list_add_tail(&query->list, &ib_nl_request_list);
+			/* Start the timeout if this is the only request */
+			if (ib_nl_request_list.next == &query->list)
+				queue_delayed_work(ib_nl_wq, &ib_nl_timed_work,
+						   delay);
+			break;
+		}
 		ib_sa_disable_local_svc(query);
 		/* Hold the lock to protect against query cancellation */
 		if (ib_sa_query_cancelled(query))
@@ -1123,6 +1156,18 @@ int ib_nl_handle_resolve_resp(struct sk_buff *skb,

 	send_buf = query->mad_buf;

+	/*
+	 * Make sure the IB_SA_NL_QUERY_SENT flag is set before
+	 * processing this query. If flag is not set, query can be accessed in
+	 * another context while setting the flag and processing the query will
+	 * eventually release it causing a possible use-after-free.
+	 */
Resolving use-after-free in ib_nl_send_msg
[PATCH] IB/sa: Resolving use-after-free in ib_nl_send_msg.

Hi,

This patch is in reply to - https://lkml.org/lkml/2020/4/24/1076

We have a use-after-free possibility in the ibacm code path - when the timer (ib_nl_request_timeout) kicks in before ib_nl_snd_msg has completed sending the query out to ibacm via netlink. The timeout handler, i.e. ib_nl_request_timeout, may release the query while ib_nl_snd_msg is still accessing it.

Since the issue appears to be specific to the ibacm code path, we are trying to resolve it for the life cycle of sa_query in that path.

Please review the proposed fix, i.e. the patch that follows. We would appreciate your thoughts and feedback on the same. Let me know if you have any questions!

Thanks,
Divya
Re: Request for feedback : Possible use-after-free in routing SA query via netlink
Hi Jason,

Thanks for taking the time to review.

On 4/24/20 11:22 AM, Jason Gunthorpe wrote:
> On Fri, Apr 24, 2020 at 08:28:09AM -0700, Divya Indi wrote:
>
>> If we look at the query, it does not appear to be a valid ib_sa_query.
>> Instead it looks like a pid struct for a process -> a use-after-free
>> situation.
>>
>> We could simulate the crash by explicitly introducing a delay in
>> ib_nl_snd_msg with a sleep. The timer kicks in before ib_nl_send_msg has
>> even sent out the request and releases the query. We could reproduce the
>> crash with a similar stack trace.
>>
>> To summarize - we have a use-after-free possibility here when the timer
>> (ib_nl_request_timeout) kicks in before ib_nl_snd_msg has completed
>> sending the query out to ibacm via netlink. The timeout handler, i.e.
>> ib_nl_request_timeout, may result in releasing the query while
>> ib_nl_snd_msg is still accessing it.
>>
>> Appreciate your thoughts on the above issue.
>
> Your assessment looks right to me.
>
> Fixing it will require being able to clearly explain what the lifetime
> cycle is for ib_sa_query - and what is there today looks like a mess,
> so I'm not sure what it should be changed into.

The issue reported here appears to be restricted to the ibacm code path.
Hence, we tried to resolve it for the life cycle of sa_query in the ibacm
code path. I will follow up this email with a proposed fix (patch); it would
be very helpful if you can provide your feedback on it.

Thanks,
Divya

> There is lots of other stuff there that looks really weird, like
> ib_sa_cancel_query() keeps going even though there are still timers
> running..
>
> Jason
Re: [PATCH 5/5] tracing: New functions for kernel access to Ftrace instances
Hi Steven,

Please find my comments inline -

On 10/15/19 4:19 PM, Steven Rostedt wrote:

On Wed, 14 Aug 2019 10:55:27 -0700 Divya Indi wrote:

Adding 2 new functions -
1) trace_array_lookup: Look up and return a trace array, given its name.
2) trace_array_set_clr_event: Enable/disable event recording to the given
trace array.

NOTE: trace_array_lookup returns a trace array and also increments the
reference counter associated with the returned trace array. Make sure to
call trace_array_put() once the use is done so that the instance can be
removed at a later time.

Example use:
tr = trace_array_lookup("foo-bar");

Should probably be renamed to:

	trace_array_get_by_name("foo-bar")

As the name should show that it adds a ref count.

Makes sense! Will make this change.

But if we make the change I suggested before, where we do not need to do the
put before calling the destroy, we should have:

	if (!tr) {
		tr = trace_array_create("foo-bar");
		trace_array_get(tr);

This would need a corresponding trace_array_put() (?). Additionally, this
would again traverse the list of instances, so I went with incrementing the
ref counter inside the function.

	}
	// Log to tr
	// Enable/disable events to tr
	trace_array_set_clr_event(tr, _THIS_IP, "system", "event", 1);

What's with the __THIS_IP?

My bad! Maybe remnants from a trace_array_printk copy-paste. Will correct
this.

	// Done using tr
	trace_array_put(tr);
	..

Also, use trace_array_set_clr_event to enable/disable events to a trace
array. So now we no longer need to have ftrace_set_clr_event as an exported
API.
Signed-off-by: Divya Indi
---
 include/linux/trace.h        |  2 ++
 include/linux/trace_events.h |  3 ++-
 kernel/trace/trace.c         | 28 
 kernel/trace/trace_events.c  | 23 ++-
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 2c782d5..05164bb 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 void trace_array_put(struct trace_array *tr);
+struct trace_array *trace_array_lookup(const char *name);
+
 #endif	/* CONFIG_TRACING */

 #endif	/* _LINUX_TRACE_H */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..05a7514 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,7 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 #define is_signed_type(type)	(((type)(-1)) < (type)1)

 int trace_set_clr_event(const char *system, const char *event, int set);
-
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+			      const char *event, int set);
 /*
  * The double __builtin_constant_p is because gcc will give us an error
  * if we try to allocate the static variable to fmt if it is not a
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b6a37a..e394d55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8514,6 +8514,34 @@ static int instance_rmdir(const char *name)
 	return ret;
 }

+/**
+ * trace_array_lookup - Lookup the trace array, given its name.
+ * @name: The name of the trace array to be looked up.
+ *
+ * Lookup and return the trace array associated with @name.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
+struct trace_array *trace_array_lookup(const char *name)
+{
+	struct trace_array *tr;
+
+	mutex_lock(&trace_types_lock);
+
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0) {
+			tr->ref++;
+			mutex_unlock(&trace_types_lock);
+			return tr;
+		}
+	}
+	mutex_unlock(&trace_types_lock);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
 	trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2621995..96dd997 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -834,7 +834,6 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)

 	return ret;
 }
-EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

 /**
  * trace_set_clr_event - enable or disable an event
@@ -859,6 +858,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL
Re: [PATCH 4/5] tracing: Handle the trace array ref counter in new functions
Hi Steve,

Thanks again for taking the time to review and providing feedback. Please
find my comments inline.

On 10/15/19 4:04 PM, Steven Rostedt wrote:

Sorry for taking so long to getting to these patches.

On Wed, 14 Aug 2019 10:55:26 -0700 Divya Indi wrote:

For functions returning a trace array, e.g. trace_array_create(), we need to
increment the reference counter associated with the trace array to ensure it
does not get freed when in use. Once we are done using the trace array, we
need to call trace_array_put() to make sure we are not holding a reference
to it anymore and the instance/trace array can be removed when required.

I think it would be more in line with other parts of the kernel if we don't
need to do the trace_array_put() before calling trace_array_destroy().

The reason we went with this approach is:

instance_mkdir     - ref_ctr = 0  // Does not return a trace array ptr.
trace_array_create - ref_ctr = 1  // Since this returns a trace array ptr.
trace_array_lookup - ref_ctr = 1  // Since this returns a trace array ptr.

If we make trace_array_destroy expect ref_ctr to be 1, we risk destroying
the trace array while in use.

We could make it:

instance_mkdir     - ref_ctr = 1
trace_array_create - ref_ctr = 2
trace_array_lookup - ref_ctr = 2+ // depending on the number of lookups

but we'd still need the trace_array_put() (?).

We can also have one function doing create (if it does not exist) or lookup
(if it exists), but that would require some redundant code, since
instance_mkdir needs to return -EEXIST when a trace array already exists.

Let me know your thoughts on this.
Thanks,
Divya

That is, if we make the following change:

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5ff206ce259e..ae12aac21c6c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8527,9 +8527,11 @@ static int __remove_instance(struct trace_array *tr)
 {
 	int i;

-	if (tr->ref || (tr->current_trace && tr->current_trace->ref))
+	if (tr->ref > 1 || (tr->current_trace && tr->current_trace->ref))
 		return -EBUSY;

+	WARN_ON_ONCE(tr->ref != 1);
+
 	list_del(&tr->list);

 	/* Disable all the flags that were enabled coming in */

Hence, additionally exporting trace_array_put().

Example use:
tr = trace_array_create("foo-bar");
// Use this trace array
// Log to this trace array or enable/disable events to this trace array.
// tr no longer required
trace_array_put();

Signed-off-by: Divya Indi
---
 include/linux/trace.h |  1 +
 kernel/trace/trace.c  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 24fcf07..2c782d5 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 		       const char *fmt, ...);
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
+void trace_array_put(struct trace_array *tr);
 #endif	/* CONFIG_TRACING */

 #endif	/* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 22bf166..7b6a37a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr)
 	this_tr->ref--;
 }

+/**
+ * trace_array_put - Decrement reference counter for the given trace array.
+ * @tr: Trace array for which reference counter needs to decrement.
+ *
+ * NOTE: Functions like trace_array_create increment the reference counter for
+ * the trace array to ensure they do not get freed while in use. Make sure to
+ * call trace_array_put() when the use is done. This will ensure that the
+ * instance can be later removed.
+ */
 void trace_array_put(struct trace_array *this_tr)
 {
 	mutex_lock(&trace_types_lock);
 	__trace_array_put(this_tr);
 	mutex_unlock(&trace_types_lock);
 }
+EXPORT_SYMBOL_GPL(trace_array_put);

 int call_filter_check_discard(struct trace_event_call *call, void *rec,
 			      struct ring_buffer *buffer,
@@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr)
 	mutex_unlock(&trace_types_lock);
 }

+/**
+ * trace_array_create - Create a new trace array with the given name.
+ * @name: The name of the trace array to be created.
+ *
+ * Create and return a trace array with given name if it does not exist.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
 struct trace_array *trace_array_create(const char *name)
 {
 	struct trace_array *tr;
@@ -8364,6 +8384,8 @@ struct trace_array *trace_array_cre
Re: [PATCH] tracing: Sample module to demonstrate kernel access to Ftrace instances.
[Apologies for multiple resends - formatting issue]

Hi Steven,

Thanks for taking a look.

This sample module is dependent on -
- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review:
  https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

I mentioned it in the cover page, but will add it to this patch as well to
avoid confusion. So, I think the messages you are seeing are due to the
absence of the patches pending review:
https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

Let me know if you prefer to have the sample module and this patch-set
(pending review) as part of one patch set.

For reference, adding the cover page details here -

--

This patch is for a sample module to demonstrate the use of APIs that were
introduced/exported in order to access Ftrace instances from within the
kernel.

Please Note: This module is dependent on -
- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review:
  https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

The sample module creates/looks up a trace array called sample-instance at
module load time. We then start a kernel thread (simple-thread) to -

1) Enable tracing for event "sample_event" to the buffer associated with the
   trace array - "sample-instance".
2) Start a timer that will disable tracing to this buffer after 5 sec.
   (Tracing disabled after 5 sec, i.e. at count=4.)
3) Write to the buffer using trace_array_printk().
4) Stop the kernel thread and destroy the buffer during module unload.
A sample output for the same -

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:4
#
#                            _-----=> irqs-off
#                           / _----=> need-resched
#                          | / _---=> hardirq/softirq
#                          || / _--=> preempt-depth
#                          ||| /     delay
#        TASK-PID    CPU#    TIMESTAMP   FUNCTION
#           | |        |         |          |
 sample-instance-26797 [003] 955180.489833: simple_thread: trace_array_printk: count=0
 sample-instance-26797 [003] 955180.489836: sample_event: count value=0 at jiffies=5249940864
 sample-instance-26797 [003] 955181.513722: simple_thread: trace_array_printk: count=1
 sample-instance-26797 [003] 955181.513724: sample_event: count value=1 at jiffies=5249941888
 sample-instance-26797 [003] 955182.537629: simple_thread: trace_array_printk: count=2
 sample-instance-26797 [003] 955182.537631: sample_event: count value=2 at jiffies=5249942912
 sample-instance-26797 [003] 955183.561516: simple_thread: trace_array_printk: count=3
 sample-instance-26797 [003] 955183.561518: sample_event: count value=3 at jiffies=5249943936
 sample-instance-26797 [003] 955184.585423: simple_thread: trace_array_printk: count=4
 sample-instance-26797 [003] 955184.585427: sample_event: count value=4 at jiffies=5249944960
 sample-instance-26797 [003] 955185.609344: simple_thread: trace_array_printk: count=5
 sample-instance-26797 [003] 955186.633241: simple_thread: trace_array_printk: count=6
 sample-instance-26797 [003] 955187.657157: simple_thread: trace_array_printk: count=7
 sample-instance-26797 [003] 955188.681039: simple_thread: trace_array_printk: count=8
 sample-instance-26797 [003] 955189.704937: simple_thread: trace_array_printk: count=9
 sample-instance-26797 [003] 955190.728840: simple_thread: trace_array_printk: count=10

Thanks,
Divya

On 10/15/19 10:05 AM, Steven Rostedt wrote:

On Fri, 20 Sep 2019 16:59:26 -0700 Divya Indi wrote:

This is a sample module to demonstrate the use of the newly introduced and
exported APIs to access Ftrace instances from within the kernel.

Newly introduced APIs used here -

1. Create a new trace array if it does not exist.
   struct trace_array *trace_array_create(const char *name)

2. Destroy/Remove a trace array.
   int trace_array_destroy(struct trace_array *tr)

3. Lookup a trace array, given its name.
   struct trace_array *trace_array_lookup(const char *name)

4. Enable/Disable trace events:
   int trace_array_set_clr_event(struct trace_array *tr, const char *system,
                                 const char *event, int set);

Exported APIs -

1. trace_printk equivalent for instances.
   int trace_array_printk(struct trace_array *tr, unsigned long ip,
                          const char *fmt, ...);

2. Helper function.
   void trace_printk_init_buffers(void);

3. To decrement the reference counter.
   void trace_array_put(struct trace_array *tr)

Signed-off-by: Divya Indi
Reviewed-by: Manjunath Patil
Reviewed-by: Joe Jin
---
 samples/Kconfig
[RFC] Sample module for kernel access to Ftrace instances.
[PATCH] tracing: Sample module to demonstrate kernel access to Ftrace

Hi,

This patch is for a sample module to demonstrate the use of APIs that were
introduced/exported in order to access Ftrace instances from within the
kernel.

Please Note: This module is dependent on -
- commit: f45d122 tracing: Kernel access to Ftrace instances
- Patches pending review:
  https://lore.kernel.org/lkml/1565805327-579-1-git-send-email-divya.i...@oracle.com/

The sample module creates/looks up a trace array called sample-instance at
module load time. We then start a kernel thread (simple-thread) to -

1) Enable tracing for event "sample_event" to the buffer associated with the
   trace array - "sample-instance".
2) Start a timer that will disable tracing to this buffer after 5 sec.
   (Tracing disabled after 5 sec, i.e. at count=4.)
3) Write to the buffer using trace_array_printk().
4) Stop the kernel thread and destroy the buffer during module unload.

A sample output for the same -

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:4
#
#                            _-----=> irqs-off
#                           / _----=> need-resched
#                          | / _---=> hardirq/softirq
#                          || / _--=> preempt-depth
#                          ||| /     delay
#        TASK-PID    CPU#    TIMESTAMP   FUNCTION
#           | |        |         |          |
 sample-instance-26797 [003] 955180.489833: simple_thread: trace_array_printk: count=0
 sample-instance-26797 [003] 955180.489836: sample_event: count value=0 at jiffies=5249940864
 sample-instance-26797 [003] 955181.513722: simple_thread: trace_array_printk: count=1
 sample-instance-26797 [003] 955181.513724: sample_event: count value=1 at jiffies=5249941888
 sample-instance-26797 [003] 955182.537629: simple_thread: trace_array_printk: count=2
 sample-instance-26797 [003] 955182.537631: sample_event: count value=2 at jiffies=5249942912
 sample-instance-26797 [003] 955183.561516: simple_thread: trace_array_printk: count=3
 sample-instance-26797 [003] 955183.561518: sample_event: count value=3 at jiffies=5249943936
 sample-instance-26797 [003] 955184.585423: simple_thread: trace_array_printk: count=4
 sample-instance-26797 [003] 955184.585427: sample_event: count value=4 at jiffies=5249944960
 sample-instance-26797 [003] 955185.609344: simple_thread: trace_array_printk: count=5
 sample-instance-26797 [003] 955186.633241: simple_thread: trace_array_printk: count=6
 sample-instance-26797 [003] 955187.657157: simple_thread: trace_array_printk: count=7
 sample-instance-26797 [003] 955188.681039: simple_thread: trace_array_printk: count=8
 sample-instance-26797 [003] 955189.704937: simple_thread: trace_array_printk: count=9
 sample-instance-26797 [003] 955190.728840: simple_thread: trace_array_printk: count=10

Let me know if you have any questions.

Thanks,
Divya
[PATCH] tracing: Sample module to demonstrate kernel access to Ftrace instances.
This is a sample module to demonstrate the use of the newly introduced and
exported APIs to access Ftrace instances from within the kernel.

Newly introduced APIs used here -

1. Create a new trace array if it does not exist.
   struct trace_array *trace_array_create(const char *name)

2. Destroy/Remove a trace array.
   int trace_array_destroy(struct trace_array *tr)

3. Lookup a trace array, given its name.
   struct trace_array *trace_array_lookup(const char *name)

4. Enable/Disable trace events:
   int trace_array_set_clr_event(struct trace_array *tr, const char *system,
                                 const char *event, int set);

Exported APIs -

1. trace_printk equivalent for instances.
   int trace_array_printk(struct trace_array *tr, unsigned long ip,
                          const char *fmt, ...);

2. Helper function.
   void trace_printk_init_buffers(void);

3. To decrement the reference counter.
   void trace_array_put(struct trace_array *tr)

Signed-off-by: Divya Indi
Reviewed-by: Manjunath Patil
Reviewed-by: Joe Jin
---
 samples/Kconfig                              |   7 ++
 samples/Makefile                             |   1 +
 samples/ftrace_instance/Makefile             |   6 ++
 samples/ftrace_instance/sample-trace-array.c | 134 +++
 samples/ftrace_instance/sample-trace-array.h |  84 +
 5 files changed, 232 insertions(+)
 create mode 100644 samples/ftrace_instance/Makefile
 create mode 100644 samples/ftrace_instance/sample-trace-array.c
 create mode 100644 samples/ftrace_instance/sample-trace-array.h

diff --git a/samples/Kconfig b/samples/Kconfig
index d63cc8a..1c7864b 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -20,6 +20,13 @@ config SAMPLE_TRACE_PRINTK
 	 This builds a module that calls trace_printk() and can be used to
 	 test various trace_printk() calls from a module.

+config SAMPLE_TRACE_ARRAY
+	tristate "Build sample module for kernel access to Ftrace instances"
+	depends on EVENT_TRACING && m
+	help
+	 This builds a module that demonstrates the use of various APIs to
+	 access Ftrace instances from within the kernel.
+
 config SAMPLE_KOBJECT
 	tristate "Build kobject examples"
 	help
diff --git a/samples/Makefile b/samples/Makefile
index debf892..02c444e 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_SAMPLE_RPMSG_CLIENT)	+= rpmsg/
 subdir-$(CONFIG_SAMPLE_SECCOMP)		+= seccomp
 obj-$(CONFIG_SAMPLE_TRACE_EVENTS)	+= trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_PRINTK)	+= trace_printk/
+obj-$(CONFIG_SAMPLE_TRACE_ARRAY)	+= ftrace_instance/
 obj-$(CONFIG_VIDEO_PCI_SKELETON)	+= v4l/
 obj-y					+= vfio-mdev/
 subdir-$(CONFIG_SAMPLE_VFS)		+= vfs
diff --git a/samples/ftrace_instance/Makefile b/samples/ftrace_instance/Makefile
new file mode 100644
index 000..3603b13
--- /dev/null
+++ b/samples/ftrace_instance/Makefile
@@ -0,0 +1,6 @@
+# Builds a module that calls various routines to access Ftrace instances.
+# To use (as root): insmod sample-trace-array.ko
+
+CFLAGS_sample-trace-array.o := -I$(src)
+
+obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o
diff --git a/samples/ftrace_instance/sample-trace-array.c b/samples/ftrace_instance/sample-trace-array.c
new file mode 100644
index 000..0595bc7
--- /dev/null
+++ b/samples/ftrace_instance/sample-trace-array.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Any file that uses trace points, must include the header.
+ * But only one file, must include the header by defining
+ * CREATE_TRACE_POINTS first. This will make the C code that
+ * creates the handles for the trace points.
+ */
+#define CREATE_TRACE_POINTS
+#include "sample-trace-array.h"
+
+struct trace_array *tr;
+static void mytimer_handler(struct timer_list *unused);
+static struct task_struct *simple_tsk;
+
+/*
+ * mytimer: Timer setup to disable tracing for event "sample_event". This
+ * timer is only for the purposes of the sample module to demonstrate access of
+ * Ftrace instances from within kernel.
+ */
+static DEFINE_TIMER(mytimer, mytimer_handler);
+
+static void mytimer_handler(struct timer_list *unused)
+{
+	/*
+	 * Disable tracing for event "sample_event".
+	 */
+	trace_array_set_clr_event(tr, "sample-subsystem", "sample_event", 0);
+}
+
+static void simple_thread_func(int count)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_timeout(HZ);
+
+	/*
+	 * Printing count value using trace_array_printk() - trace_printk()
+	 * equivalent for the instance buffers.
+	 */
+	trace_array_printk(tr, _THIS_IP_, "trace_array_printk: count=%d\n",
+			   count);
+	/*
+	 * Tracepoint for event &
[PATCH 4/5] tracing: Handle the trace array ref counter in new functions
For functions returning a trace array, e.g. trace_array_create(), we need to
increment the reference counter associated with the trace array to ensure it
does not get freed when in use. Once we are done using the trace array, we
need to call trace_array_put() to make sure we are not holding a reference
to it anymore and the instance/trace array can be removed when required.

Hence, additionally exporting trace_array_put().

Example use:
tr = trace_array_create("foo-bar");
// Use this trace array
// Log to this trace array or enable/disable events to this trace array.
// tr no longer required
trace_array_put(tr);

Signed-off-by: Divya Indi
---
 include/linux/trace.h |  1 +
 kernel/trace/trace.c  | 41 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 24fcf07..2c782d5 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 		       const char *fmt, ...);
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
+void trace_array_put(struct trace_array *tr);
 #endif	/* CONFIG_TRACING */

 #endif	/* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 22bf166..7b6a37a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr)
 	this_tr->ref--;
 }

+/**
+ * trace_array_put - Decrement reference counter for the given trace array.
+ * @tr: Trace array for which reference counter needs to decrement.
+ *
+ * NOTE: Functions like trace_array_create increment the reference counter for
+ * the trace array to ensure they do not get freed while in use. Make sure to
+ * call trace_array_put() when the use is done. This will ensure that the
+ * instance can be later removed.
+ */
 void trace_array_put(struct trace_array *this_tr)
 {
 	mutex_lock(&trace_types_lock);
 	__trace_array_put(this_tr);
 	mutex_unlock(&trace_types_lock);
 }
+EXPORT_SYMBOL_GPL(trace_array_put);

 int call_filter_check_discard(struct trace_event_call *call, void *rec,
 			      struct ring_buffer *buffer,
@@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr)
 	mutex_unlock(&trace_types_lock);
 }

+/**
+ * trace_array_create - Create a new trace array with the given name.
+ * @name: The name of the trace array to be created.
+ *
+ * Create and return a trace array with given name if it does not exist.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
 struct trace_array *trace_array_create(const char *name)
 {
 	struct trace_array *tr;
@@ -8364,6 +8384,8 @@ struct trace_array *trace_array_create(const char *name)

 	list_add(&tr->list, &ftrace_trace_arrays);

+	tr->ref++;
+
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);
@@ -8385,7 +8407,19 @@ struct trace_array *trace_array_create(const char *name)

 static int instance_mkdir(const char *name)
 {
-	return PTR_ERR_OR_ZERO(trace_array_create(name));
+	struct trace_array *tr;
+
+	tr = trace_array_create(name);
+	if (IS_ERR(tr))
+		return PTR_ERR(tr);
+
+	/*
+	 * This function does not return a reference to the created trace
+	 * array, so the reference counter is to be 0 when it returns.
+	 * trace_array_create() increments the ref counter; decrement it here
+	 * by calling trace_array_put().
+	 */
+	trace_array_put(tr);
+
+	return 0;
 }

 static int __remove_instance(struct trace_array *tr)
@@ -8424,6 +8458,11 @@ static int __remove_instance(struct trace_array *tr)
 	return 0;
 }

+/*
+ * NOTE: An instance cannot be removed if there is still a reference to it.
+ * Make sure to call "trace_array_put" for a trace array returned by functions
+ * like trace_array_create(), otherwise trace_array_destroy() will not
+ * succeed.
+ */
 int trace_array_destroy(struct trace_array *this_tr)
 {
 	struct trace_array *tr;
--
1.8.3.1
[PATCH 1/5] tracing: Declare newly exported APIs in include/linux/trace.h
Declare the newly introduced and exported APIs in the header file -
include/linux/trace.h. Moving previous declarations from kernel/trace/trace.h
to include/linux/trace.h.

Signed-off-by: Divya Indi
---
 include/linux/trace.h | 7 +++
 kernel/trace/trace.h  | 4 +---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index b95ffb2..24fcf07 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -24,6 +24,13 @@ struct trace_export {
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);

+struct trace_array;
+
+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+		       const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
 #endif	/* CONFIG_TRACING */

 #endif	/* _LINUX_TRACE_H */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 005f086..66ff63e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -852,8 +853,6 @@ extern int trace_selftest_startup_branch(struct tracer *trace,
 extern int trace_array_vprintk(struct trace_array *tr,
 			       unsigned long ip, const char *fmt, va_list args);
-int trace_array_printk(struct trace_array *tr,
-		       unsigned long ip, const char *fmt, ...);
 int trace_array_printk_buf(struct ring_buffer *buffer,
 			   unsigned long ip, const char *fmt, ...);
 void trace_printk_seq(struct trace_seq *s);
@@ -1869,7 +1868,6 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 extern const char *__stop___tracepoint_str[];

 void trace_printk_control(bool enabled);
-void trace_printk_init_buffers(void);
 void trace_printk_start_comm(void);
 int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
--
1.8.3.1
[PATCH 0/5 v4] Kernel Access to Ftrace instances.
In addition to patches introduced by commit f45d1225adb0 ("tracing: Kernel
access to Ftrace instances"), we also need the following patches to reliably
access ftrace instances from other kernel modules or components.

This version addresses the review comments/suggestions received for v3.

Please review the patches that follow.

Divya Indi (5):
  tracing: Declare newly exported APIs in include/linux/trace.h
  tracing: Verify if trace array exists before destroying it.
  tracing: Adding NULL checks
  tracing: Handle the trace array ref counter in new functions
  tracing: New functions for kernel access to Ftrace instances

 include/linux/trace.h        | 10 +
 include/linux/trace_events.h |  3 +-
 kernel/trace/trace.c         | 88 ++-
 kernel/trace/trace.h         |  4 +-
 kernel/trace/trace_events.c  | 25 -
 5 files changed, 121 insertions(+), 9 deletions(-)
--
1.8.3.1
[PATCH 5/5] tracing: New functions for kernel access to Ftrace instances
Adding 2 new functions -

1) trace_array_lookup: Look up and return a trace array, given its name.
2) trace_array_set_clr_event: Enable/disable event recording to the given
   trace array.

NOTE: trace_array_lookup returns a trace array and also increments the
reference counter associated with the returned trace array. Make sure to
call trace_array_put() once the use is done so that the instance can be
removed at a later time.

Example use:
tr = trace_array_lookup("foo-bar");
if (!tr)
	tr = trace_array_create("foo-bar");
// Log to tr
// Enable/disable events to tr
trace_array_set_clr_event(tr, _THIS_IP, "system", "event", 1);
// Done using tr
trace_array_put(tr);
..

Also, use trace_array_set_clr_event to enable/disable events to a trace
array. So now we no longer need to have ftrace_set_clr_event as an exported
API.

Signed-off-by: Divya Indi
---
 include/linux/trace.h        |  2 ++
 include/linux/trace_events.h |  3 ++-
 kernel/trace/trace.c         | 28 
 kernel/trace/trace_events.c  | 23 ++-
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index 2c782d5..05164bb 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 void trace_array_put(struct trace_array *tr);
+struct trace_array *trace_array_lookup(const char *name);
+
 #endif	/* CONFIG_TRACING */

 #endif	/* _LINUX_TRACE_H */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..05a7514 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -540,7 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 #define is_signed_type(type)	(((type)(-1)) < (type)1)

 int trace_set_clr_event(const char *system, const char *event, int set);
-
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+			      const char *event, int set);
 /*
  * The double __builtin_constant_p is because gcc will give us an error
  * if we try to allocate the static variable to fmt if it is not a
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b6a37a..e394d55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8514,6 +8514,34 @@ static int instance_rmdir(const char *name)
 	return ret;
 }

+/**
+ * trace_array_lookup - Lookup the trace array, given its name.
+ * @name: The name of the trace array to be looked up.
+ *
+ * Lookup and return the trace array associated with @name.
+ *
+ * NOTE: The reference counter associated with the returned trace array is
+ * incremented to ensure it is not freed when in use. Make sure to call
+ * "trace_array_put" for this trace array when its use is done.
+ */
+struct trace_array *trace_array_lookup(const char *name)
+{
+	struct trace_array *tr;
+
+	mutex_lock(&trace_types_lock);
+
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0) {
+			tr->ref++;
+			mutex_unlock(&trace_types_lock);
+			return tr;
+		}
+	}
+	mutex_unlock(&trace_types_lock);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
 	trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 2621995..96dd997 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -834,7 +834,6 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)

 	return ret;
 }
-EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

 /**
  * trace_set_clr_event - enable or disable an event
@@ -859,6 +858,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);

+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+			      const char *event, int set)
+{
+	if (!tr)
+		return -ENOENT;
+
+	return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE		127
--
1.8.3.1
[PATCH 2/5] tracing: Verify if trace array exists before destroying it.
A trace array can be destroyed from userspace or kernel. Verify that the
trace array exists before proceeding to destroy/remove it.

Signed-off-by: Divya Indi
---
 kernel/trace/trace.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..468ecc5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8421,17 +8421,27 @@ static int __remove_instance(struct trace_array *tr)
 	return 0;
 }

-int trace_array_destroy(struct trace_array *tr)
+int trace_array_destroy(struct trace_array *this_tr)
 {
+	struct trace_array *tr;
 	int ret;

-	if (!tr)
+	if (!this_tr) {
 		return -EINVAL;
+	}

 	mutex_lock(&event_mutex);
 	mutex_lock(&trace_types_lock);

-	ret = __remove_instance(tr);
+	ret = -ENODEV;
+
+	/* Making sure trace array exists before destroying it. */
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr == this_tr) {
+			ret = __remove_instance(tr);
+			break;
+		}
+	}

 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);
-- 
1.8.3.1
[PATCH 3/5] tracing: Adding NULL checks
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace
instances") we exported certain functions. Here, we are adding some
additional NULL checks to ensure safe usage by users of these APIs.

Signed-off-by: Divya Indi
---
 kernel/trace/trace.c        | 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 468ecc5..22bf166 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
 	if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
 		return 0;

+	if (!tr)
+		return -ENOENT;
+
 	va_start(ap, fmt);
 	ret = trace_array_vprintk(tr, ip, fmt, ap);
 	va_end(ap);

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..2621995 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 	char *event = NULL, *sub = NULL, *match;
 	int ret;

+	if (!tr)
+		return -ENOENT;
 	/*
	 * The buf format can be <subsystem>:<event-name>
	 *  *:<event-name> means any event by that name.
-- 
1.8.3.1
Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event
Hi Steven, On 8/2/19 2:25 PM, Steven Rostedt wrote: On Fri, 2 Aug 2019 14:12:54 -0700 Divya Indi wrote: Hi Steve, The first patch would be like a temporary fix in case we need more changes to the patches that add the new function - trace_array_set_clr() and unexport ftrace_set_clr_event() and might take some time. In which case I think it would be good to have this in place (But, not part of this series). If they all are to go in together as part of the same release ie if all is good with the concerned patches (Patch 6 & Patch 7), then I think having this patch would be meaningless. Can you just do this part of patch 6 instead? Yes, will merge the two ie - 1) Add 2 new functions - trace_array_set_clr_event(), trace_array_lookup() 2) Unexport ftrace_set_clr_event. into a single patch. Thanks, Divya +/** + * trace_array_set_clr_event - enable or disable an event for a trace array + * @system: system name to match (NULL for any system) + * @event: event name to match (NULL for all events, within system) + * @set: 1 to enable, 0 to disable + * + * This is a way for other parts of the kernel to enable or disable + * event recording to instances. + * + * Returns 0 on success, -EINVAL if the parameters do not match any + * registered events. + */ +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set) +{ + if (!tr) + return -ENOENT; + + return __ftrace_set_clr_event(tr, NULL, system, event, set); +} +EXPORT_SYMBOL_GPL(trace_array_set_clr_event); + -- Steve
Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event
Hi Steve, The first patch would be like a temporary fix in case we need more changes to the patches that add the new function - trace_array_set_clr() and unexport ftrace_set_clr_event() and might take some time. In which case I think it would be good to have this in place (But, not part of this series). If they all are to go in together as part of the same release ie if all is good with the concerned patches (Patch 6 & Patch 7), then I think having this patch would be meaningless. Thanks, Divya On 8/2/19 1:46 PM, Steven Rostedt wrote: On Fri, 2 Aug 2019 13:41:20 -0700 Divya Indi wrote: As a stand alone patch, the first one may be fine. But as part of a series, it doesn't make sense to add it. I see. Will separate this out from the series. Is that really needed? Do you need to have that patch in the kernel? Do you plan on marking it for stable? -- Steve
Re: [PATCH 7/7] tracing: Un-export ftrace_set_clr_event
Hi Steven, On 7/29/19 5:51 PM, Steven Rostedt wrote: On Mon, 29 Jul 2019 17:02:34 -0700 Divya Indi wrote: Use "trace_array_set_clr_event" to enable/disable events to a trace array from other kernel modules/components. Hence, we no longer need to have ftrace_set_clr_event as an exported API. Hmm, this simply reverts patch 1. Why have patch 1 and this patch in the first place? Right! First patch fixes issues in a previous commit and then this patch reverts the part of previous commit that required the fix. We can eliminate the first patch if it seems counter intuitive. Thanks, Divya -- Steve Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna
[PATCH 3/7] tracing: Verify if trace array exists before destroying it.
A trace array can be destroyed from userspace or kernel. Verify if the
trace array exists before proceeding to destroy/remove it.

Signed-off-by: Divya Indi
Reviewed-By: Aruna Ramakrishna
---
 kernel/trace/trace.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..468ecc5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8421,17 +8421,27 @@ static int __remove_instance(struct trace_array *tr)
 	return 0;
 }

-int trace_array_destroy(struct trace_array *tr)
+int trace_array_destroy(struct trace_array *this_tr)
 {
+	struct trace_array *tr;
 	int ret;

-	if (!tr)
+	if (!this_tr) {
 		return -EINVAL;
+	}

 	mutex_lock(&event_mutex);
 	mutex_lock(&trace_types_lock);

-	ret = __remove_instance(tr);
+	ret = -ENODEV;
+
+	/* Making sure trace array exists before destroying it. */
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr == this_tr) {
+			ret = __remove_instance(tr);
+			break;
+		}
+	}

 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);
-- 
1.8.3.1
[PATCH 1/7] tracing: Required changes to use ftrace_set_clr_event.
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances") ftrace_set_clr_event was exported, but for other kernel modules to use the function, we require the following additional changes 1. Removing the static keyword for this newly exported function. 2. Declaring it in the header file - include/linux/trace_events.h Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- include/linux/trace_events.h | 1 + kernel/trace/trace_events.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 8a62731..0f874fb 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -540,6 +540,7 @@ extern int trace_define_field(struct trace_event_call *call, const char *type, #define is_signed_type(type) (((type)(-1)) < (type)1) int trace_set_clr_event(const char *system, const char *event, int set); +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); /* * The double __builtin_constant_p is because gcc will give us an error diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 0ce3db6..b6b4618 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match, return ret; } -static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) +int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) { char *event = NULL, *sub = NULL, *match; int ret; -- 1.8.3.1
[PATCH 7/7] tracing: Un-export ftrace_set_clr_event
Use "trace_array_set_clr_event" to enable/disable events to a trace array from other kernel modules/components. Hence, we no longer need to have ftrace_set_clr_event as an exported API. Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- include/linux/trace_events.h | 1 - kernel/trace/trace_events.c | 3 +-- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 6da3600..025ae2b 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -542,7 +542,6 @@ extern int trace_define_field(struct trace_event_call *call, const char *type, int trace_set_clr_event(const char *system, const char *event, int set); int trace_array_set_clr_event(struct trace_array *tr, const char *system, const char *event, int set); -int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); /* * The double __builtin_constant_p is because gcc will give us an error diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 9ee6b52..96dd997 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match, return ret; } -int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) +static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) { char *event = NULL, *sub = NULL, *match; int ret; @@ -834,7 +834,6 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) return ret; } -EXPORT_SYMBOL_GPL(ftrace_set_clr_event); /** * trace_set_clr_event - enable or disable an event -- 1.8.3.1
[PATCH 2/7] tracing: Declare newly exported APIs in include/linux/trace.h
Declare the newly introduced and exported APIs in the header file - include/linux/trace.h. Moving previous declarations from kernel/trace/trace.h to include/linux/trace.h. Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- include/linux/trace.h | 7 +++ kernel/trace/trace.h | 4 +--- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/include/linux/trace.h b/include/linux/trace.h index b95ffb2..24fcf07 100644 --- a/include/linux/trace.h +++ b/include/linux/trace.h @@ -24,6 +24,13 @@ struct trace_export { int register_ftrace_export(struct trace_export *export); int unregister_ftrace_export(struct trace_export *export); +struct trace_array; + +void trace_printk_init_buffers(void); +int trace_array_printk(struct trace_array *tr, unsigned long ip, + const char *fmt, ...); +struct trace_array *trace_array_create(const char *name); +int trace_array_destroy(struct trace_array *tr); #endif /* CONFIG_TRACING */ #endif /* _LINUX_TRACE_H */ diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 005f086..66ff63e 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -852,8 +853,6 @@ extern int trace_selftest_startup_branch(struct tracer *trace, extern int trace_array_vprintk(struct trace_array *tr, unsigned long ip, const char *fmt, va_list args); -int trace_array_printk(struct trace_array *tr, - unsigned long ip, const char *fmt, ...); int trace_array_printk_buf(struct ring_buffer *buffer, unsigned long ip, const char *fmt, ...); void trace_printk_seq(struct trace_seq *s); @@ -1869,7 +1868,6 @@ extern int trace_event_enable_disable(struct trace_event_file *file, extern const char *__stop___tracepoint_str[]; void trace_printk_control(bool enabled); -void trace_printk_init_buffers(void); void trace_printk_start_comm(void); int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set); int set_tracer_flag(struct trace_array *tr, unsigned int mask, int 
enabled); -- 1.8.3.1
[PATCH 5/7] tracing: Handle the trace array ref counter in new functions
For functions returning a trace array Eg: trace_array_create(), we need to increment the reference counter associated with the trace array to ensure it does not get freed when in use. Once we are done using the trace array, we need to call trace_array_put() to make sure we are not holding a reference to it anymore and the instance/trace array can be removed when required. Hence, additionally exporting trace_array_put(). Example use: tr = trace_array_create("foo-bar"); // Use this trace array // Log to this trace array or enable/disable events to this trace array. // tr no longer required trace_array_put(); Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- include/linux/trace.h | 1 + kernel/trace/trace.c | 43 ++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/include/linux/trace.h b/include/linux/trace.h index 24fcf07..2c782d5 100644 --- a/include/linux/trace.h +++ b/include/linux/trace.h @@ -31,6 +31,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip, const char *fmt, ...); struct trace_array *trace_array_create(const char *name); int trace_array_destroy(struct trace_array *tr); +void trace_array_put(struct trace_array *tr); #endif /* CONFIG_TRACING */ #endif /* _LINUX_TRACE_H */ diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 22bf166..130579b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -297,12 +297,22 @@ static void __trace_array_put(struct trace_array *this_tr) this_tr->ref--; } +/** + * trace_array_put - Decrement reference counter for the given trace array. + * @tr: Trace array for which reference counter needs to decrement. + * + * NOTE: Functions like trace_array_create increment the reference counter for + * the trace array to ensure they do not get freed while in use. Make sure to + * call trace_array_put() when the use is done. This will ensure that the + * instance can be later removed. 
+ */ void trace_array_put(struct trace_array *this_tr) { mutex_lock(_types_lock); __trace_array_put(this_tr); mutex_unlock(_types_lock); } +EXPORT_SYMBOL_GPL(trace_array_put); int call_filter_check_discard(struct trace_event_call *call, void *rec, struct ring_buffer *buffer, @@ -8302,6 +8312,16 @@ static void update_tracer_options(struct trace_array *tr) mutex_unlock(_types_lock); } +/** + * trace_array_create - Create a new trace array with the given name. + * @name: The name of the trace array to be created. + * + * Create and return a trace array with given name if it does not exist. + * + * NOTE: The reference counter associated with the returned trace array is + * incremented to ensure it is not freed when in use. Make sure to call + * "trace_array_put" for this trace array when its use is done. + */ struct trace_array *trace_array_create(const char *name) { struct trace_array *tr; @@ -8364,6 +8384,8 @@ struct trace_array *trace_array_create(const char *name) list_add(>list, _trace_arrays); + tr->ref++; + mutex_unlock(_types_lock); mutex_unlock(_mutex); @@ -8385,7 +8407,21 @@ struct trace_array *trace_array_create(const char *name) static int instance_mkdir(const char *name) { - return PTR_ERR_OR_ZERO(trace_array_create(name)); + struct trace_array *tr; + + tr = trace_array_create(name); + if (IS_ERR(tr)) + return PTR_ERR(tr); + + /* +* This function does not return a reference to the created trace array, +* so the reference counter is to be 0 when it returns. +* trace_array_create increments the ref counter, decrement it here +* by calling trace_array_put() +*/ + trace_array_put(tr); + + return 0; } static int __remove_instance(struct trace_array *tr) @@ -8424,6 +8460,11 @@ static int __remove_instance(struct trace_array *tr) return 0; } +/* + * NOTE: An instance cannot be removed if there is still a reference to it. 
+ * Make sure to call "trace_array_put" for a trace array returned by functions + * like trace_array_create(), otherwise trace_array_destroy will not succeed. + */ int trace_array_destroy(struct trace_array *this_tr) { struct trace_array *tr; -- 1.8.3.1
[PATCH 4/7] tracing: Adding NULL checks
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances") we exported certain functions. Here, we are adding some additional NULL checks to ensure safe usage by users of these APIs. Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- kernel/trace/trace.c| 3 +++ kernel/trace/trace_events.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 468ecc5..22bf166 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr, if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) return 0; + if (!tr) + return -ENOENT; + va_start(ap, fmt); ret = trace_array_vprintk(tr, ip, fmt, ap); va_end(ap); diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index b6b4618..99eb5f8 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) char *event = NULL, *sub = NULL, *match; int ret; + if (!tr) + return -ENOENT; /* * The buf format can be : * *: means any event by that name. -- 1.8.3.1
[PATCH 6/7] tracing: New functions for kernel access to Ftrace instances
Adding 2 new functions - 1) trace_array_lookup : Look up and return a trace array, given its name. 2) trace_array_set_clr_event : Enable/disable event recording to the given trace array. NOTE: trace_array_lookup returns a trace array and also increments the reference counter associated with the returned trace array. Make sure to call trace_array_put() once the use is done so that the instance can be removed at a later time. Example use: tr = trace_array_lookup("foo-bar"); if (!tr) tr = trace_array_create("foo-bar"); // Log to tr // Enable/disable events to tr trace_array_set_clr_event(tr, _THIS_IP,"system","event",1); // Done using tr trace_array_put(tr); .. Signed-off-by: Divya Indi Reviewed-By: Aruna Ramakrishna --- include/linux/trace.h| 2 ++ include/linux/trace_events.h | 2 ++ kernel/trace/trace.c | 28 kernel/trace/trace_events.c | 22 ++ 4 files changed, 54 insertions(+) diff --git a/include/linux/trace.h b/include/linux/trace.h index 2c782d5..05164bb 100644 --- a/include/linux/trace.h +++ b/include/linux/trace.h @@ -32,6 +32,8 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip, struct trace_array *trace_array_create(const char *name); int trace_array_destroy(struct trace_array *tr); void trace_array_put(struct trace_array *tr); +struct trace_array *trace_array_lookup(const char *name); + #endif /* CONFIG_TRACING */ #endif /* _LINUX_TRACE_H */ diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 0f874fb..6da3600 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -540,6 +540,8 @@ extern int trace_define_field(struct trace_event_call *call, const char *type, #define is_signed_type(type) (((type)(-1)) < (type)1) int trace_set_clr_event(const char *system, const char *event, int set); +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set); int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); /* diff --git a/kernel/trace/trace.c 
b/kernel/trace/trace.c index 130579b..fcf42bb 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8516,6 +8516,34 @@ static int instance_rmdir(const char *name) return ret; } +/** + * trace_array_lookup - Lookup the trace array, given its name. + * @name: The name of the trace array to be looked up. + * + * Lookup and return the trace array associated with @name. + * + * NOTE: The reference counter associated with the returned trace array is + * incremented to ensure it is not freed when in use. Make sure to call + * "trace_array_put" for this trace array when its use is done. + */ +struct trace_array *trace_array_lookup(const char *name) +{ + struct trace_array *tr; + + mutex_lock(_types_lock); + + list_for_each_entry(tr, _trace_arrays, list) { + if (tr->name && strcmp(tr->name, name) == 0) { + tr->ref++; + mutex_unlock(_types_lock); + return tr; + } + } + mutex_unlock(_types_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(trace_array_lookup); + static __init void create_trace_instances(struct dentry *d_tracer) { trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer, diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 99eb5f8..9ee6b52 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set) } EXPORT_SYMBOL_GPL(trace_set_clr_event); +/** + * trace_array_set_clr_event - enable or disable an event for a trace array + * @system: system name to match (NULL for any system) + * @event: event name to match (NULL for all events, within system) + * @set: 1 to enable, 0 to disable + * + * This is a way for other parts of the kernel to enable or disable + * event recording to instances. + * + * Returns 0 on success, -EINVAL if the parameters do not match any + * registered events. 
+ */ +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set) +{ + if (!tr) + return -ENOENT; + + return __ftrace_set_clr_event(tr, NULL, system, event, set); +} +EXPORT_SYMBOL_GPL(trace_array_set_clr_event); + /* 128 should be much more than enough */ #define EVENT_BUF_SIZE 127 -- 1.8.3.1
[PATCH v3 0/7] Kernel access to Ftrace instances.
In addition to patches introduced by commit f45d1225adb0 ("tracing:
Kernel access to Ftrace instances") we also need the following patches
to reliably access ftrace instances from other kernel modules or
components.

Please review the patches that follow.

Divya Indi (7):
  tracing: Required changes to use ftrace_set_clr_event.
  tracing: Declare newly exported APIs in include/linux/trace.h
  tracing: Verify if trace array exists before destroying it.
  tracing: Adding NULL checks
  tracing: Handle the trace array ref counter in new functions
  tracing: New functions for kernel access to Ftrace instances
  tracing: Un-export ftrace_set_clr_event

 include/linux/trace.h        | 10 ++++++
 include/linux/trace_events.h |  2 +
 kernel/trace/trace.c         | 90 ++++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.h         |  4 +-
 kernel/trace/trace_events.c  | 25 +++++++++++-
 5 files changed, 123 insertions(+), 8 deletions(-)

-- 
1.8.3.1
Re: [PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.
Hi Steven, Thanks for reviewing the patch. I have made a note of all the suggested changes, comment requirements. Will send them out soon as v2. (comments inline) On 6/17/19 4:45 PM, Steven Rostedt wrote: On Wed, 12 Jun 2019 09:34:19 -0700 Divya Indi wrote: Adding 2 new functions - 1) trace_array_lookup : Look up and return a trace array, given its name. 2) trace_array_set_clr_event : Enable/disable event recording to the given trace array. Newly added functions trace_array_lookup and trace_array_create also need to increment the reference counter associated with the trace array they return. This is to ensure the trace array does not get freed while in use by the newly introduced APIs. The reference ctr is decremented in the trace_array_destroy. Signed-off-by: Divya Indi --- include/linux/trace_events.h | 3 +++ kernel/trace/trace.c | 30 +- kernel/trace/trace_events.c | 22 ++ 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index d7b7d85..0cc99a8 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip, struct trace_array *trace_array_create(const char *name); int trace_array_destroy(struct trace_array *tr); int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); +struct trace_array *trace_array_lookup(const char *name); int trace_set_clr_event(const char *system, const char *event, int set); +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set); /* * The double __builtin_constant_p is because gcc will give us an error diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index a60dc13..fb70ccc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8364,6 +8364,8 @@ struct trace_array *trace_array_create(const char *name) list_add(>list, _trace_arrays); + tr->ref++; + Needs a comment for why the ref is incremented. 
mutex_unlock(_types_lock); mutex_unlock(_mutex); @@ -8385,7 +8387,14 @@ struct trace_array *trace_array_create(const char *name) static int instance_mkdir(const char *name) { - return PTR_ERR_OR_ZERO(trace_array_create(name)); + struct trace_array *tr; + + tr = trace_array_create(name); + if (IS_ERR(tr)) + return PTR_ERR(tr); + trace_array_put(tr); Need a comment to why we need to do a put here. + + return 0; } static int __remove_instance(struct trace_array *tr) @@ -8434,6 +8443,7 @@ int trace_array_destroy(struct trace_array *tr) mutex_lock(_mutex); mutex_lock(_types_lock); + tr->ref--; ret = __remove_instance(tr); mutex_unlock(_types_lock); @@ -8465,6 +8475,24 @@ static int instance_rmdir(const char *name) return ret; } Need a kerneldoc heading here, that also states that if a trace_array is returned, it requires a call to trace_array_put(). +struct trace_array *trace_array_lookup(const char *name) +{ + struct trace_array *tr; + + mutex_lock(_types_lock); + + list_for_each_entry(tr, _trace_arrays, list) { + if (tr->name && strcmp(tr->name, name) == 0) { + tr->ref++; + mutex_unlock(_types_lock); + return tr; + } + } + mutex_unlock(_types_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(trace_array_lookup); + static __init void create_trace_instances(struct dentry *d_tracer) { trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer, diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 445b059..c126d2c 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set) } EXPORT_SYMBOL_GPL(trace_set_clr_event); +/** + * trace_array_set_clr_event - enable or disable an event for a trace array + * @system: system name to match (NULL for any system) + * @event: event name to match (NULL for all events, within system) + * @set: 1 to enable, 0 to disable + * + * This is a way for other parts of the kernel to enable or disable + * event 
recording to instances. + * + * Returns 0 on success, -EINVAL if the parameters do not match any + * registered events. + */ +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set) +{ + if (!tr) + return -ENODEV; If we have this, should we get rid of the external use of ftrace_set_clr_event()? I think so too. External use of trace_array_set_clr_event is more convenient and intuitive. Will make the change. Thanks, Divya -- Steve + + return __f
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.
Adding 2 new functions - 1) trace_array_lookup : Look up and return a trace array, given its name. 2) trace_array_set_clr_event : Enable/disable event recording to the given trace array. Newly added functions trace_array_lookup and trace_array_create also need to increment the reference counter associated with the trace array they return. This is to ensure the trace array does not get freed while in use by the newly introduced APIs. The reference ctr is decremented in the trace_array_destroy. Signed-off-by: Divya Indi --- include/linux/trace_events.h | 3 +++ kernel/trace/trace.c | 30 +- kernel/trace/trace_events.c | 22 ++ 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index d7b7d85..0cc99a8 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip, struct trace_array *trace_array_create(const char *name); int trace_array_destroy(struct trace_array *tr); int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); +struct trace_array *trace_array_lookup(const char *name); int trace_set_clr_event(const char *system, const char *event, int set); +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set); /* * The double __builtin_constant_p is because gcc will give us an error diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index a60dc13..fb70ccc 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8364,6 +8364,8 @@ struct trace_array *trace_array_create(const char *name) list_add(>list, _trace_arrays); + tr->ref++; + mutex_unlock(_types_lock); mutex_unlock(_mutex); @@ -8385,7 +8387,14 @@ struct trace_array *trace_array_create(const char *name) static int instance_mkdir(const char *name) { - return PTR_ERR_OR_ZERO(trace_array_create(name)); + struct trace_array *tr; + + tr = trace_array_create(name); + if (IS_ERR(tr)) 
+ return PTR_ERR(tr); + trace_array_put(tr); + + return 0; } static int __remove_instance(struct trace_array *tr) @@ -8434,6 +8443,7 @@ int trace_array_destroy(struct trace_array *tr) mutex_lock(_mutex); mutex_lock(_types_lock); + tr->ref--; ret = __remove_instance(tr); mutex_unlock(_types_lock); @@ -8465,6 +8475,24 @@ static int instance_rmdir(const char *name) return ret; } +struct trace_array *trace_array_lookup(const char *name) +{ + struct trace_array *tr; + + mutex_lock(_types_lock); + + list_for_each_entry(tr, _trace_arrays, list) { + if (tr->name && strcmp(tr->name, name) == 0) { + tr->ref++; + mutex_unlock(_types_lock); + return tr; + } + } + mutex_unlock(_types_lock); + return NULL; +} +EXPORT_SYMBOL_GPL(trace_array_lookup); + static __init void create_trace_instances(struct dentry *d_tracer) { trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer, diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 445b059..c126d2c 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set) } EXPORT_SYMBOL_GPL(trace_set_clr_event); +/** + * trace_array_set_clr_event - enable or disable an event for a trace array + * @system: system name to match (NULL for any system) + * @event: event name to match (NULL for all events, within system) + * @set: 1 to enable, 0 to disable + * + * This is a way for other parts of the kernel to enable or disable + * event recording to instances. + * + * Returns 0 on success, -EINVAL if the parameters do not match any + * registered events. + */ +int trace_array_set_clr_event(struct trace_array *tr, const char *system, + const char *event, int set) +{ + if (!tr) + return -ENODEV; + + return __ftrace_set_clr_event(tr, NULL, system, event, set); +} +EXPORT_SYMBOL_GPL(trace_array_set_clr_event); + /* 128 should be much more than enough */ #define EVENT_BUF_SIZE 127 -- 1.8.3.1
[PATCH 2/3] tracing: Adding additional NULL checks.
commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances") exported certain functions providing access to Ftrace instances from other kernel components. Adding some additional NULL checks to ensure safe usage by the users. Signed-off-by: Divya Indi --- kernel/trace/trace.c| 3 +++ kernel/trace/trace_events.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 1c80521..a60dc13 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr, if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) return 0; + if (!tr) + return -EINVAL; + va_start(ap, fmt); ret = trace_array_vprintk(tr, ip, fmt, ap); va_end(ap); diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index b6b4618..445b059 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) char *event = NULL, *sub = NULL, *match; int ret; + if (!tr) + return -ENODEV; /* * The buf format can be : * *: means any event by that name. -- 1.8.3.1
[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
For commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances"), the
following additional changes are needed to ensure other kernel components
can use these functions -

1) Remove the static keyword for the newly exported function -
   ftrace_set_clr_event.
2) Add the required function declarations to the header file
   include/linux/trace_events.h.

Signed-off-by: Divya Indi
---
 include/linux/trace_events.h | 6 ++++++
 kernel/trace/trace_events.c  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..d7b7d85 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -539,6 +539,12 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,

 #define is_signed_type(type)	(((type)(-1)) < (type)1)

+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+		const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 int trace_set_clr_event(const char *system, const char *event, int set);

 /*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..b6b4618 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
 	return ret;
 }

-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
 	char *event = NULL, *sub = NULL, *match;
 	int ret;
--
1.8.3.1
[RFC][v2] Kernel Access to Ftrace Instances.
Hi,

Please review the patches that follow -

[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
[PATCH 2/3] tracing: Adding additional NULL checks.
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

This is v2 of the series, with changes made to Patch 3/3 to address review
comments. The new changes ensure that a trace array created by
trace_array_create or accessed via trace_array_lookup cannot be freed while
still in use.

Let me know if you have any questions/concerns/suggestions.

Thanks,
Divya
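To make the intended consumer side of this series concrete, below is a minimal sketch of how a kernel module might use these APIs under the v2 semantics (create/lookup take a reference that trace_array_destroy drops). This is illustration only, not part of the series: the instance name "sample_instance", the choice of the sched:sched_switch event, and the simplified error handling are all assumptions, and the snippet is only buildable inside a kernel tree.

```
/* Hypothetical module-side usage sketch; not buildable outside a kernel
 * tree. "sample_instance" and sched:sched_switch are arbitrary choices. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/trace_events.h>

static struct trace_array *tr;

static int __init sample_init(void)
{
	/* Reuse the instance if it already exists, else create it.
	 * Both paths hold a reference on success (v2 semantics). */
	tr = trace_array_lookup("sample_instance");
	if (!tr)
		tr = trace_array_create("sample_instance");
	if (IS_ERR_OR_NULL(tr))
		return tr ? PTR_ERR(tr) : -ENOMEM;

	/* Enable one event in this instance only, not globally. */
	trace_array_set_clr_event(tr, "sched", "sched_switch", 1);

	/* trace_printk counterpart for the private ring buffer. */
	trace_array_printk(tr, _THIS_IP_, "sample module loaded\n");
	return 0;
}

static void __exit sample_exit(void)
{
	trace_array_set_clr_event(tr, "sched", "sched_switch", 0);
	trace_array_destroy(tr);	/* drops the reference taken above */
}

module_init(sample_init);
module_exit(sample_exit);
MODULE_LICENSE("GPL");
```

From userspace, the resulting buffer would then be visible under the tracefs instances directory (e.g. /sys/kernel/tracing/instances/sample_instance/trace), mirroring what instance_mkdir creates.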
Re: [PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.
Hi Steven,

Thanks for taking the time to review. Please find my comments inline.

On 6/8/19 2:51 PM, Steven Rostedt wrote:

On Tue, 4 Jun 2019 17:42:05 -0700 Divya Indi wrote:

Adding 2 new functions -
1) trace_array_lookup: Look up and return a trace array, given its name.
2) trace_array_set_clr_event: Enable/disable event recording to the given
   trace array.

Signed-off-by: Divya Indi
---
 include/linux/trace_events.h |  3 +++
 kernel/trace/trace.c         | 11 +++++++++++
 kernel/trace/trace_events.c  | 22 ++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
 int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+		const char *event, int set);

 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..1d171fd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8465,6 +8465,17 @@ static int instance_rmdir(const char *name)
 	return ret;
 }

+struct trace_array *trace_array_lookup(const char *name)
+{
+	struct trace_array *tr;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {

Accessing the ftrace_trace_arrays requires taking the trace_types_lock.
It should also probably increment the ref counter too, and then
trace_array_put() needs to be called. This prevents the trace array from
being freed while something has access to it.

-- Steve

Agree - Noted!

Also, adding a similar change for trace_array_create, which also returns a
ptr to a newly created trace_array, so it will face the same issue.

Since trace_array_lookup and trace_array_create will be accompanied by a
trace_array_destroy once the use of the trace_array is done, decrementing
the reference ctr there. Sending a v2 to address this.

Thanks,
Divya

+		if (tr->name && strcmp(tr->name, name) == 0)
+			return tr;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
 	trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);

+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @tr: trace array in which to enable/disable the event
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+		const char *event, int set)
+{
+	if (!tr)
+		return -ENODEV;
+
+	return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE		127
[RFC] Kernel Access to Ftrace instances.
Hi,

Please review the patches that follow. These include -

[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
[PATCH 2/3] tracing: Adding additional NULL checks.
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.

Let me know if you have any concerns or questions. A sample module
demonstrating the use of the above functions will follow soon.

Thanks,
Divya
[PATCH 2/3] tracing: Adding additional NULL checks.
Now that we have exported certain functions providing access to Ftrace
instances from other kernel components, we are adding some additional NULL
checks to ensure safe usage by the users.

Signed-off-by: Divya Indi
---
 kernel/trace/trace.c        | 3 +++
 kernel/trace/trace_events.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1c80521..a60dc13 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3205,6 +3205,9 @@ int trace_array_printk(struct trace_array *tr,
 	if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
 		return 0;

+	if (!tr)
+		return -EINVAL;
+
 	va_start(ap, fmt);
 	ret = trace_array_vprintk(tr, ip, fmt, ap);
 	va_end(ap);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b6b4618..445b059 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -800,6 +800,8 @@ int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 	char *event = NULL, *sub = NULL, *match;
 	int ret;

+	if (!tr)
+		return -ENODEV;
 	/*
 	 * The buf format can be <subsystem>:<event-name>
 	 *  *:<event-name> means any event by that name.
--
1.8.3.1
[PATCH 1/3] tracing: Relevant changes for kernel access to Ftrace instances.
For commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances"), we
need the following additional changes to ensure other kernel components can
use these functions -

1) Remove the static keyword for the newly exported function -
   ftrace_set_clr_event.
2) Add the required function declarations to the header file
   include/linux/trace_events.h.

Signed-off-by: Divya Indi
---
 include/linux/trace_events.h | 6 ++++++
 kernel/trace/trace_events.c  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731..d7b7d85 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -539,6 +539,12 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,

 #define is_signed_type(type)	(((type)(-1)) < (type)1)

+void trace_printk_init_buffers(void);
+int trace_array_printk(struct trace_array *tr, unsigned long ip,
+		const char *fmt, ...);
+struct trace_array *trace_array_create(const char *name);
+int trace_array_destroy(struct trace_array *tr);
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
 int trace_set_clr_event(const char *system, const char *event, int set);

 /*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db6..b6b4618 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -795,7 +795,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
 	return ret;
 }

-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
 	char *event = NULL, *sub = NULL, *match;
 	int ret;
--
1.8.3.1
[PATCH 3/3] tracing: Add 2 new funcs. for kernel access to Ftrace instances.
Adding 2 new functions -

1) trace_array_lookup: Look up and return a trace array, given its name.
2) trace_array_set_clr_event: Enable/disable event recording to the given
   trace array.

Signed-off-by: Divya Indi
---
 include/linux/trace_events.h |  3 +++
 kernel/trace/trace.c         | 11 +++++++++++
 kernel/trace/trace_events.c  | 22 ++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d7b7d85..0cc99a8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -545,7 +545,10 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
 struct trace_array *trace_array_create(const char *name);
 int trace_array_destroy(struct trace_array *tr);
 int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
+struct trace_array *trace_array_lookup(const char *name);
 int trace_set_clr_event(const char *system, const char *event, int set);
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+		const char *event, int set);

 /*
  * The double __builtin_constant_p is because gcc will give us an error
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a60dc13..1d171fd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8465,6 +8465,17 @@ static int instance_rmdir(const char *name)
 	return ret;
 }

+struct trace_array *trace_array_lookup(const char *name)
+{
+	struct trace_array *tr;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0)
+			return tr;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(trace_array_lookup);
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
 	trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 445b059..c126d2c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -859,6 +859,28 @@ int trace_set_clr_event(const char *system, const char *event, int set)
 }
 EXPORT_SYMBOL_GPL(trace_set_clr_event);

+/**
+ * trace_array_set_clr_event - enable or disable an event for a trace array
+ * @tr: trace array in which to enable/disable the event
+ * @system: system name to match (NULL for any system)
+ * @event: event name to match (NULL for all events, within system)
+ * @set: 1 to enable, 0 to disable
+ *
+ * This is a way for other parts of the kernel to enable or disable
+ * event recording to instances.
+ *
+ * Returns 0 on success, -EINVAL if the parameters do not match any
+ * registered events.
+ */
+int trace_array_set_clr_event(struct trace_array *tr, const char *system,
+		const char *event, int set)
+{
+	if (!tr)
+		return -ENODEV;
+
+	return __ftrace_set_clr_event(tr, NULL, system, event, set);
+}
+EXPORT_SYMBOL_GPL(trace_array_set_clr_event);
+
 /* 128 should be much more than enough */
 #define EVENT_BUF_SIZE		127
--
1.8.3.1
Re: [PATCH] tracing: Kernel access to Ftrace instances
On 05/20/2019 08:23 AM, Steven Rostedt wrote:

On Wed, 15 May 2019 20:24:43 -0700, Steven Rostedt wrote:

It seems yours is already in Steve's ftrace/core branch, so I think you can
make an additional patch to fix it. Steve, is that OK?

Yes. In fact I already sent a pull request to Linus. Please send a patch on
top of my ftrace/core branch.

And now it is in mainline (v5.2-rc1). Please send a patch with a sample
module (as Masami requested). Also, that function not only needs to be
changed to not being static, you need to add it to a header
(include/linux/trace_events.h?)

Thanks! Working on it. Will send it out soon.

Thanks,
Divya

-- Steve
Re: [PATCH] tracing: Kernel access to Ftrace instances
Hi Masami,

Thanks for pointing it out. Yes, it should not be static.

On 5/15/19 5:09 PM, Masami Hiramatsu wrote:

Hi Divya,

On Wed, 20 Mar 2019 11:28:51 -0700 Divya Indi wrote:

Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.

This patch aims to extend this infrastructure to provide the flexibility
to create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Signed-off-by: Divya Indi
Reviewed-by: Joe Jin

Have you tested these APIs with your module? Since,

[...]

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

I found this exports a static function to module. Did it work?

I had tested the changes with my module. This change to static was added in
the test patch, but it was somehow missed in the final patch that was sent
out.

Will send a new patch along with a few additional ones to add some NULL
checks to ensure safe usage by modules, and to add the APIs to a header
file that can be used by the modules.

Thanks,
Divya

Thank you,
[PATCH] tracing: Kernel access to Ftrace instances
Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.

This patch aims to extend this infrastructure to provide the flexibility
to create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Signed-off-by: Divya Indi
Reviewed-by: Joe Jin
---
 kernel/trace/trace.c        | 74 ++++++++++++++++++++++++++-------------
 kernel/trace/trace_events.c |  1 +
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4238b4..eaf163a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2878,6 +2878,7 @@ void trace_printk_init_buffers(void)
 	if (global_trace.trace_buffer.buffer)
 		tracing_start_cmdline_record();
 }
+EXPORT_SYMBOL_GPL(trace_printk_init_buffers);

 void trace_printk_start_comm(void)
 {
@@ -3038,6 +3039,7 @@ int trace_array_printk(struct trace_array *tr,
 	va_end(ap);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(trace_array_printk);

 __printf(3, 4)
 int trace_array_printk_buf(struct ring_buffer *buffer,
@@ -7832,7 +7834,7 @@ static void update_tracer_options(struct trace_array *tr)
 	mutex_unlock(&trace_types_lock);
 }

-static int instance_mkdir(const char *name)
+struct trace_array *trace_array_create(const char *name)
 {
 	struct trace_array *tr;
 	int ret;
@@ -7896,7 +7898,7 @@ static int instance_mkdir(const char *name)
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

-	return 0;
+	return tr;

 out_free_tr:
 	free_trace_buffers(tr);
@@ -7908,33 +7910,21 @@ static int instance_mkdir(const char *name)
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

-	return ret;
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(trace_array_create);

+static int instance_mkdir(const char *name)
+{
+	return PTR_ERR_OR_ZERO(trace_array_create(name));
 }

-static int instance_rmdir(const char *name)
+static int __remove_instance(struct trace_array *tr)
 {
-	struct trace_array *tr;
-	int found = 0;
-	int ret;
 	int i;

-	mutex_lock(&event_mutex);
-	mutex_lock(&trace_types_lock);
-
-	ret = -ENODEV;
-	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
-		if (tr->name && strcmp(tr->name, name) == 0) {
-			found = 1;
-			break;
-		}
-	}
-	if (!found)
-		goto out_unlock;
-
-	ret = -EBUSY;
 	if (tr->ref || (tr->current_trace && tr->current_trace->ref))
-		goto out_unlock;
+		return -EBUSY;

 	list_del(&tr->list);

@@ -7960,10 +7950,46 @@ static int instance_rmdir(const char *name)
 	free_cpumask_var(tr->tracing_cpumask);
 	kfree(tr->name);
 	kfree(tr);
+	tr = NULL;

-	ret = 0;
+	return 0;
+}
+
+int trace_array_destroy(struct trace_array *tr)
+{
+	int ret;
+
+	if (!tr)
+		return -EINVAL;
+
+	mutex_lock(&event_mutex);
+	mutex_lock(&trace_types_lock);
+
+	ret = __remove_instance(tr);
+
+	mutex_unlock(&trace_types_lock);
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(trace_array_destroy);
+
+static int instance_rmdir(const char *name)
+{
+	struct trace_array *tr;
+	int ret;
+
+	mutex_lock(&event_mutex);
+	mutex_lock(&trace_types_lock);
+
+	ret = -ENODEV;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0) {
+			ret = __remove_instance(tr);
+			break;
+		}
+	}

- out_unlock:
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

 /**
  * trace_set_clr_event - enable or disable an event
--
1.8.3.1
[RFC] Kernel access to Ftrace instances.
[PATCH] tracing: Kernel access to Ftrace instances.

Please review the patch that follows. Below are some details providing the
goal and justification for the patch.

===

Goal:

Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.
We want to extend this infrastructure to provide the flexibility to
create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Justification:

1. We need module-specific/use-case-specific ring buffers (apart from the
   global trace buffer) to avoid overwrite by other components. Hence the
   need to use Ftrace "instances".
2. Flexibility to add additional logging to these module-specific buffers
   via ksplice/live patch - having a trace_printk counterpart for these
   additional ring buffers.
3. Most often, certain issues and events can best be monitored from within
   the kernel.
4. Time sensitivity - we need the capability to dynamically enable and
   disable tracing from within the kernel to extract relevant debugging
   info for the right time window.

Example: When the kernel detects an unexpected event such as a connection
drop (eg: RDS/NFS connection drops), we need the capability to enable
specific event tracing to capture relevant info during reconnect. This
event tracing will help us diagnose issues that occur during reconnect,
such as root-causing longer reconnect times. In such cases we also want to
disable the tracing at the right moment and capture a snapshot from within
the kernel, to make sure we have the relevant diagnostics data and nothing
is overwritten or lost.

Note: The additional logging is not part of the kernel. We intend only to
provide the flexibility to add the logging as part of diagnostics via
ksplice/live-patch on a need basis.
Please find below the compilation of APIs to be introduced or exported as is.

We propose adding two new functions:

1. struct trace_array *trace_array_create(const char *name);
2. int trace_array_destroy(struct trace_array *tr);

In addition, we need to export the functions:

3. int trace_array_printk(struct trace_array *tr, unsigned long ip,
                          const char *fmt, ...);
4. int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
5. void trace_printk_init_buffers(void);

To work around the redundancy due to the newly introduced APIs, we propose
the following restructuring -

1. Move the contents of instance_mkdir to the new API:

static int instance_mkdir(const char *name)
{
	return PTR_ERR_OR_ZERO(trace_array_create(name));
}

2. Introduce an internal static function:

__remove_instance(struct trace_array *tr)

This will be almost similar to the old instance_rmdir, which identified the
trace_array to be removed based on its name.

Modify the existing API to use the internal function:

static int instance_rmdir(const char *name)
{
	struct trace_array *tr;
	int err = -ENODEV;

	mutex_lock(&event_mutex);
	mutex_lock(&trace_types_lock);
	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
		if (tr->name && strcmp(tr->name, name) == 0) {
			err = __remove_instance(tr);
			break;
		}
	}
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);

	return err;
}

New API to be exported:

int trace_array_destroy(struct trace_array *tr)
{
	int err;

	mutex_lock(&event_mutex);
	mutex_lock(&trace_types_lock);
	err = __remove_instance(tr);
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);

	return err;
}

Thanks,
Divya
Re: [PATCH] tracing: Kernel access to ftrace instances
Sure, thanks for taking the time to review. There are a few issues with the
patch styling, plus some minor modifications to make; I will shortly send
out v2 for the same.

Regards,
Divya

On 03/19/2019 01:16 PM, Steven Rostedt wrote:

On Fri, 15 Mar 2019 16:35:25 -0700 Divya Indi wrote:

Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.

This patch aims to extend this infrastructure to provide the flexibility
to create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Thanks for sending this. I'm currently working on some other changes, but
will look into this when I'm finished with the other work. Should be within
this week.

-- Steve
[PATCH] tracing: Kernel access to ftrace instances
Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.

This patch aims to extend this infrastructure to provide the flexibility
to create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Signed-off-by: Divya Indi
---
 kernel/trace/trace.c        | 72 +++++++++++++++++++++++++-------------
 kernel/trace/trace_events.c |  1 +
 2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4238b4..a5e7e51 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2878,6 +2878,7 @@ void trace_printk_init_buffers(void)
 	if (global_trace.trace_buffer.buffer)
 		tracing_start_cmdline_record();
 }
+EXPORT_SYMBOL_GPL(trace_printk_init_buffers);

 void trace_printk_start_comm(void)
 {
@@ -3038,6 +3039,7 @@ int trace_array_printk(struct trace_array *tr,
 	va_end(ap);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(trace_array_printk);

 __printf(3, 4)
 int trace_array_printk_buf(struct ring_buffer *buffer,
@@ -7832,7 +7834,7 @@ static void update_tracer_options(struct trace_array *tr)
 	mutex_unlock(&trace_types_lock);
 }

-static int instance_mkdir(const char *name)
+struct trace_array *trace_array_create(const char *name)
 {
 	struct trace_array *tr;
 	int ret;
@@ -7896,7 +7898,7 @@ static int instance_mkdir(const char *name)
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

-	return 0;
+	return tr;

 out_free_tr:
 	free_trace_buffers(tr);
@@ -7908,33 +7910,21 @@ static int instance_mkdir(const char *name)
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

-	return ret;
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(trace_array_create);

+static int instance_mkdir(const char *name)
+{
+	return PTR_ERR_OR_ZERO(trace_array_create(name));
 }

-static int instance_rmdir(const char *name)
+static int __remove_instance(struct trace_array *tr)
{
-	struct trace_array *tr;
-	int found = 0;
-	int ret;
 	int i;
-
-	mutex_lock(&event_mutex);
-	mutex_lock(&trace_types_lock);
-
-	ret = -ENODEV;
-	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
-		if (tr->name && strcmp(tr->name, name) == 0) {
-			found = 1;
-			break;
-		}
-	}
-	if (!found)
-		goto out_unlock;
-
-	ret = -EBUSY;
+	if (tr->ref || (tr->current_trace && tr->current_trace->ref))
-		goto out_unlock;
+		return -EBUSY;

 	list_del(&tr->list);

@@ -7961,9 +7951,41 @@ static int instance_rmdir(const char *name)
 	kfree(tr->name);
 	kfree(tr);

-	ret = 0;
+	return 0;
+}

- out_unlock:
+int trace_array_destroy(struct trace_array *tr)
+{
+	int ret;
+
+	mutex_lock(&event_mutex);
+	mutex_lock(&trace_types_lock);
+
+	ret = __remove_instance(tr);
+
+	mutex_unlock(&trace_types_lock);
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(trace_array_destroy);
+
+static int instance_rmdir(const char *name)
+{
+	struct trace_array *tr;
+	int ret;
+
+	mutex_lock(&event_mutex);
+	mutex_lock(&trace_types_lock);
+
+	ret = -ENODEV;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0) {
+			ret = __remove_instance(tr);
+			break;
+		}
+	}
+
 	mutex_unlock(&trace_types_lock);
 	mutex_unlock(&event_mutex);

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 5b3b0c3..81c038e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -832,6 +832,7 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 	return ret;
 }
+EXPORT_SYMBOL_GPL(ftrace_set_clr_event);

 /**
  * trace_set_clr_event - enable or disable an event
--
1.8.3.1
[RFC] Kernel access to Ftrace instances.
[PATCH] tracing: Kernel access to ftrace instances.

Please review the patch that follows. Below are some details providing the
goal and justification for the patch.

===

Goal:

Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers are
created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel components.
We want to extend this infrastructure to provide the flexibility to
create/log/remove/enable-disable existing trace events to these buffers
from within the kernel.

Justification:

1. We need module-specific/use-case-specific ring buffers (apart from the
   global trace buffer) to avoid overwrite by other components. Hence the
   need to use Ftrace "instances".
2. Flexibility to add additional logging to these module-specific buffers
   via ksplice/live patch - having a trace_printk counterpart for these
   additional ring buffers.
3. Most often, certain issues and events can best be monitored from within
   the kernel.
4. Time sensitivity - we need the capability to dynamically enable and
   disable tracing from within the kernel to extract relevant debugging
   info for the right time window.

Example: When the kernel detects an unexpected event such as a connection
drop (eg: RDS/NFS connection drops), we need the capability to enable
specific event tracing to capture relevant info during reconnect. This
event tracing will help us diagnose issues that occur during reconnect,
such as root-causing longer reconnect times. In such cases we also want to
disable the tracing at the right moment and capture a snapshot from within
the kernel, to make sure we have the relevant diagnostics data and nothing
is overwritten or lost.

Note: The additional logging is not part of the kernel. We intend only to
provide the flexibility to add the logging as part of diagnostics via
ksplice/live-patch on a need basis.
Please find below the compilation of APIs to be introduced or exported as is.

We propose adding two new functions:

1. struct trace_array *trace_array_create(const char *name);
2. int trace_array_destroy(struct trace_array *tr);

In addition, we need to export the functions:

3. int trace_array_printk(struct trace_array *tr, unsigned long ip,
                          const char *fmt, ...);
4. int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set);
5. void trace_printk_init_buffers(void);

To work around the redundancy due to the newly introduced APIs, we propose
the following restructuring -

1. Move the contents of instance_mkdir to the new API:

static int instance_mkdir(const char *name)
{
	return PTR_ERR_OR_ZERO(trace_array_create(name));
}

2. Introduce an internal static function:

__remove_instance(struct trace_array *tr)

This will be almost similar to the old instance_rmdir, which identified the
trace_array to be removed based on its name.

Modify the existing API to use the internal function:

static int instance_rmdir(const char *name)
{
	struct trace_array *tr;
	int err = -ENODEV;

	mutex_lock(&event_mutex);
	mutex_lock(&trace_types_lock);
	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
		if (tr->name && strcmp(tr->name, name) == 0) {
			err = __remove_instance(tr);
			break;
		}
	}
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);

	return err;
}

New API to be exported:

int trace_array_destroy(struct trace_array *tr)
{
	int err;

	mutex_lock(&event_mutex);
	mutex_lock(&trace_types_lock);
	err = __remove_instance(tr);
	mutex_unlock(&trace_types_lock);
	mutex_unlock(&event_mutex);

	return err;
}

Thanks,
Divya