Re: [PATCH] target: Fix a double put in transport_free_session
On 3/26/21 7:31 AM, Maurizio Lombardi wrote: > > > Dne 23. 03. 21 v 3:58 Lv Yunlong napsal(a): >> In transport_free_session, se_nacl is got from se_sess >> with the initial reference. If se_nacl->acl_sess_list is >> empty, se_nacl->dynamic_stop is set to true. Then the first >> target_put_nacl(se_nacl) will drop the initial reference >> and free se_nacl. Later there is a second target_put_nacl() >> to put se_nacl. It may cause error in race. >> >> My patch sets se_nacl->dynamic_stop to false to avoid the >> double put. >> >> Signed-off-by: Lv Yunlong >> --- >> drivers/target/target_core_transport.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/target/target_core_transport.c >> b/drivers/target/target_core_transport.c >> index 5ecb9f18a53d..c266defe694f 100644 >> --- a/drivers/target/target_core_transport.c >> +++ b/drivers/target/target_core_transport.c >> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) >> } >> mutex_unlock(&se_tpg->acl_node_mutex); >> >> -if (se_nacl->dynamic_stop) >> +if (se_nacl->dynamic_stop) { >> target_put_nacl(se_nacl); >> +se_nacl->dynamic_stop = false; >> +} >> >> target_put_nacl(se_nacl); >> } >> > > FYI, > > I have received a bug report against the 5.8 kernel about task hangs that > seems to involve the nacl "dynamic_stop" code > > Mar 4 16:49:44 gzboot kernel: [186322.177819] INFO: task targetcli:2359053 > blocked for more than 120 seconds. > Mar 4 16:49:44 gzboot kernel: [186322.178862] Tainted: P O >5.8.0-44-generic #50-Ubuntu > Mar 4 16:49:44 gzboot kernel: [186322.179746] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Mar 4 16:49:44 gzboot kernel: [186322.180583] targetcli D0 2359053 > 2359052 0x > Mar 4 16:49:44 gzboot kernel: [186322.180586] Call Trace: > Mar 4 16:49:44 gzboot kernel: [186322.180592] __schedule+0x212/0x5d0 > Mar 4 16:49:44 gzboot kernel: [186322.180595] ? usleep_range+0x90/0x90 > Mar 4 16:49:44 gzboot kernel: [186322.180596] schedule+0x55/0xc0 > Mar 4 16:49:44 gzboot kernel: [186322.180597] schedule_timeout+0x10f/0x160 > Mar 4 16:49:44 gzboot kernel: [186322.180601] ? evict+0x14c/0x1b0 > Mar 4 16:49:44 gzboot kernel: [186322.180602] __wait_for_common+0xa8/0x150 > Mar 4 16:49:44 gzboot kernel: [186322.180603] wait_for_completion+0x24/0x30 > Mar 4 16:49:44 gzboot kernel: [186322.180637] > core_tpg_del_initiator_node_acl+0x8e/0x120 [target_core_mod] We would need more details. We can hit that normally when the target driver waits for stuck IO to complete before calling transport_deregister_session. I think if you hit the put bug, then we would have seen the refcount warning in refcount.h fire before the hung task because we do an extra put. Did the user switch to ACLs? Did you see my comment about it looks like there is a bug if the user were to add an acl while dynamic was used for the same initiatorname. In that case, we do not do the put to match the one taken core_tpg_check_initiator_node_acl. In that case we would hit your hang since no one ever does the last put.
Re: [PATCH] target: Fix a double put in transport_free_session
Dne 23. 03. 21 v 3:58 Lv Yunlong napsal(a): > In transport_free_session, se_nacl is got from se_sess > with the initial reference. If se_nacl->acl_sess_list is > empty, se_nacl->dynamic_stop is set to true. Then the first > target_put_nacl(se_nacl) will drop the initial reference > and free se_nacl. Later there is a second target_put_nacl() > to put se_nacl. It may cause error in race. > > My patch sets se_nacl->dynamic_stop to false to avoid the > double put. > > Signed-off-by: Lv Yunlong > --- > drivers/target/target_core_transport.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/target/target_core_transport.c > b/drivers/target/target_core_transport.c > index 5ecb9f18a53d..c266defe694f 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) > } > mutex_unlock(&se_tpg->acl_node_mutex); > > - if (se_nacl->dynamic_stop) > + if (se_nacl->dynamic_stop) { > target_put_nacl(se_nacl); > + se_nacl->dynamic_stop = false; > + } > > target_put_nacl(se_nacl); > } > FYI, I have received a bug report against the 5.8 kernel about task hangs that seems to involve the nacl "dynamic_stop" code Mar 4 16:49:44 gzboot kernel: [186322.177819] INFO: task targetcli:2359053 blocked for more than 120 seconds. Mar 4 16:49:44 gzboot kernel: [186322.178862] Tainted: P O 5.8.0-44-generic #50-Ubuntu Mar 4 16:49:44 gzboot kernel: [186322.179746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 4 16:49:44 gzboot kernel: [186322.180583] targetcli D0 2359053 2359052 0x Mar 4 16:49:44 gzboot kernel: [186322.180586] Call Trace: Mar 4 16:49:44 gzboot kernel: [186322.180592] __schedule+0x212/0x5d0 Mar 4 16:49:44 gzboot kernel: [186322.180595] ? usleep_range+0x90/0x90 Mar 4 16:49:44 gzboot kernel: [186322.180596] schedule+0x55/0xc0 Mar 4 16:49:44 gzboot kernel: [186322.180597] schedule_timeout+0x10f/0x160 Mar 4 16:49:44 gzboot kernel: [186322.180601] ? evict+0x14c/0x1b0 Mar 4 16:49:44 gzboot kernel: [186322.180602] __wait_for_common+0xa8/0x150 Mar 4 16:49:44 gzboot kernel: [186322.180603] wait_for_completion+0x24/0x30 Mar 4 16:49:44 gzboot kernel: [186322.180637] core_tpg_del_initiator_node_acl+0x8e/0x120 [target_core_mod] Mar 4 16:49:44 gzboot kernel: [186322.180643] target_fabric_nacl_base_release+0x26/0x30 [target_core_mod] Mar 4 16:49:44 gzboot kernel: [186322.180647] config_item_cleanup+0x5d/0xf0 Mar 4 16:49:44 gzboot kernel: [186322.180649] config_item_put+0x2d/0x40 Mar 4 16:49:44 gzboot kernel: [186322.180651] configfs_rmdir+0x1d8/0x350 Mar 4 16:49:44 gzboot kernel: [186322.180653] vfs_rmdir.part.0+0x66/0x190 Mar 4 16:49:44 gzboot kernel: [186322.180654] do_rmdir+0x1b4/0x200 Mar 4 16:49:44 gzboot kernel: [186322.180655] __x64_sys_rmdir+0x17/0x20 Mar 4 16:49:44 gzboot kernel: [186322.180657] do_syscall_64+0x49/0xc0 Mar 4 16:49:44 gzboot kernel: [186322.180659] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Mar 4 16:49:44 gzboot kernel: [186322.180660] RIP: 0033:0x7f30cf1ca9eb Mar 4 16:49:44 gzboot kernel: [186322.180661] RSP: 002b:7ffd72030bd8 EFLAGS: 0246 ORIG_RAX: 0054 Mar 4 16:49:44 gzboot kernel: [186322.180662] RAX: ffda RBX: ff9c RCX: 7f30cf1ca9eb Mar 4 16:49:44 gzboot kernel: [186322.180663] RDX: RSI: RDI: 7f30cc1e2a50 Mar 4 16:49:44 gzboot kernel: [186322.180664] RBP: 00a4b7a0 R08: R09: 7ffd72030b7f Mar 4 16:49:44 gzboot kernel: [186322.180664] R10: R11: 0246 R12: 7ffd72030c00 Mar 4 16:49:44 gzboot kernel: [186322.180665] R13: 7f30cdd706a8 R14: 7f30ced00cc0 R15: 7f30cdd70698 Maurizio
Re: Re: [PATCH] target: Fix a double put in transport_free_session
> -原始邮件- > 发件人: "Mike Christie" > 发送时间: 2021-03-26 01:24:58 (星期五) > 收件人: lyl2...@mail.ustc.edu.cn > 抄送: martin.peter...@oracle.com, linux-s...@vger.kernel.org, > target-de...@vger.kernel.org, linux-kernel@vger.kernel.org, "Nilesh Javali" > > 主题: Re: [PATCH] target: Fix a double put in transport_free_session > > On 3/25/21 2:48 AM, lyl2...@mail.ustc.edu.cn wrote: > > > > > > > >> -原始邮件- > >> 发件人: michael.chris...@oracle.com > >> 发送时间: 2021-03-24 00:28:35 (星期三) > >> 收件人: "Lv Yunlong" , martin.peter...@oracle.com > >> 抄送: linux-s...@vger.kernel.org, target-de...@vger.kernel.org, > >> linux-kernel@vger.kernel.org > >> 主题: Re: [PATCH] target: Fix a double put in transport_free_session > >> > >> On 3/22/21 9:58 PM, Lv Yunlong wrote: > >>> In transport_free_session, se_nacl is got from se_sess > >>> with the initial reference. If se_nacl->acl_sess_list is > >>> empty, se_nacl->dynamic_stop is set to true. Then the first > >>> target_put_nacl(se_nacl) will drop the initial reference > >>> and free se_nacl. Later there is a second target_put_nacl() > >>> to put se_nacl. It may cause error in race. > >>>> My patch sets se_nacl->dynamic_stop to false to avoid the > >>> double put. > >>> > >>> Signed-off-by: Lv Yunlong > >>> --- > >>> drivers/target/target_core_transport.c | 4 +++- > >>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/target/target_core_transport.c > >>> b/drivers/target/target_core_transport.c > >>> index 5ecb9f18a53d..c266defe694f 100644 > >>> --- a/drivers/target/target_core_transport.c > >>> +++ b/drivers/target/target_core_transport.c > >>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session > >>> *se_sess) > >>> } > >>> mutex_unlock(&se_tpg->acl_node_mutex); > >>> > >>> - if (se_nacl->dynamic_stop) > >>> + if (se_nacl->dynamic_stop) { > >>> target_put_nacl(se_nacl); > >>> + se_nacl->dynamic_stop = false; > >>> + } > >>> > >>> target_put_nacl(se_nacl); > >> Could you describe the race a little more? > >> > >> Is the race: > >> > >> 1. thread1 called core_tpg_check_initiator_node_acl and found the acl. > >> sess->se_node_acl is set to the found acl. > >> 2. thread2 is running transport_free_session. It now grabs the > >> acl_node_mutex > >> and sees se_nacl->acl_sess_list is empty. > >> 3. thread2 does the dynamic_stop=true operations in transport_free_session. > >> 4. thread1 now calls transport_register_session now adds the sess to acl's > >> acl_sess_list. > >> > >> Later when the session that thread 1 created is deleted dynamic_stop is > >> still > >> set, so we do an extra target_put_nacl? > >> > >> I'm not sure your patch will handle this race. When we delete the session > >> thread1 > >> created dynamic_node_acl is still set, so this: > >> > >> mutex_lock(&se_tpg->acl_node_mutex); > >> if (se_nacl->dynamic_node_acl && > >> !se_tfo->tpg_check_demo_mode_cache(se_tpg)) { > >> spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags); > >> if (list_empty(&se_nacl->acl_sess_list)) > >> se_nacl->dynamic_stop = true; > >> > >> can set dynamic_stop to true again and we can end up doing the extra put > >> still. > >> > >> On top of the extra put we also do > >> > >> list_del(&se_nacl->acl_list); > >> > >> twice so we have to handle that as well. > >> > >> Is there also another bug in this code. If someone adds an acl while there > >> is a > >> dynamic acl in place core_tpg_add_initiator_node_acl will clear > >> dynamic_node_acl > >> but we leave the extra reference, so later when transport_free_session is > >> called > >> we will not do the extra put. > >> > > > > Ok, thanks for your answer. According the description above, i think it is > > a false > > positive now. > > > > Did you hit this bug, are you using an inspection tool, or did you find this > by code > review? > > I think there was a misunderstanding. I was saying it looks like a race could > happen. > There is no protection in lio core. > > I think it's hard to hit because most drivers do not allow the combo: > > tpg_check_demo_mode == true > tpg_check_demo_mode_cache = false > > It looks like those settings are allowed with tcm_qla2xxx and usb, but: > > usb - has a mutex around creation and removal so we can't race. > tcm qla - I don't know this driver will enough, but I cc'd the maintainer. This bug is detected by a static analyzer tool. Thanks.
Re: [PATCH] target: Fix a double put in transport_free_session
On 3/25/21 2:48 AM, lyl2...@mail.ustc.edu.cn wrote: > > > >> -原始邮件- >> 发件人: michael.chris...@oracle.com >> 发送时间: 2021-03-24 00:28:35 (星期三) >> 收件人: "Lv Yunlong" , martin.peter...@oracle.com >> 抄送: linux-s...@vger.kernel.org, target-de...@vger.kernel.org, >> linux-kernel@vger.kernel.org >> 主题: Re: [PATCH] target: Fix a double put in transport_free_session >> >> On 3/22/21 9:58 PM, Lv Yunlong wrote: >>> In transport_free_session, se_nacl is got from se_sess >>> with the initial reference. If se_nacl->acl_sess_list is >>> empty, se_nacl->dynamic_stop is set to true. Then the first >>> target_put_nacl(se_nacl) will drop the initial reference >>> and free se_nacl. Later there is a second target_put_nacl() >>> to put se_nacl. It may cause error in race. >>>> My patch sets se_nacl->dynamic_stop to false to avoid the >>> double put. >>> >>> Signed-off-by: Lv Yunlong >>> --- >>> drivers/target/target_core_transport.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/target/target_core_transport.c >>> b/drivers/target/target_core_transport.c >>> index 5ecb9f18a53d..c266defe694f 100644 >>> --- a/drivers/target/target_core_transport.c >>> +++ b/drivers/target/target_core_transport.c >>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) >>> } >>> mutex_unlock(&se_tpg->acl_node_mutex); >>> >>> - if (se_nacl->dynamic_stop) >>> + if (se_nacl->dynamic_stop) { >>> target_put_nacl(se_nacl); >>> + se_nacl->dynamic_stop = false; >>> + } >>> >>> target_put_nacl(se_nacl); >> Could you describe the race a little more? >> >> Is the race: >> >> 1. thread1 called core_tpg_check_initiator_node_acl and found the acl. >> sess->se_node_acl is set to the found acl. >> 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex >> and sees se_nacl->acl_sess_list is empty. >> 3. thread2 does the dynamic_stop=true operations in transport_free_session. >> 4. thread1 now calls transport_register_session now adds the sess to acl's >> acl_sess_list. >> >> Later when the session that thread 1 created is deleted dynamic_stop is still >> set, so we do an extra target_put_nacl? >> >> I'm not sure your patch will handle this race. When we delete the session >> thread1 >> created dynamic_node_acl is still set, so this: >> >> mutex_lock(&se_tpg->acl_node_mutex); >> if (se_nacl->dynamic_node_acl && >> !se_tfo->tpg_check_demo_mode_cache(se_tpg)) { >> spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags); >> if (list_empty(&se_nacl->acl_sess_list)) >> se_nacl->dynamic_stop = true; >> >> can set dynamic_stop to true again and we can end up doing the extra put >> still. >> >> On top of the extra put we also do >> >> list_del(&se_nacl->acl_list); >> >> twice so we have to handle that as well. >> >> Is there also another bug in this code. If someone adds an acl while there >> is a >> dynamic acl in place core_tpg_add_initiator_node_acl will clear >> dynamic_node_acl >> but we leave the extra reference, so later when transport_free_session is >> called >> we will not do the extra put. >> > > Ok, thanks for your answer. According the description above, i think it is a > false > positive now. > Did you hit this bug, are you using an inspection tool, or did you find this by code review? I think there was a misunderstanding. I was saying it looks like a race could happen. There is no protection in lio core. I think it's hard to hit because most drivers do not allow the combo: tpg_check_demo_mode == true tpg_check_demo_mode_cache = false It looks like those settings are allowed with tcm_qla2xxx and usb, but: usb - has a mutex around creation and removal so we can't race. tcm qla - I don't know this driver will enough, but I cc'd the maintainer.
Re: Re: [PATCH] target: Fix a double put in transport_free_session
> -原始邮件- > 发件人: michael.chris...@oracle.com > 发送时间: 2021-03-24 00:28:35 (星期三) > 收件人: "Lv Yunlong" , martin.peter...@oracle.com > 抄送: linux-s...@vger.kernel.org, target-de...@vger.kernel.org, > linux-kernel@vger.kernel.org > 主题: Re: [PATCH] target: Fix a double put in transport_free_session > > On 3/22/21 9:58 PM, Lv Yunlong wrote: > > In transport_free_session, se_nacl is got from se_sess > > with the initial reference. If se_nacl->acl_sess_list is > > empty, se_nacl->dynamic_stop is set to true. Then the first > > target_put_nacl(se_nacl) will drop the initial reference > > and free se_nacl. Later there is a second target_put_nacl() > > to put se_nacl. It may cause error in race. > >> My patch sets se_nacl->dynamic_stop to false to avoid the > > double put. > > > > Signed-off-by: Lv Yunlong > > --- > > drivers/target/target_core_transport.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/target/target_core_transport.c > > b/drivers/target/target_core_transport.c > > index 5ecb9f18a53d..c266defe694f 100644 > > --- a/drivers/target/target_core_transport.c > > +++ b/drivers/target/target_core_transport.c > > @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) > > } > > mutex_unlock(&se_tpg->acl_node_mutex); > > > > - if (se_nacl->dynamic_stop) > > + if (se_nacl->dynamic_stop) { > > target_put_nacl(se_nacl); > > + se_nacl->dynamic_stop = false; > > + } > > > > target_put_nacl(se_nacl); > Could you describe the race a little more? > > Is the race: > > 1. thread1 called core_tpg_check_initiator_node_acl and found the acl. > sess->se_node_acl is set to the found acl. > 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex > and sees se_nacl->acl_sess_list is empty. > 3. thread2 does the dynamic_stop=true operations in transport_free_session. > 4. thread1 now calls transport_register_session now adds the sess to acl's > acl_sess_list. > > Later when the session that thread 1 created is deleted dynamic_stop is still > set, so we do an extra target_put_nacl? > > I'm not sure your patch will handle this race. When we delete the session > thread1 > created dynamic_node_acl is still set, so this: > > mutex_lock(&se_tpg->acl_node_mutex); > if (se_nacl->dynamic_node_acl && > !se_tfo->tpg_check_demo_mode_cache(se_tpg)) { > spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags); > if (list_empty(&se_nacl->acl_sess_list)) > se_nacl->dynamic_stop = true; > > can set dynamic_stop to true again and we can end up doing the extra put > still. > > On top of the extra put we also do > > list_del(&se_nacl->acl_list); > > twice so we have to handle that as well. > > Is there also another bug in this code. If someone adds an acl while there is > a > dynamic acl in place core_tpg_add_initiator_node_acl will clear > dynamic_node_acl > but we leave the extra reference, so later when transport_free_session is > called > we will not do the extra put. > Ok, thanks for your answer. According the description above, i think it is a false positive now. Thanks.
Re: [PATCH] target: Fix a double put in transport_free_session
On 3/22/21 9:58 PM, Lv Yunlong wrote: > In transport_free_session, se_nacl is got from se_sess > with the initial reference. If se_nacl->acl_sess_list is > empty, se_nacl->dynamic_stop is set to true. Then the first > target_put_nacl(se_nacl) will drop the initial reference > and free se_nacl. Later there is a second target_put_nacl() > to put se_nacl. It may cause error in race. >> My patch sets se_nacl->dynamic_stop to false to avoid the > double put. > > Signed-off-by: Lv Yunlong > --- > drivers/target/target_core_transport.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/target/target_core_transport.c > b/drivers/target/target_core_transport.c > index 5ecb9f18a53d..c266defe694f 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) > } > mutex_unlock(&se_tpg->acl_node_mutex); > > - if (se_nacl->dynamic_stop) > + if (se_nacl->dynamic_stop) { > target_put_nacl(se_nacl); > + se_nacl->dynamic_stop = false; > + } > > target_put_nacl(se_nacl); Could you describe the race a little more? Is the race: 1. thread1 called core_tpg_check_initiator_node_acl and found the acl. sess->se_node_acl is set to the found acl. 2. thread2 is running transport_free_session. It now grabs the acl_node_mutex and sees se_nacl->acl_sess_list is empty. 3. thread2 does the dynamic_stop=true operations in transport_free_session. 4. thread1 now calls transport_register_session now adds the sess to acl's acl_sess_list. Later when the session that thread 1 created is deleted dynamic_stop is still set, so we do an extra target_put_nacl? I'm not sure your patch will handle this race. When we delete the session thread1 created dynamic_node_acl is still set, so this: mutex_lock(&se_tpg->acl_node_mutex); if (se_nacl->dynamic_node_acl && !se_tfo->tpg_check_demo_mode_cache(se_tpg)) { spin_lock_irqsave(&se_nacl->nacl_sess_lock, flags); if (list_empty(&se_nacl->acl_sess_list)) se_nacl->dynamic_stop = true; can set dynamic_stop to true again and we can end up doing the extra put still. On top of the extra put we also do list_del(&se_nacl->acl_list); twice so we have to handle that as well. Is there also another bug in this code. If someone adds an acl while there is a dynamic acl in place core_tpg_add_initiator_node_acl will clear dynamic_node_acl but we leave the extra reference, so later when transport_free_session is called we will not do the extra put.
[PATCH] target: Fix a double put in transport_free_session
In transport_free_session, se_nacl is got from se_sess with the initial reference. If se_nacl->acl_sess_list is empty, se_nacl->dynamic_stop is set to true. Then the first target_put_nacl(se_nacl) will drop the initial reference and free se_nacl. Later there is a second target_put_nacl() to put se_nacl. It may cause error in race. My patch sets se_nacl->dynamic_stop to false to avoid the double put. Signed-off-by: Lv Yunlong --- drivers/target/target_core_transport.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 5ecb9f18a53d..c266defe694f 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess) } mutex_unlock(&se_tpg->acl_node_mutex); - if (se_nacl->dynamic_stop) + if (se_nacl->dynamic_stop) { target_put_nacl(se_nacl); + se_nacl->dynamic_stop = false; + } target_put_nacl(se_nacl); } -- 2.25.1