Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock
Hello, On Mon, Jul 30, 2018 at 05:50:02PM +, Bart Van Assche wrote: > On Mon, 2018-07-30 at 10:31 -0700, t...@kernel.org wrote: > > On Mon, Jul 30, 2018 at 05:28:11PM +, Bart Van Assche wrote: > > > It's not clear to me how the sysfs_break_active_protection() should obtain > > > the struct kernfs_node pointer to the attribute. Calling that function > > > before > > > device_remove_file_self() causes a double call to > > > kernfs_break_active_protection(), > > > which is wrong. Calling kernfs_find_and_get(kobj->sd, attr->name) after > > > the > > > > So, if you braek active protection explicitly, there's no need to call > > remove_self(). It can just use regular remove. > > But how to avoid that scsi_remove_device(to_scsi_device(dev)) gets called > multiple times when using device_remove_self() and in case of concurrent > writes into the SCSI device "delete" sysfs attribute? So, scsi_remove_device() internally protects using scan_mutex and if the whole thing is wrapped with break_active_prot, I don't think you need to call remove_file_self at all, right? Thanks. -- tejun
Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock
Hello, Bart. On Mon, Jul 30, 2018 at 05:28:11PM +, Bart Van Assche wrote: > It's not clear to me how the sysfs_break_active_protection() should obtain > the struct kernfs_node pointer to the attribute. Calling that function before > device_remove_file_self() causes a double call to > kernfs_break_active_protection(), > which is wrong. Calling kernfs_find_and_get(kobj->sd, attr->name) after the So, if you braek active protection explicitly, there's no need to call remove_self(). It can just use regular remove. > attribute has been removed results in a NULL pointer because the attribute > that > that call tries to look up has already been removed. Should I proceed with the > approach proposed in the patches attached to a previous e-mail? I don't have an objection for that. Thanks. -- tejun
Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock
On Sun, Jul 29, 2018 at 04:03:41AM +, Bart Van Assche wrote: > On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote: > > Making removal asynchronous this way sometimes causes issues because > > whether the user sees the device released or not is racy. > > Hello Tejun, > > How could this change cause any user-visible changes? My understanding is > that any work queued with task_work_add() is executed before system call > execution leaves kernel context and returns back to user space context. Ah, you're right. This isn't workqueue. Hmm... yeah, needing to allocate sth in removal path is a bit icky but, it should be okay I guess. I *think* the kernfs active break/unbreak is likely gonna be cleaner tho. Thanks. -- tejun
Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock
Hello, Bart. On Thu, Jul 26, 2018 at 09:57:40PM +, Bart Van Assche wrote: ... > @@ -440,11 +445,21 @@ bool sysfs_remove_file_self(struct kobject *kobj, const > struct attribute *attr) > return false; > > ret = kernfs_remove_self(kn); > + if (ret && cb) { > + kernfs_break_active_protection(kn); > + cb(kobj, attr, data); > + kernfs_break_active_protection(kn); unbreak? Also, wouldn't it be better to just expose sysfs_break/unbreak and then do sth like the following from scsi? kobject_get(); sysfs_break_active_protection(); do normal sysfs removal; sysfs_unbreak..(); kobject_put(); Thanks. -- tejun
Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock
Hello, On Thu, Jul 26, 2018 at 02:09:41PM +, Bart Van Assche wrote: > On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote: > > Making removal asynchronous this way sometimes causes issues because > > whether the user sees the device released or not is racy. > > kernfs/sysfs have mechanisms to deal with these cases - remove_self > > and kernfs_break_active_protection(). Have you looked at those? > > Hello Tejun, > > The call stack in the patch description shows that sdev_store_delete() is > involved in the deadlock. The implementation of that function is as follows: > > static ssize_t > sdev_store_delete(struct device *dev, struct device_attribute *attr, > const char *buf, size_t count) > { > if (device_remove_file_self(dev, attr)) > scsi_remove_device(to_scsi_device(dev)); > return count; > }; > > device_remove_file_self() calls sysfs_remove_file_self() and that last > function calls kernfs_remove_self(). In other words, kernfs_remove_self() > is already being used. Please let me know if I misunderstood your comment. So, here, because scsi_remove_device() is the one involved in the circular dependency, just breaking the dependency chain on the file itself (self removal) isn't enough. You can wrap the whole operation with kernfs_break_active_protection() to also move scsi_remove_device() invocation outside the kernfs synchronization. This will need to be piped through sysfs but shouldn't be too complex. Thanks. -- tejun
Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()
Hey, On Mon, Mar 19, 2018 at 04:57:45PM +, Bart Van Assche wrote: > For synchronization primitives that wait having a stronger synchronization > primitive nested inside a more relaxed one can lead to a deadlock. But since > the rcu read lock primitives do not wait it could be safe to use that kind > of nesting with RCU. Do you perhaps know whether any documentation is > available about that kind of nesting or whether it is already used elsewhere > in the kernel? Oh, we nest them all the time. They're like (and sometimes literally are) preempt_disable() and don't care about nest ordering. Thanks. -- tejun
Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()
Hello, On Mon, Mar 19, 2018 at 04:18:54PM +, Bart Van Assche wrote: > The algorithm explained above does not depend on sched-rcu. But because > percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock > around that call we are forced to use sched-rcu. I hope this makes it clear. This could be me being slow but can you explain how what percpu_ref_tryget_live() uses internally affects whether we can use regular RCU around it? Thanks. -- tejun
Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()
Hello, On Mon, Mar 19, 2018 at 03:16:07PM +, Bart Van Assche wrote: > As explained in the comment in scsi_device_quiesce(), the effect of > blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that > occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the > RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls > surround a function call that uses rcu_read_lock_sched(), namely > percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter() > and scsi_device_quiesce(). Let's please not depend on rcu_read_lock_sched() in percpu_ref_tryget_live(). That's an implementation detail. If the code needs RCU based synchronization, it should perform them explicitly. Thanks. -- tejun
Re: BUG: KASAN: use-after-free in scsi_exit_rq
(cc'ing Jan) Hello, Bart. On Fri, Apr 21, 2017 at 09:49:17PM +, Bart Van Assche wrote: > On Thu, 2017-04-20 at 15:18 -0600, Scott Bauer wrote: > > [ 642.638860] BUG: KASAN: use-after-free in scsi_exit_rq+0xf3/0x120 at > > addr 8802b7fedf00 > > [ 642.639362] Read of size 1 by task rcuos/5/53 > > [ 642.639713] CPU: 7 PID: 53 Comm: rcuos/6 Not tainted 4.11.0-rc5+ #13 > > [ 642.640170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 > > [ 642.640923] Call Trace: > > [ 642.641080] dump_stack+0x63/0x8f > > [ 642.641289] kasan_object_err+0x21/0x70 > > [ 642.641531] kasan_report.part.1+0x231/0x500 > > [ 642.641823] ? scsi_exit_rq+0xf3/0x120 > > [ 642.642054] ? _raw_spin_unlock_irqrestore+0xe/0x10 > > [ 642.642353] ? free_percpu+0x1b7/0x340 > > [ 642.642586] ? put_task_stack+0x117/0x2b0 > > [ 642.642837] __asan_report_load1_noabort+0x2e/0x30 > > [ 642.643138] scsi_exit_rq+0xf3/0x120 > > [ 642.643366] free_request_size+0x44/0x60 > > [ 642.643614] mempool_destroy.part.6+0x9b/0x150 > > [ 642.643899] ? kasan_slab_free+0x87/0xb0 > > [ 642.644152] mempool_destroy+0x13/0x20 > > [ 642.644394] blk_exit_rl+0x36/0x40 > > [ 642.644614] blkg_free+0x146/0x200 > > [ 642.644836] __blkg_release_rcu+0x121/0x220 > > [ 642.645112] rcu_nocb_kthread+0x61f/0xca0 > > [ 642.645376] ? get_state_synchronize_rcu+0x20/0x20 > > [ 642.645690] ? pci_mmcfg_check_reserved+0x110/0x110 > > [ 642.646011] kthread+0x298/0x390 > > [ 642.646224] ? get_state_synchronize_rcu+0x20/0x20 > > [ 642.646535] ? kthread_park+0x160/0x160 > > [ 642.646787] ret_from_fork+0x2c/0x40 > > I'm not familiar with cgroups but seeing this makes me wonder whether it would > be possible to move the blk_exit_rl() calls from blk_release_queue() into > blk_cleanup_queue()? The SCSI core frees a SCSI host after blk_cleanup_queue() > has finished for all associated SCSI devices. This is why I think that calling > blk_exit_rl() earlier would be sufficient to avoid that scsi_exit_rq() > dereferences a SCSI host pointer after it has been freed. Hmm... I see. Didn't know request put path involved derefing those structs. The blk_exit_rl() call just replaced open coded mempool_destroy call, so the destruction order was always like this. We shouldn't have any request in flight by cleanup, so moving it there should be fine. Thanks. -- tejun