Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-30 Thread t...@kernel.org
Hello,

On Mon, Jul 30, 2018 at 05:50:02PM +, Bart Van Assche wrote:
> On Mon, 2018-07-30 at 10:31 -0700, t...@kernel.org wrote:
> > On Mon, Jul 30, 2018 at 05:28:11PM +, Bart Van Assche wrote:
> > > It's not clear to me how the sysfs_break_active_protection() should obtain
> > > the struct kernfs_node pointer to the attribute. Calling that function 
> > > before
> > > device_remove_file_self() causes a double call to 
> > > kernfs_break_active_protection(),
> > > which is wrong. Calling kernfs_find_and_get(kobj->sd, attr->name) after 
> > > the
> > 
> > So, if you braek active protection explicitly, there's no need to call
> > remove_self().  It can just use regular remove.
> 
> But how to avoid that scsi_remove_device(to_scsi_device(dev)) gets called
> multiple times when using device_remove_self() and in case of concurrent
> writes into the SCSI device "delete" sysfs attribute?

So, scsi_remove_device() internally protects using scan_mutex and if
the whole thing is wrapped with break_active_prot, I don't think you
need to call remove_file_self at all, right?

Thanks.

-- 
tejun


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-30 Thread t...@kernel.org
Hello, Bart.

On Mon, Jul 30, 2018 at 05:28:11PM +, Bart Van Assche wrote:
> It's not clear to me how the sysfs_break_active_protection() should obtain
> the struct kernfs_node pointer to the attribute. Calling that function before
> device_remove_file_self() causes a double call to 
> kernfs_break_active_protection(),
> which is wrong. Calling kernfs_find_and_get(kobj->sd, attr->name) after the

So, if you braek active protection explicitly, there's no need to call
remove_self().  It can just use regular remove.

> attribute has been removed results in a NULL pointer because the attribute 
> that
> that call tries to look up has already been removed. Should I proceed with the
> approach proposed in the patches attached to a previous e-mail?

I don't have an objection for that.

Thanks.

-- 
tejun


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-30 Thread t...@kernel.org
On Sun, Jul 29, 2018 at 04:03:41AM +, Bart Van Assche wrote:
> On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> > Making removal asynchronous this way sometimes causes issues because
> > whether the user sees the device released or not is racy.
> 
> Hello Tejun,
> 
> How could this change cause any user-visible changes? My understanding is
> that any work queued with task_work_add() is executed before system call
> execution leaves kernel context and returns back to user space context.

Ah, you're right.  This isn't workqueue.  Hmm... yeah, needing to
allocate sth in removal path is a bit icky but, it should be okay I
guess.  I *think* the kernfs active break/unbreak is likely gonna be
cleaner tho.

Thanks.

-- 
tejun


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-30 Thread t...@kernel.org
Hello, Bart.

On Thu, Jul 26, 2018 at 09:57:40PM +, Bart Van Assche wrote:
...
> @@ -440,11 +445,21 @@ bool sysfs_remove_file_self(struct kobject *kobj, const 
> struct attribute *attr)
>   return false;
>  
>   ret = kernfs_remove_self(kn);
> + if (ret && cb) {
> + kernfs_break_active_protection(kn);
> + cb(kobj, attr, data);
> + kernfs_break_active_protection(kn);

unbreak?

Also, wouldn't it be better to just expose sysfs_break/unbreak and
then do sth like the following from scsi?

kobject_get();
sysfs_break_active_protection();
do normal sysfs removal;
sysfs_unbreak..();
kobject_put();

Thanks.

-- 
tejun


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread t...@kernel.org
Hello,

On Thu, Jul 26, 2018 at 02:09:41PM +, Bart Van Assche wrote:
> On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> > Making removal asynchronous this way sometimes causes issues because
> > whether the user sees the device released or not is racy.
> > kernfs/sysfs have mechanisms to deal with these cases - remove_self
> > and kernfs_break_active_protection().  Have you looked at those?
> 
> Hello Tejun,
> 
> The call stack in the patch description shows that sdev_store_delete() is
> involved in the deadlock. The implementation of that function is as follows:
> 
> static ssize_t
> sdev_store_delete(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
>   if (device_remove_file_self(dev, attr))
>   scsi_remove_device(to_scsi_device(dev));
>   return count;
> };
> 
> device_remove_file_self() calls sysfs_remove_file_self() and that last
> function calls kernfs_remove_self(). In other words, kernfs_remove_self()
> is already being used. Please let me know if I misunderstood your comment.

So, here, because scsi_remove_device() is the one involved in the
circular dependency, just breaking the dependency chain on the file
itself (self removal) isn't enough.  You can wrap the whole operation
with kernfs_break_active_protection() to also move
scsi_remove_device() invocation outside the kernfs synchronization.
This will need to be piped through sysfs but shouldn't be too complex.

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hey,

On Mon, Mar 19, 2018 at 04:57:45PM +, Bart Van Assche wrote:
> For synchronization primitives that wait having a stronger synchronization
> primitive nested inside a more relaxed one can lead to a deadlock. But since
> the rcu read lock primitives do not wait it could be safe to use that kind
> of nesting with RCU. Do you perhaps know whether any documentation is
> available about that kind of nesting or whether it is already used elsewhere
> in the kernel?

Oh, we nest them all the time.  They're like (and sometimes literally
are) preempt_disable() and don't care about nest ordering.

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hello,

On Mon, Mar 19, 2018 at 04:18:54PM +, Bart Van Assche wrote:
> The algorithm explained above does not depend on sched-rcu. But because
> percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock
> around that call we are forced to use sched-rcu. I hope this makes it clear.

This could be me being slow but can you explain how what
percpu_ref_tryget_live() uses internally affects whether we can use
regular RCU around it?

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hello,

On Mon, Mar 19, 2018 at 03:16:07PM +, Bart Van Assche wrote:
> As explained in the comment in scsi_device_quiesce(), the effect of
> blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
> occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
> RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
> surround a function call that uses rcu_read_lock_sched(), namely
> percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
> and scsi_device_quiesce().

Let's please not depend on rcu_read_lock_sched() in
percpu_ref_tryget_live().  That's an implementation detail.  If the
code needs RCU based synchronization, it should perform them
explicitly.

Thanks.

-- 
tejun


Re: BUG: KASAN: use-after-free in scsi_exit_rq

2017-04-28 Thread t...@kernel.org
(cc'ing Jan)

Hello, Bart.

On Fri, Apr 21, 2017 at 09:49:17PM +, Bart Van Assche wrote:
> On Thu, 2017-04-20 at 15:18 -0600, Scott Bauer wrote:
> > [  642.638860] BUG: KASAN: use-after-free in scsi_exit_rq+0xf3/0x120 at 
> > addr 8802b7fedf00
> > [  642.639362] Read of size 1 by task rcuos/5/53
> > [  642.639713] CPU: 7 PID: 53 Comm: rcuos/6 Not tainted 4.11.0-rc5+ #13
> > [  642.640170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> > [  642.640923] Call Trace:
> > [  642.641080]  dump_stack+0x63/0x8f
> > [  642.641289]  kasan_object_err+0x21/0x70
> > [  642.641531]  kasan_report.part.1+0x231/0x500
> > [  642.641823]  ? scsi_exit_rq+0xf3/0x120
> > [  642.642054]  ? _raw_spin_unlock_irqrestore+0xe/0x10
> > [  642.642353]  ? free_percpu+0x1b7/0x340
> > [  642.642586]  ? put_task_stack+0x117/0x2b0
> > [  642.642837]  __asan_report_load1_noabort+0x2e/0x30
> > [  642.643138]  scsi_exit_rq+0xf3/0x120
> > [  642.643366]  free_request_size+0x44/0x60
> > [  642.643614]  mempool_destroy.part.6+0x9b/0x150
> > [  642.643899]  ? kasan_slab_free+0x87/0xb0
> > [  642.644152]  mempool_destroy+0x13/0x20
> > [  642.644394]  blk_exit_rl+0x36/0x40
> > [  642.644614]  blkg_free+0x146/0x200
> > [  642.644836]  __blkg_release_rcu+0x121/0x220
> > [  642.645112]  rcu_nocb_kthread+0x61f/0xca0
> > [  642.645376]  ? get_state_synchronize_rcu+0x20/0x20
> > [  642.645690]  ? pci_mmcfg_check_reserved+0x110/0x110
> > [  642.646011]  kthread+0x298/0x390
> > [  642.646224]  ? get_state_synchronize_rcu+0x20/0x20
> > [  642.646535]  ? kthread_park+0x160/0x160
> > [  642.646787]  ret_from_fork+0x2c/0x40
> 
> I'm not familiar with cgroups but seeing this makes me wonder whether it would
> be possible to move the blk_exit_rl() calls from blk_release_queue() into
> blk_cleanup_queue()? The SCSI core frees a SCSI host after blk_cleanup_queue()
> has finished for all associated SCSI devices. This is why I think that calling
> blk_exit_rl() earlier would be sufficient to avoid that scsi_exit_rq()
> dereferences a SCSI host pointer after it has been freed.

Hmm... I see.  Didn't know request put path involved derefing those
structs.  The blk_exit_rl() call just replaced open coded
mempool_destroy call, so the destruction order was always like this.
We shouldn't have any request in flight by cleanup, so moving it there
should be fine.

Thanks.

-- 
tejun