Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-17 Thread Bart Van Assche
On Wed, 2018-01-17 at 21:04 +0800, Ming Lei wrote:
> On Wed, Jan 17, 2018 at 10:17 AM, Bart Van Assche
>  wrote:
> > Sorry but I think what you wrote is wrong. kobject_del(>kobj) waits 
> > until all
> > ongoing sysfs callback methods, including elv_iosched_store(), have 
> > finished and
> > prevents that any new elv_iosched_store() calls start. That is why I think 
> > the
> > above changes do not reintroduce the race fixed by commit e9a823fb34a8 
> > ("block:
> > fix warning when I/O elevator is changed as request_queue is being 
> > removed").
> 
> But your patch basically reverts commit e9a823fb34a8, and I just saw the 
> warning
> again after applying your patch in my stress test of switching elelvato:
> 
> [  225.999505] kobject_add_internal failed for iosched (error: -2 parent: 
> queue)
> [  225.999521] WARNING: CPU: 0 PID: 12707 at lib/kobject.c:244
> kobject_add_internal+0x236/0x24c
> [  225.999566] Call Trace:
> [  225.999570]  kobject_add+0x9e/0xc5
> [  225.999573]  elv_register_queue+0x35/0xa2
> [  225.999575]  elevator_switch+0x7a/0x1a4
> [  225.999577]  elv_iosched_store+0xd2/0x103
> [  225.999579]  queue_attr_store+0x66/0x82
> [  225.999581]  kernfs_fop_write+0xf3/0x135
> [  225.999583]  __vfs_write+0x31/0x142
> [  225.999591]  vfs_write+0xcb/0x16e
> [  225.999593]  SyS_write+0x5d/0xab
> [  225.999595]  entry_SYSCALL_64_fastpath+0x1a/0x7d

The above call stack means that sysfs_create_dir_ns() returned -ENOENT because
kobj->parent->sd was NULL (where kobj is the elevator kernel object). That's
probably becase sysfs_remove_dir() clears the sd pointer before performing the
actual removal. From fs/sysfs/dir.c:

spin_lock(_symlink_target_lock);
kobj->sd = NULL;
spin_unlock(_symlink_target_lock);

if (kn) {
WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
kernfs_remove(kn);
}

In the past these two actions were performed in the opposite order. See also
commit aecdcedaab49 ("sysfs: implement kobj_sysfs_assoc_lock"). Anyway, I will
rework this patch.

Bart.

Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-17 Thread Ming Lei
On Wed, Jan 17, 2018 at 10:17 AM, Bart Van Assche
 wrote:
> On Wed, 2018-01-17 at 09:23 +0800, Ming Lei wrote:
>> On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche  
>> wrote:
>> > On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
>> > > Therefore it seems to me that all queue_attr_{show,store} are racey vs
>> > > blk_unregister_queue() removing the 'queue' kobject.
>> > >
>> > > And it was just that __elevator_change() was myopicly fixed to address
>> > > the race whereas a more generic solution was/is needed.  But short of
>> > > that more generic fix your change will reintroduce the potential for
>> > > hitting the issue that commit e9a823fb34a8b fixed.
>> > >
>> > > In that light, think it best to leave blk_unregister_queue()'s
>> > > mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update
>> > > queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding
>> > > sysfs_lock.
>> > >
>> > > Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from
>> > > __elevator_change().
>> >
>> > Thanks Mike for the feedback. However, I think a simpler approach exists 
>> > than
>> > what has been described above, namely by unregistering the sysfs attributes
>> > earlier. How about the patch below?
>> >
>> > [PATCH] block: Protect less code with sysfs_lock in 
>> > blk_{un,}register_queue()
>> > ---
>> >  block/blk-sysfs.c | 39 ++-
>> >  block/elevator.c  |  4 
>> >  2 files changed, 26 insertions(+), 17 deletions(-)
>> >
>> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> > index 4a6a40ffd78e..ce32366c6db7 100644
>> > --- a/block/blk-sysfs.c
>> > +++ b/block/blk-sysfs.c
>> > @@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = {
>> > .release= blk_release_queue,
>> >  };
>> >
>> > +/**
>> > + * blk_register_queue - register a block layer queue with sysfs
>> > + * @disk: Disk of which the request queue should be registered with sysfs.
>> > + */
>> >  int blk_register_queue(struct gendisk *disk)
>> >  {
>> > int ret;
>> > @@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk)
>> > if (q->request_fn || (q->mq_ops && q->elevator)) {
>> > ret = elv_register_queue(q);
>> > if (ret) {
>> > +   mutex_unlock(>sysfs_lock);
>> > kobject_uevent(>kobj, KOBJ_REMOVE);
>> > kobject_del(>kobj);
>> > blk_trace_remove_sysfs(dev);
>> > kobject_put(>kobj);
>> > -   goto unlock;
>> > +   return ret;
>> > }
>> > }
>> > ret = 0;
>> > @@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk)
>> >  }
>> >  EXPORT_SYMBOL_GPL(blk_register_queue);
>> >
>> > +/**
>> > + * blk_unregister_queue - counterpart of blk_register_queue()
>> > + * @disk: Disk of which the request queue should be unregistered from 
>> > sysfs.
>> > + *
>> > + * Note: the caller is responsible for guaranteeing that this function is 
>> > called
>> > + * after blk_register_queue() has finished.
>> > + */
>> >  void blk_unregister_queue(struct gendisk *disk)
>> >  {
>> > struct request_queue *q = disk->queue;
>> > @@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk)
>> > if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
>> > return;
>> >
>> > -   /*
>> > -* Protect against the 'queue' kobj being accessed
>> > -* while/after it is removed.
>> > -*/
>> > -   mutex_lock(>sysfs_lock);
>> > -
>> > spin_lock_irq(q->queue_lock);
>> > queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
>> > spin_unlock_irq(q->queue_lock);
>> >
>> > -   wbt_exit(q);
>> > -
>> > +   /*
>> > +* Remove the sysfs attributes before unregistering the queue data
>> > +* structures that can be modified through sysfs.
>> > +*/
>> > +   mutex_lock(>sysfs_lock);
>> > if (q->mq_ops)
>> > blk_mq_unregister_dev(disk_to_dev(disk), q);
>> > -
>> > -   if (q->request_fn || (q->mq_ops && q->elevator))
>> > -   elv_unregister_queue(q);
>> > +   mutex_unlock(>sysfs_lock);
>> >
>> > kobject_uevent(>kobj, KOBJ_REMOVE);
>> > kobject_del(>kobj);
>>
>> elevator switch still can come just after the above line code is completed,
>> so the previous warning addressed in e9a823fb34a8b can be triggered
>> again.
>>
>> > blk_trace_remove_sysfs(disk_to_dev(disk));
>> > -   kobject_put(_to_dev(disk)->kobj);
>> >
>> > +   wbt_exit(q);
>> > +
>> > +   mutex_lock(>sysfs_lock);
>> > +   if (q->request_fn || (q->mq_ops && q->elevator))
>> > +   elv_unregister_queue(q);
>> > mutex_unlock(>sysfs_lock);
>> > +
>> > +   kobject_put(_to_dev(disk)->kobj);
>> >  }
>> > diff --git a/block/elevator.c 

Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-16 Thread Bart Van Assche
On Wed, 2018-01-17 at 09:23 +0800, Ming Lei wrote:
> On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche  
> wrote:
> > On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
> > > Therefore it seems to me that all queue_attr_{show,store} are racey vs
> > > blk_unregister_queue() removing the 'queue' kobject.
> > > 
> > > And it was just that __elevator_change() was myopicly fixed to address
> > > the race whereas a more generic solution was/is needed.  But short of
> > > that more generic fix your change will reintroduce the potential for
> > > hitting the issue that commit e9a823fb34a8b fixed.
> > > 
> > > In that light, think it best to leave blk_unregister_queue()'s
> > > mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update
> > > queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding
> > > sysfs_lock.
> > > 
> > > Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from
> > > __elevator_change().
> > 
> > Thanks Mike for the feedback. However, I think a simpler approach exists 
> > than
> > what has been described above, namely by unregistering the sysfs attributes
> > earlier. How about the patch below?
> > 
> > [PATCH] block: Protect less code with sysfs_lock in 
> > blk_{un,}register_queue()
> > ---
> >  block/blk-sysfs.c | 39 ++-
> >  block/elevator.c  |  4 
> >  2 files changed, 26 insertions(+), 17 deletions(-)
> > 
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index 4a6a40ffd78e..ce32366c6db7 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = {
> > .release= blk_release_queue,
> >  };
> > 
> > +/**
> > + * blk_register_queue - register a block layer queue with sysfs
> > + * @disk: Disk of which the request queue should be registered with sysfs.
> > + */
> >  int blk_register_queue(struct gendisk *disk)
> >  {
> > int ret;
> > @@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk)
> > if (q->request_fn || (q->mq_ops && q->elevator)) {
> > ret = elv_register_queue(q);
> > if (ret) {
> > +   mutex_unlock(>sysfs_lock);
> > kobject_uevent(>kobj, KOBJ_REMOVE);
> > kobject_del(>kobj);
> > blk_trace_remove_sysfs(dev);
> > kobject_put(>kobj);
> > -   goto unlock;
> > +   return ret;
> > }
> > }
> > ret = 0;
> > @@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk)
> >  }
> >  EXPORT_SYMBOL_GPL(blk_register_queue);
> > 
> > +/**
> > + * blk_unregister_queue - counterpart of blk_register_queue()
> > + * @disk: Disk of which the request queue should be unregistered from 
> > sysfs.
> > + *
> > + * Note: the caller is responsible for guaranteeing that this function is 
> > called
> > + * after blk_register_queue() has finished.
> > + */
> >  void blk_unregister_queue(struct gendisk *disk)
> >  {
> > struct request_queue *q = disk->queue;
> > @@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk)
> > if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
> > return;
> > 
> > -   /*
> > -* Protect against the 'queue' kobj being accessed
> > -* while/after it is removed.
> > -*/
> > -   mutex_lock(>sysfs_lock);
> > -
> > spin_lock_irq(q->queue_lock);
> > queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
> > spin_unlock_irq(q->queue_lock);
> > 
> > -   wbt_exit(q);
> > -
> > +   /*
> > +* Remove the sysfs attributes before unregistering the queue data
> > +* structures that can be modified through sysfs.
> > +*/
> > +   mutex_lock(>sysfs_lock);
> > if (q->mq_ops)
> > blk_mq_unregister_dev(disk_to_dev(disk), q);
> > -
> > -   if (q->request_fn || (q->mq_ops && q->elevator))
> > -   elv_unregister_queue(q);
> > +   mutex_unlock(>sysfs_lock);
> > 
> > kobject_uevent(>kobj, KOBJ_REMOVE);
> > kobject_del(>kobj);
> 
> elevator switch still can come just after the above line code is completed,
> so the previous warning addressed in e9a823fb34a8b can be triggered
> again.
> 
> > blk_trace_remove_sysfs(disk_to_dev(disk));
> > -   kobject_put(_to_dev(disk)->kobj);
> > 
> > +   wbt_exit(q);
> > +
> > +   mutex_lock(>sysfs_lock);
> > +   if (q->request_fn || (q->mq_ops && q->elevator))
> > +   elv_unregister_queue(q);
> > mutex_unlock(>sysfs_lock);
> > +
> > +   kobject_put(_to_dev(disk)->kobj);
> >  }
> > diff --git a/block/elevator.c b/block/elevator.c
> > index e87e9b43aba0..4b7957b28a99 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -1080,10 +1080,6 @@ static int __elevator_change(struct request_queue 
> > *q, 

Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-16 Thread Ming Lei
On Wed, Jan 17, 2018 at 8:03 AM, Bart Van Assche  wrote:
> On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
>> Therefore it seems to me that all queue_attr_{show,store} are racey vs
>> blk_unregister_queue() removing the 'queue' kobject.
>>
>> And it was just that __elevator_change() was myopicly fixed to address
>> the race whereas a more generic solution was/is needed.  But short of
>> that more generic fix your change will reintroduce the potential for
>> hitting the issue that commit e9a823fb34a8b fixed.
>>
>> In that light, think it best to leave blk_unregister_queue()'s
>> mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update
>> queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding
>> sysfs_lock.
>>
>> Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from
>> __elevator_change().
>
> Thanks Mike for the feedback. However, I think a simpler approach exists than
> what has been described above, namely by unregistering the sysfs attributes
> earlier. How about the patch below?
>
> [PATCH] block: Protect less code with sysfs_lock in blk_{un,}register_queue()
> ---
>  block/blk-sysfs.c | 39 ++-
>  block/elevator.c  |  4 
>  2 files changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 4a6a40ffd78e..ce32366c6db7 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = {
> .release= blk_release_queue,
>  };
>
> +/**
> + * blk_register_queue - register a block layer queue with sysfs
> + * @disk: Disk of which the request queue should be registered with sysfs.
> + */
>  int blk_register_queue(struct gendisk *disk)
>  {
> int ret;
> @@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk)
> if (q->request_fn || (q->mq_ops && q->elevator)) {
> ret = elv_register_queue(q);
> if (ret) {
> +   mutex_unlock(>sysfs_lock);
> kobject_uevent(>kobj, KOBJ_REMOVE);
> kobject_del(>kobj);
> blk_trace_remove_sysfs(dev);
> kobject_put(>kobj);
> -   goto unlock;
> +   return ret;
> }
> }
> ret = 0;
> @@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk)
>  }
>  EXPORT_SYMBOL_GPL(blk_register_queue);
>
> +/**
> + * blk_unregister_queue - counterpart of blk_register_queue()
> + * @disk: Disk of which the request queue should be unregistered from sysfs.
> + *
> + * Note: the caller is responsible for guaranteeing that this function is 
> called
> + * after blk_register_queue() has finished.
> + */
>  void blk_unregister_queue(struct gendisk *disk)
>  {
> struct request_queue *q = disk->queue;
> @@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk)
> if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
> return;
>
> -   /*
> -* Protect against the 'queue' kobj being accessed
> -* while/after it is removed.
> -*/
> -   mutex_lock(>sysfs_lock);
> -
> spin_lock_irq(q->queue_lock);
> queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
> spin_unlock_irq(q->queue_lock);
>
> -   wbt_exit(q);
> -
> +   /*
> +* Remove the sysfs attributes before unregistering the queue data
> +* structures that can be modified through sysfs.
> +*/
> +   mutex_lock(>sysfs_lock);
> if (q->mq_ops)
> blk_mq_unregister_dev(disk_to_dev(disk), q);
> -
> -   if (q->request_fn || (q->mq_ops && q->elevator))
> -   elv_unregister_queue(q);
> +   mutex_unlock(>sysfs_lock);
>
> kobject_uevent(>kobj, KOBJ_REMOVE);
> kobject_del(>kobj);

elevator switch still can come just after the above line code is completed,
so the previous warning addressed in e9a823fb34a8b can be triggered
again.

> blk_trace_remove_sysfs(disk_to_dev(disk));
> -   kobject_put(_to_dev(disk)->kobj);
>
> +   wbt_exit(q);
> +
> +   mutex_lock(>sysfs_lock);
> +   if (q->request_fn || (q->mq_ops && q->elevator))
> +   elv_unregister_queue(q);
> mutex_unlock(>sysfs_lock);
> +
> +   kobject_put(_to_dev(disk)->kobj);
>  }
> diff --git a/block/elevator.c b/block/elevator.c
> index e87e9b43aba0..4b7957b28a99 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -1080,10 +1080,6 @@ static int __elevator_change(struct request_queue *q, 
> const char *name)
> char elevator_name[ELV_NAME_MAX];
> struct elevator_type *e;
>
> -   /* Make sure queue is not in the middle of being removed */
> -   if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
> -   return -ENOENT;
> -

The above check shouldn't be removed, as I explained above.



-- 
Ming Lei


Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-16 Thread Bart Van Assche
On Tue, 2018-01-16 at 17:32 -0500, Mike Snitzer wrote:
> Therefore it seems to me that all queue_attr_{show,store} are racey vs
> blk_unregister_queue() removing the 'queue' kobject.
> 
> And it was just that __elevator_change() was myopicly fixed to address
> the race whereas a more generic solution was/is needed.  But short of
> that more generic fix your change will reintroduce the potential for
> hitting the issue that commit e9a823fb34a8b fixed.
> 
> In that light, think it best to leave blk_unregister_queue()'s
> mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update
> queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding
> sysfs_lock.
> 
> Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from
> __elevator_change().

Thanks Mike for the feedback. However, I think a simpler approach exists than
what has been described above, namely by unregistering the sysfs attributes
earlier. How about the patch below?

[PATCH] block: Protect less code with sysfs_lock in blk_{un,}register_queue()
---
 block/blk-sysfs.c | 39 ++-
 block/elevator.c  |  4 
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4a6a40ffd78e..ce32366c6db7 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -853,6 +853,10 @@ struct kobj_type blk_queue_ktype = {
.release= blk_release_queue,
 };
 
+/**
+ * blk_register_queue - register a block layer queue with sysfs
+ * @disk: Disk of which the request queue should be registered with sysfs.
+ */
 int blk_register_queue(struct gendisk *disk)
 {
int ret;
@@ -909,11 +913,12 @@ int blk_register_queue(struct gendisk *disk)
if (q->request_fn || (q->mq_ops && q->elevator)) {
ret = elv_register_queue(q);
if (ret) {
+   mutex_unlock(>sysfs_lock);
kobject_uevent(>kobj, KOBJ_REMOVE);
kobject_del(>kobj);
blk_trace_remove_sysfs(dev);
kobject_put(>kobj);
-   goto unlock;
+   return ret;
}
}
ret = 0;
@@ -923,6 +928,13 @@ int blk_register_queue(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(blk_register_queue);
 
+/**
+ * blk_unregister_queue - counterpart of blk_register_queue()
+ * @disk: Disk of which the request queue should be unregistered from sysfs.
+ *
+ * Note: the caller is responsible for guaranteeing that this function is 
called
+ * after blk_register_queue() has finished.
+ */
 void blk_unregister_queue(struct gendisk *disk)
 {
struct request_queue *q = disk->queue;
@@ -934,28 +946,29 @@ void blk_unregister_queue(struct gendisk *disk)
if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
return;
 
-   /*
-* Protect against the 'queue' kobj being accessed
-* while/after it is removed.
-*/
-   mutex_lock(>sysfs_lock);
-
spin_lock_irq(q->queue_lock);
queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
spin_unlock_irq(q->queue_lock);
 
-   wbt_exit(q);
-
+   /*
+* Remove the sysfs attributes before unregistering the queue data
+* structures that can be modified through sysfs.
+*/
+   mutex_lock(>sysfs_lock);
if (q->mq_ops)
blk_mq_unregister_dev(disk_to_dev(disk), q);
-
-   if (q->request_fn || (q->mq_ops && q->elevator))
-   elv_unregister_queue(q);
+   mutex_unlock(>sysfs_lock);
 
kobject_uevent(>kobj, KOBJ_REMOVE);
kobject_del(>kobj);
blk_trace_remove_sysfs(disk_to_dev(disk));
-   kobject_put(_to_dev(disk)->kobj);
 
+   wbt_exit(q);
+
+   mutex_lock(>sysfs_lock);
+   if (q->request_fn || (q->mq_ops && q->elevator))
+   elv_unregister_queue(q);
mutex_unlock(>sysfs_lock);
+
+   kobject_put(_to_dev(disk)->kobj);
 }
diff --git a/block/elevator.c b/block/elevator.c
index e87e9b43aba0..4b7957b28a99 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -1080,10 +1080,6 @@ static int __elevator_change(struct request_queue *q, 
const char *name)
char elevator_name[ELV_NAME_MAX];
struct elevator_type *e;
 
-   /* Make sure queue is not in the middle of being removed */
-   if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
-   return -ENOENT;
-
/*
 * Special case for mq, turn off scheduling
 */
-- 
2.15.1


Re: [PATCH 3/3] block: Protect less code with sysfs_lock in blk_{un,}register_queue()

2018-01-16 Thread Mike Snitzer
On Tue, Jan 16 2018 at  1:17pm -0500,
Bart Van Assche  wrote:

> The __blk_mq_register_dev(), blk_mq_unregister_dev(),
> elv_register_queue() and elv_unregister_queue() calls need to be
> protected with sysfs_lock but other code in these functions not.
> Hence protect only this code with sysfs_lock. This patch fixes a
> locking inversion issue in blk_unregister_queue() and also in an
> error path of blk_register_queue(): it is not allowed to hold
> sysfs_lock around the kobject_del(>kobj) call.
> 
> Signed-off-by: Bart Van Assche 
> ---
>  block/blk-sysfs.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 4a6a40ffd78e..e9ce45ff0ef2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -909,11 +909,12 @@ int blk_register_queue(struct gendisk *disk)
>   if (q->request_fn || (q->mq_ops && q->elevator)) {
>   ret = elv_register_queue(q);
>   if (ret) {
> + mutex_unlock(>sysfs_lock);
>   kobject_uevent(>kobj, KOBJ_REMOVE);
>   kobject_del(>kobj);
>   blk_trace_remove_sysfs(dev);
>   kobject_put(>kobj);
> - goto unlock;
> + return ret;
>   }
>   }
>   ret = 0;
> @@ -934,28 +935,22 @@ void blk_unregister_queue(struct gendisk *disk)
>   if (!test_bit(QUEUE_FLAG_REGISTERED, >queue_flags))
>   return;
>  
> - /*
> -  * Protect against the 'queue' kobj being accessed
> -  * while/after it is removed.
> -  */
> - mutex_lock(>sysfs_lock);
> -
>   spin_lock_irq(q->queue_lock);
>   queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
>   spin_unlock_irq(q->queue_lock);
>  
>   wbt_exit(q);
>  
> + mutex_lock(>sysfs_lock);
>   if (q->mq_ops)
>   blk_mq_unregister_dev(disk_to_dev(disk), q);
>  
>   if (q->request_fn || (q->mq_ops && q->elevator))
>   elv_unregister_queue(q);

My concern with this change is detailed in the following portion of
the header for commit 667257e8b2988c0183ba23e2bcd6900e87961606:

2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
in case elv_iosched_store() loses the race with blk_unregister_queue(),
it needs a way to know the 'queue' kobj isn't there.

I don't think moving mutex_lock(>sysfs_lock); after the clearing of
QUEUE_FLAG_REGISTERED is a step in the right direction.

Current code shows:

blk_cleanup_queue() calls blk_set_queue_dying() while holding
the sysfs_lock.

queue_attr_{show,store} both test if blk_queue_dying(q) while holding
the sysfs_lock.

BUT drivers can/do call del_gendisk() _before_ blk_cleanup_queue().
(if your proposed change above were to go in all of the block drivers
would first need to be audited for the need to call blk_cleanup_queue()
before del_gendisk() -- seems awful).

Therefore it seems to me that all queue_attr_{show,store} are racey vs
blk_unregister_queue() removing the 'queue' kobject.

And it was just that __elevator_change() was myopicly fixed to address
the race whereas a more generic solution was/is needed.  But short of
that more generic fix your change will reintroduce the potential for
hitting the issue that commit e9a823fb34a8b fixed.

In that light, think it best to leave blk_unregister_queue()'s
mutex_lock() above the QUEUE_FLAG_REGISTERED clearing _and_ update
queue_attr_{show,store} to test for QUEUE_FLAG_REGISTERED while holding
sysfs_lock.

Then remove the unicorn test_bit for QUEUE_FLAG_REGISTERED from
__elevator_change().

But it could be I'm wrong for some reason.. as you know that happens ;)

Mike