Re: [PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-06 Thread Ed Cashin
On Dec 4, 2012, at 6:45 PM, Andrew Morton wrote:

> On Mon, 3 Dec 2012 20:42:56 -0500
> Ed Cashin  wrote:
> 
>> This change avoids a race that could result in a NULL pointer
>> derference following a WARNing from kobject_add_internal, "don't
>> try to register things with the same name in the same directory."
...
>> The check for a bad aoedev pointer remains from a time when about
>> half of this patch was done, and it was possible for the
>> bdev->bd_disk->private_data to become corrupted.  The check
>> should be removed eventually, but it is not expected to add
>> significant overhead, occurring in the aoeblk_open routine.
...
>> --- a/drivers/block/aoe/aoeblk.c
>> +++ b/drivers/block/aoe/aoeblk.c
>> @@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
>>  struct aoedev *d = bdev->bd_disk->private_data;
>>  ulong flags;
>> 
>> +if (!virt_addr_valid(d)) {
>> +pr_crit("aoe: invalid device pointer in %s\n",
>> +__func__);
>> +WARN_ON(1);
>> +return -ENODEV;
>> +}
> 
> Can this ever happen?

This is the check mentioned in the last paragraph of the changelog
message.  I don't think it can happen now.  Folks have been using
it like this and nobody has seen the invalid device pointer message.

I'll go ahead and remove the check and resubmit the patch.

...
>> @@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
>>  struct request_queue *q;
>>  enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
>>  ulong flags;
>> +int late = 0;
>> +
>> +spin_lock_irqsave(>lock, flags);
>> +if (d->flags & DEVFL_GDALLOC
>> +&& !(d->flags & DEVFL_TKILL)
>> +&& !(d->flags & DEVFL_GD_NOW))
> 
> That's pretty sickly-looking code layout.
> 
> We often do
> 
>   if ((d->flags & (DEVFL_GDALLOC|DEVFL_TKILL|DEVFL_GD_NOW)) ==
>   DEVFL_GDALLOC)
> 
> in these cases.

OK.  When I'm resubmitting these patches, I'm planning to submit
the series of 7 patches and include info in the cover letter about
what has changed in the resubmission.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-06 Thread Ed Cashin
On Dec 4, 2012, at 6:45 PM, Andrew Morton wrote:

 On Mon, 3 Dec 2012 20:42:56 -0500
 Ed Cashin ecas...@coraid.com wrote:
 
 This change avoids a race that could result in a NULL pointer
 derference following a WARNing from kobject_add_internal, don't
 try to register things with the same name in the same directory.
...
 The check for a bad aoedev pointer remains from a time when about
 half of this patch was done, and it was possible for the
 bdev-bd_disk-private_data to become corrupted.  The check
 should be removed eventually, but it is not expected to add
 significant overhead, occurring in the aoeblk_open routine.
...
 --- a/drivers/block/aoe/aoeblk.c
 +++ b/drivers/block/aoe/aoeblk.c
 @@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
  struct aoedev *d = bdev-bd_disk-private_data;
  ulong flags;
 
 +if (!virt_addr_valid(d)) {
 +pr_crit(aoe: invalid device pointer in %s\n,
 +__func__);
 +WARN_ON(1);
 +return -ENODEV;
 +}
 
 Can this ever happen?

This is the check mentioned in the last paragraph of the changelog
message.  I don't think it can happen now.  Folks have been using
it like this and nobody has seen the invalid device pointer message.

I'll go ahead and remove the check and resubmit the patch.

...
 @@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
  struct request_queue *q;
  enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
  ulong flags;
 +int late = 0;
 +
 +spin_lock_irqsave(d-lock, flags);
 +if (d-flags  DEVFL_GDALLOC
 + !(d-flags  DEVFL_TKILL)
 + !(d-flags  DEVFL_GD_NOW))
 
 That's pretty sickly-looking code layout.
 
 We often do
 
   if ((d-flags  (DEVFL_GDALLOC|DEVFL_TKILL|DEVFL_GD_NOW)) ==
   DEVFL_GDALLOC)
 
 in these cases.

OK.  When I'm resubmitting these patches, I'm planning to submit
the series of 7 patches and include info in the cover letter about
what has changed in the resubmission.

-- 
  Ed Cashin
  ecas...@coraid.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-04 Thread Andrew Morton
On Mon, 3 Dec 2012 20:42:56 -0500
Ed Cashin  wrote:

> This change avoids a race that could result in a NULL pointer
> derference following a WARNing from kobject_add_internal, "don't
> try to register things with the same name in the same directory."
> 
> The problem was found with a test that forgets and discovers an
> aoe device in a loop:
> 
>   while test ! -r /tmp/stop; do
>   aoe-flush -a
>   aoe-discover
>   done
> 
> The race was between aoedev_flush taking aoedevs out of the
> devlist, allowing a new discovery of the same AoE target to take
> place before the driver gets around to calling
> sysfs_remove_group.  Fixing that one revealed another race
> between do_open and add_disk, and this patch avoids that, too.
> 
> The fix required some care, because for flushing (forgetting) an
> aoedev, some of the steps must be performed under lock and some
> must be able to sleep.  Also, for discovering a new aoedev, some
> steps might sleep.
> 
> The check for a bad aoedev pointer remains from a time when about
> half of this patch was done, and it was possible for the
> bdev->bd_disk->private_data to become corrupted.  The check
> should be removed eventually, but it is not expected to add
> significant overhead, occurring in the aoeblk_open routine.
> 
>
> ...
>
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
>   struct aoedev *d = bdev->bd_disk->private_data;
>   ulong flags;
>  
> + if (!virt_addr_valid(d)) {
> + pr_crit("aoe: invalid device pointer in %s\n",
> + __func__);
> + WARN_ON(1);
> + return -ENODEV;
> + }

Can this ever happen?

> + if (!(d->flags & DEVFL_UP) || d->flags & DEVFL_TKILL)
> + return -ENODEV;
> +
>   mutex_lock(_mutex);
>   spin_lock_irqsave(>lock, flags);
> - if (d->flags & DEVFL_UP) {
> + if (d->flags & DEVFL_UP && !(d->flags & DEVFL_TKILL)) {
>   d->nopen++;
>   spin_unlock_irqrestore(>lock, flags);
>   mutex_unlock(_mutex);
> @@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
>   struct request_queue *q;
>   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
>   ulong flags;
> + int late = 0;
> +
> + spin_lock_irqsave(>lock, flags);
> + if (d->flags & DEVFL_GDALLOC
> + && !(d->flags & DEVFL_TKILL)
> + && !(d->flags & DEVFL_GD_NOW))

That's pretty sickly-looking code layout.

We often do

if ((d->flags & (DEVFL_GDALLOC|DEVFL_TKILL|DEVFL_GD_NOW)) ==
DEVFL_GDALLOC)

in these cases.

> + d->flags |= DEVFL_GD_NOW;
> + else
> + late = 1;
> + spin_unlock_irqrestore(>lock, flags);
> + if (late)
> + return;
>  
>   gd = alloc_disk(AOE_PARTITIONS);
>   if (gd == NULL) {
>
> ...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-04 Thread Andrew Morton
On Mon, 3 Dec 2012 20:42:56 -0500
Ed Cashin ecas...@coraid.com wrote:

 This change avoids a race that could result in a NULL pointer
 derference following a WARNing from kobject_add_internal, don't
 try to register things with the same name in the same directory.
 
 The problem was found with a test that forgets and discovers an
 aoe device in a loop:
 
   while test ! -r /tmp/stop; do
   aoe-flush -a
   aoe-discover
   done
 
 The race was between aoedev_flush taking aoedevs out of the
 devlist, allowing a new discovery of the same AoE target to take
 place before the driver gets around to calling
 sysfs_remove_group.  Fixing that one revealed another race
 between do_open and add_disk, and this patch avoids that, too.
 
 The fix required some care, because for flushing (forgetting) an
 aoedev, some of the steps must be performed under lock and some
 must be able to sleep.  Also, for discovering a new aoedev, some
 steps might sleep.
 
 The check for a bad aoedev pointer remains from a time when about
 half of this patch was done, and it was possible for the
 bdev-bd_disk-private_data to become corrupted.  The check
 should be removed eventually, but it is not expected to add
 significant overhead, occurring in the aoeblk_open routine.
 

 ...

 --- a/drivers/block/aoe/aoeblk.c
 +++ b/drivers/block/aoe/aoeblk.c
 @@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
   struct aoedev *d = bdev-bd_disk-private_data;
   ulong flags;
  
 + if (!virt_addr_valid(d)) {
 + pr_crit(aoe: invalid device pointer in %s\n,
 + __func__);
 + WARN_ON(1);
 + return -ENODEV;
 + }

Can this ever happen?

 + if (!(d-flags  DEVFL_UP) || d-flags  DEVFL_TKILL)
 + return -ENODEV;
 +
   mutex_lock(aoeblk_mutex);
   spin_lock_irqsave(d-lock, flags);
 - if (d-flags  DEVFL_UP) {
 + if (d-flags  DEVFL_UP  !(d-flags  DEVFL_TKILL)) {
   d-nopen++;
   spin_unlock_irqrestore(d-lock, flags);
   mutex_unlock(aoeblk_mutex);
 @@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
   struct request_queue *q;
   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
   ulong flags;
 + int late = 0;
 +
 + spin_lock_irqsave(d-lock, flags);
 + if (d-flags  DEVFL_GDALLOC
 +  !(d-flags  DEVFL_TKILL)
 +  !(d-flags  DEVFL_GD_NOW))

That's pretty sickly-looking code layout.

We often do

if ((d-flags  (DEVFL_GDALLOC|DEVFL_TKILL|DEVFL_GD_NOW)) ==
DEVFL_GDALLOC)

in these cases.

 + d-flags |= DEVFL_GD_NOW;
 + else
 + late = 1;
 + spin_unlock_irqrestore(d-lock, flags);
 + if (late)
 + return;
  
   gd = alloc_disk(AOE_PARTITIONS);
   if (gd == NULL) {

 ...


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-03 Thread Ed Cashin
This change avoids a race that could result in a NULL pointer
derference following a WARNing from kobject_add_internal, "don't
try to register things with the same name in the same directory."

The problem was found with a test that forgets and discovers an
aoe device in a loop:

  while test ! -r /tmp/stop; do
aoe-flush -a
aoe-discover
  done

The race was between aoedev_flush taking aoedevs out of the
devlist, allowing a new discovery of the same AoE target to take
place before the driver gets around to calling
sysfs_remove_group.  Fixing that one revealed another race
between do_open and add_disk, and this patch avoids that, too.

The fix required some care, because for flushing (forgetting) an
aoedev, some of the steps must be performed under lock and some
must be able to sleep.  Also, for discovering a new aoedev, some
steps might sleep.

The check for a bad aoedev pointer remains from a time when about
half of this patch was done, and it was possible for the
bdev->bd_disk->private_data to become corrupted.  The check
should be removed eventually, but it is not expected to add
significant overhead, occurring in the aoeblk_open routine.

Signed-off-by: Ed Cashin 
---
 drivers/block/aoe/aoe.h|7 ++-
 drivers/block/aoe/aoeblk.c |   36 +-
 drivers/block/aoe/aoedev.c |  166 
 3 files changed, 146 insertions(+), 63 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index b6d2b16..d50e945 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -74,8 +74,11 @@ enum {
DEVFL_TKILL = (1<<1),   /* flag for timer to know when to kill self */
DEVFL_EXT = (1<<2), /* device accepts lba48 commands */
DEVFL_GDALLOC = (1<<3), /* need to alloc gendisk */
-   DEVFL_KICKME = (1<<4),  /* slow polling network card catch */
-   DEVFL_NEWSIZE = (1<<5), /* need to update dev size in block layer */
+   DEVFL_GD_NOW = (1<<4),  /* allocating gendisk */
+   DEVFL_KICKME = (1<<5),  /* slow polling network card catch */
+   DEVFL_NEWSIZE = (1<<6), /* need to update dev size in block layer */
+   DEVFL_FREEING = (1<<7), /* set when device is being cleaned up */
+   DEVFL_FREED = (1<<8),   /* device has been cleaned up */
 };
 
 enum {
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 57ac72c..6b5b787 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
struct aoedev *d = bdev->bd_disk->private_data;
ulong flags;
 
+   if (!virt_addr_valid(d)) {
+   pr_crit("aoe: invalid device pointer in %s\n",
+   __func__);
+   WARN_ON(1);
+   return -ENODEV;
+   }
+   if (!(d->flags & DEVFL_UP) || d->flags & DEVFL_TKILL)
+   return -ENODEV;
+
mutex_lock(_mutex);
spin_lock_irqsave(>lock, flags);
-   if (d->flags & DEVFL_UP) {
+   if (d->flags & DEVFL_UP && !(d->flags & DEVFL_TKILL)) {
d->nopen++;
spin_unlock_irqrestore(>lock, flags);
mutex_unlock(_mutex);
@@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
struct request_queue *q;
enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
+   int late = 0;
+
+   spin_lock_irqsave(>lock, flags);
+   if (d->flags & DEVFL_GDALLOC
+   && !(d->flags & DEVFL_TKILL)
+   && !(d->flags & DEVFL_GD_NOW))
+   d->flags |= DEVFL_GD_NOW;
+   else
+   late = 1;
+   spin_unlock_irqrestore(>lock, flags);
+   if (late)
+   return;
 
gd = alloc_disk(AOE_PARTITIONS);
if (gd == NULL) {
@@ -282,6 +303,11 @@ aoeblk_gdalloc(void *vp)
}
 
spin_lock_irqsave(>lock, flags);
+   WARN_ON(!(d->flags & DEVFL_GD_NOW));
+   WARN_ON(!(d->flags & DEVFL_GDALLOC));
+   WARN_ON(d->flags & DEVFL_TKILL);
+   WARN_ON(d->gd);
+   WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
q->backing_dev_info.name = "aoe";
q->backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
@@ -306,6 +332,11 @@ aoeblk_gdalloc(void *vp)
 
add_disk(gd);
aoedisk_add_sysfs(d);
+
+   spin_lock_irqsave(>lock, flags);
+   WARN_ON(!(d->flags & DEVFL_GD_NOW));
+   d->flags &= ~DEVFL_GD_NOW;
+   spin_unlock_irqrestore(>lock, flags);
return;
 
 err_mempool:
@@ -314,7 +345,8 @@ err_disk:
put_disk(gd);
 err:
spin_lock_irqsave(>lock, flags);
-   d->flags &= ~DEVFL_GDALLOC;
+   d->flags &= ~DEVFL_GD_NOW;
+   schedule_work(>work);
spin_unlock_irqrestore(>lock, flags);
 }
 
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index f0c0c74..3776715 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -15,7 +15,6 @@
 

[PATCH 2/7] aoe: avoid races between device destruction and discovery

2012-12-03 Thread Ed Cashin
This change avoids a race that could result in a NULL pointer
derference following a WARNing from kobject_add_internal, don't
try to register things with the same name in the same directory.

The problem was found with a test that forgets and discovers an
aoe device in a loop:

  while test ! -r /tmp/stop; do
aoe-flush -a
aoe-discover
  done

The race was between aoedev_flush taking aoedevs out of the
devlist, allowing a new discovery of the same AoE target to take
place before the driver gets around to calling
sysfs_remove_group.  Fixing that one revealed another race
between do_open and add_disk, and this patch avoids that, too.

The fix required some care, because for flushing (forgetting) an
aoedev, some of the steps must be performed under lock and some
must be able to sleep.  Also, for discovering a new aoedev, some
steps might sleep.

The check for a bad aoedev pointer remains from a time when about
half of this patch was done, and it was possible for the
bdev-bd_disk-private_data to become corrupted.  The check
should be removed eventually, but it is not expected to add
significant overhead, occurring in the aoeblk_open routine.

Signed-off-by: Ed Cashin ecas...@coraid.com
---
 drivers/block/aoe/aoe.h|7 ++-
 drivers/block/aoe/aoeblk.c |   36 +-
 drivers/block/aoe/aoedev.c |  166 
 3 files changed, 146 insertions(+), 63 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index b6d2b16..d50e945 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -74,8 +74,11 @@ enum {
DEVFL_TKILL = (11),   /* flag for timer to know when to kill self */
DEVFL_EXT = (12), /* device accepts lba48 commands */
DEVFL_GDALLOC = (13), /* need to alloc gendisk */
-   DEVFL_KICKME = (14),  /* slow polling network card catch */
-   DEVFL_NEWSIZE = (15), /* need to update dev size in block layer */
+   DEVFL_GD_NOW = (14),  /* allocating gendisk */
+   DEVFL_KICKME = (15),  /* slow polling network card catch */
+   DEVFL_NEWSIZE = (16), /* need to update dev size in block layer */
+   DEVFL_FREEING = (17), /* set when device is being cleaned up */
+   DEVFL_FREED = (18),   /* device has been cleaned up */
 };
 
 enum {
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 57ac72c..6b5b787 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
struct aoedev *d = bdev-bd_disk-private_data;
ulong flags;
 
+   if (!virt_addr_valid(d)) {
+   pr_crit(aoe: invalid device pointer in %s\n,
+   __func__);
+   WARN_ON(1);
+   return -ENODEV;
+   }
+   if (!(d-flags  DEVFL_UP) || d-flags  DEVFL_TKILL)
+   return -ENODEV;
+
mutex_lock(aoeblk_mutex);
spin_lock_irqsave(d-lock, flags);
-   if (d-flags  DEVFL_UP) {
+   if (d-flags  DEVFL_UP  !(d-flags  DEVFL_TKILL)) {
d-nopen++;
spin_unlock_irqrestore(d-lock, flags);
mutex_unlock(aoeblk_mutex);
@@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
struct request_queue *q;
enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
+   int late = 0;
+
+   spin_lock_irqsave(d-lock, flags);
+   if (d-flags  DEVFL_GDALLOC
+!(d-flags  DEVFL_TKILL)
+!(d-flags  DEVFL_GD_NOW))
+   d-flags |= DEVFL_GD_NOW;
+   else
+   late = 1;
+   spin_unlock_irqrestore(d-lock, flags);
+   if (late)
+   return;
 
gd = alloc_disk(AOE_PARTITIONS);
if (gd == NULL) {
@@ -282,6 +303,11 @@ aoeblk_gdalloc(void *vp)
}
 
spin_lock_irqsave(d-lock, flags);
+   WARN_ON(!(d-flags  DEVFL_GD_NOW));
+   WARN_ON(!(d-flags  DEVFL_GDALLOC));
+   WARN_ON(d-flags  DEVFL_TKILL);
+   WARN_ON(d-gd);
+   WARN_ON(d-flags  DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
q-backing_dev_info.name = aoe;
q-backing_dev_info.ra_pages = READ_AHEAD / PAGE_CACHE_SIZE;
@@ -306,6 +332,11 @@ aoeblk_gdalloc(void *vp)
 
add_disk(gd);
aoedisk_add_sysfs(d);
+
+   spin_lock_irqsave(d-lock, flags);
+   WARN_ON(!(d-flags  DEVFL_GD_NOW));
+   d-flags = ~DEVFL_GD_NOW;
+   spin_unlock_irqrestore(d-lock, flags);
return;
 
 err_mempool:
@@ -314,7 +345,8 @@ err_disk:
put_disk(gd);
 err:
spin_lock_irqsave(d-lock, flags);
-   d-flags = ~DEVFL_GDALLOC;
+   d-flags = ~DEVFL_GD_NOW;
+   schedule_work(d-work);
spin_unlock_irqrestore(d-lock, flags);
 }
 
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index f0c0c74..3776715 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -15,7 +15,6 @@
 #include aoe.h
 
 static void