Re: [PATCH 1/3] rbd: fix rbd_dev_parent_get() when parent_overlap == 0

2015-01-26 Thread Alex Elder
On 01/20/2015 06:41 AM, Ilya Dryomov wrote:
> The comment for rbd_dev_parent_get() said
>
>     * We must get the reference before checking for the overlap to
>     * coordinate properly with zeroing the parent overlap in
>     * rbd_dev_v2_parent_info() when an image gets flattened.  We
>     * drop it again if there is no overlap.
>
> but the "drop it again if there is no overlap" part was missing from
> the implementation.  This led to absurd parent_ref values for images
> with parent_overlap == 0, as parent_ref was incremented for each
> img_request and virtually never decremented.

You're right about this.  If the image had a parent with no
overlap this would leak a reference to the parent image.  The
code should have said:

    counter = atomic_inc_return_safe(&rbd_dev->parent_ref);
    if (counter > 0) {
        if (rbd_dev->parent_overlap)
            return true;
        atomic_dec(&rbd_dev->parent_ref);
    } else if (counter < 0) {
        rbd_warn(rbd_dev, "parent reference overflow");
    }
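
The leak and the corrected version above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `struct model_dev`, `model_inc_return_safe()`, `parent_get_leaky()`, `parent_get_balanced()` and `ref_after_requests()` are hypothetical names, C11 atomics stand in for the kernel's `atomic_t`, and the overflow branch is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two rbd_device fields involved. */
struct model_dev {
	atomic_int parent_ref;			/* 0 means the parent is gone */
	unsigned long long parent_overlap;
};

/*
 * Simplified stand-in for atomic_inc_return_safe(): increment unless the
 * counter is already zero and return the new value, or 0 if nothing was
 * incremented.  Overflow handling is not modelled.
 */
static int model_inc_return_safe(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return old + 1;
	}
	return 0;
}

/* Pre-patch behaviour: the reference is kept even when overlap is zero. */
static bool parent_get_leaky(struct model_dev *dev)
{
	int counter = model_inc_return_safe(&dev->parent_ref);

	return counter > 0 && dev->parent_overlap;
}

/* Behaviour with the missing decrement added. */
static bool parent_get_balanced(struct model_dev *dev)
{
	int counter = model_inc_return_safe(&dev->parent_ref);

	if (counter > 0) {
		if (dev->parent_overlap)
			return true;
		atomic_fetch_sub(&dev->parent_ref, 1);
	}
	return false;
}

/* Issue n "image requests" against a zero-overlap device; return the count. */
static int ref_after_requests(bool (*get)(struct model_dev *), int n)
{
	struct model_dev dev;

	atomic_init(&dev.parent_ref, 1);
	dev.parent_overlap = 0;
	for (int i = 0; i < n; i++) {
		bool layered = get(&dev);

		assert(!layered);	/* overlap is zero, so never layered */
	}
	return atomic_load(&dev.parent_ref);
}
```

In this model, calling `ref_after_requests()` with `parent_get_leaky` grows the count by one per request, while `parent_get_balanced` leaves it at its initial value.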

> Fix this by leveraging the fact that refresh path calls
> rbd_dev_v2_parent_info() under header_rwsem and use it for read in
> rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
> of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
> they'd pair with now and I suspect we are in a pretty miserable
> situation as far as proper locking goes regardless.

The point of the memory barrier was to ensure that when parent_overlap
gets zeroed, this code sees the zero rather than the old non-zero
value.  The atomic_inc_return_safe() call has an implicit memory
barrier to match the smp_mb() call.  It allowed the synchronization
to occur without the use of a lock.
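
The lock-free pairing described above can be sketched in userspace C11. This is an illustration under assumptions, not the driver code: the globals and the functions `model_flatten()` and `model_parent_get()` are hypothetical, a sequentially consistent fence stands in for `smp_mb()`, and a sequentially consistent read-modify-write stands in for the barrier implied by `atomic_inc_return_safe()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: both fields are atomics so the sketch is valid C11. */
static atomic_int parent_ref = 1;
static atomic_ullong parent_overlap = 4096;

/* Flatten side: zero the overlap, full fence, then drop the reference. */
static void model_flatten(void)
{
	atomic_store_explicit(&parent_overlap, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */

	int old = atomic_fetch_sub(&parent_ref, 1);

	assert(old > 0);
}

/*
 * I/O side: take the reference first (fully ordered read-modify-write),
 * and only then look at the overlap.  Drop the reference again if the
 * overlap turns out to be zero.
 */
static bool model_parent_get(void)
{
	int old = atomic_load(&parent_ref);

	/* Increment unless the counter has already reached zero. */
	while (old != 0 &&
	       !atomic_compare_exchange_weak(&parent_ref, &old, old + 1))
		;
	if (old == 0)
		return false;

	if (atomic_load_explicit(&parent_overlap, memory_order_relaxed))
		return true;

	atomic_fetch_sub(&parent_ref, 1);
	return false;
}
```

The ordering is the point of the sketch: the writer publishes the zero before it drops its reference, and the reader touches the counter before it reads the overlap.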

We're trying to atomically determine whether an image request needs
to be marked as layered, to know how to handle ENOENT on parent reads.
If it is a write to an image with a parent having a non-zero overlap,
it's layered, otherwise we can treat it as a simple request.

I think in this particular case, this is just an optimization,
trying very hard to avoid having to do layered image handling
if the parent has become flattened.  I think that even if it
got old information (suggesting non-zero overlap) things would
behave correctly, just less efficiently.

Using the semaphore adds a lock to this path and therefore
implements whatever barriers are being removed.  I'm not
sure how often this is hit--maybe the optimization isn't
buying much after all.

I am getting a little rusty on some of the details of what
precisely happens when a layered image gets flattened.
But I think this looks OK.  Maybe just watch for small
(perhaps insignificant) performance regressions with
this change in place...

Reviewed-by: Alex Elder <el...@linaro.org>


[PATCH 1/3] rbd: fix rbd_dev_parent_get() when parent_overlap == 0

2015-01-20 Thread Ilya Dryomov
The comment for rbd_dev_parent_get() said

* We must get the reference before checking for the overlap to
* coordinate properly with zeroing the parent overlap in
* rbd_dev_v2_parent_info() when an image gets flattened.  We
* drop it again if there is no overlap.

but the "drop it again if there is no overlap" part was missing from
the implementation.  This led to absurd parent_ref values for images
with parent_overlap == 0, as parent_ref was incremented for each
img_request and virtually never decremented.

Fix this by leveraging the fact that refresh path calls
rbd_dev_v2_parent_info() under header_rwsem and use it for read in
rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
they'd pair with now and I suspect we are in a pretty miserable
situation as far as proper locking goes regardless.

Cc: sta...@vger.kernel.org # 3.11+
Signed-off-by: Ilya Dryomov <idryo...@redhat.com>
---
 drivers/block/rbd.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 31fa00f0d707..2990a1c75159 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2098,32 +2098,26 @@ static void rbd_dev_parent_put(struct rbd_device *rbd_dev)
  * If an image has a non-zero parent overlap, get a reference to its
  * parent.
  *
- * We must get the reference before checking for the overlap to
- * coordinate properly with zeroing the parent overlap in
- * rbd_dev_v2_parent_info() when an image gets flattened.  We
- * drop it again if there is no overlap.
- *
  * Returns true if the rbd device has a parent with a non-zero
  * overlap and a reference for it was successfully taken, or
  * false otherwise.
  */
 static bool rbd_dev_parent_get(struct rbd_device *rbd_dev)
 {
-	int counter;
+	int counter = 0;
 
 	if (!rbd_dev->parent_spec)
 		return false;
 
-	counter = atomic_inc_return_safe(&rbd_dev->parent_ref);
-	if (counter > 0 && rbd_dev->parent_overlap)
-		return true;
-
-	/* Image was flattened, but parent is not yet torn down */
+	down_read(&rbd_dev->header_rwsem);
+	if (rbd_dev->parent_overlap)
+		counter = atomic_inc_return_safe(&rbd_dev->parent_ref);
+	up_read(&rbd_dev->header_rwsem);
 
 	if (counter < 0)
 		rbd_warn(rbd_dev, "parent reference overflow");
 
-	return false;
+	return counter > 0;
 }
 
 /*
@@ -4238,7 +4232,6 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev)
 		 */
 		if (rbd_dev->parent_overlap) {
 			rbd_dev->parent_overlap = 0;
-			smp_mb();
 			rbd_dev_parent_put(rbd_dev);
 			pr_info("%s: clone image has been flattened\n",
 				rbd_dev->disk->disk_name);
@@ -4284,7 +4277,6 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev)
 	 * treat it specially.
 	 */
 	rbd_dev->parent_overlap = overlap;
-	smp_mb();
 	if (!overlap) {
 
 		/* A null parent_spec indicates it's the initial probe */
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] rbd: fix rbd_dev_parent_get() when parent_overlap == 0

2015-01-20 Thread Josh Durgin

On 01/20/2015 04:41 AM, Ilya Dryomov wrote:

> The comment for rbd_dev_parent_get() said
>
>     * We must get the reference before checking for the overlap to
>     * coordinate properly with zeroing the parent overlap in
>     * rbd_dev_v2_parent_info() when an image gets flattened.  We
>     * drop it again if there is no overlap.
>
> but the "drop it again if there is no overlap" part was missing from
> the implementation.  This led to absurd parent_ref values for images
> with parent_overlap == 0, as parent_ref was incremented for each
> img_request and virtually never decremented.
>
> Fix this by leveraging the fact that refresh path calls
> rbd_dev_v2_parent_info() under header_rwsem and use it for read in
> rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
> of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
> they'd pair with now and I suspect we are in a pretty miserable
> situation as far as proper locking goes regardless.


Yeah, looks like we need some refactoring to read parent_overlap safely
in the I/O path in a few places.

Reviewed-by: Josh Durgin <jdur...@redhat.com>

