[PATCH] block: flush queued bios when the process blocks

2014-06-26 Thread Mikulas Patocka
Hi Jens

Would you please consider applying this patch?

Regarding your idea about merging bio and request plugging - I think this 
could be done later when it is properly designed by Kent Overstreet or 
someone else.

We need this patch to fix the deadlock in dm snapshot.

Mikulas


-- Forwarded message --
Date: Tue, 27 May 2014 11:03:36 -0400 (EDT)
From: Mikulas Patocka 
To: Jens Axboe , Kent Overstreet 
Cc: linux-kernel@vger.kernel.org, dm-de...@redhat.com,
Alasdair G. Kergon , Mike Snitzer 
Subject: [PATCH] block: flush queued bios when the process blocks

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.
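For illustration, the queuing pattern can be sketched in user space (the types, helpers, and the splitting "driver" below are simplified assumptions modeled loosely on the kernel's; this is a sketch, not the kernel code):

```c
#include <stddef.h>

struct bio { int sector; struct bio *bi_next; };
struct bio_list { struct bio *head, *tail; };

static struct bio_list *current_bio_list; /* stands in for current->bio_list */
static int submitted;                     /* bios actually handed to the driver */

static void bio_list_add(struct bio_list *l, struct bio *b)
{
	b->bi_next = NULL;
	if (l->tail)
		l->tail->bi_next = b;
	else
		l->head = b;
	l->tail = b;
}

static struct bio *bio_list_pop(struct bio_list *l)
{
	struct bio *b = l->head;

	if (b) {
		l->head = b->bi_next;
		if (!l->head)
			l->tail = NULL;
	}
	return b;
}

static void make_request(struct bio *bio);

/* Recursive submissions are queued; only the top-level call loops. */
void generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack = { NULL, NULL };

	if (current_bio_list) {                /* recursive call: queue and return */
		bio_list_add(current_bio_list, bio);
		return;
	}
	current_bio_list = &bio_list_on_stack; /* top level: drain iteratively */
	do {
		make_request(bio);
		bio = bio_list_pop(&bio_list_on_stack);
	} while (bio);
	current_bio_list = NULL;               /* deactivate */
}

/* A toy driver: a bio with sector < 0 is "split" into two halves that are
 * resubmitted, like dm splitting across a snapshot chunk boundary. */
static struct bio halves[2];

static void make_request(struct bio *bio)
{
	submitted++;
	if (bio->sector < 0) {
		halves[0].sector = 0;
		generic_make_request(&halves[0]);
		halves[1].sector = 1;
		generic_make_request(&halves[1]);
	}
}
```

Submitting one splitting bio processes all three bios with bounded stack depth; the two halves are queued during the recursive calls and drained by the top-level loop.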

This bio queuing can result in deadlocks. The following deadlock was
observed:

1) Process A sends a one-page read bio to the dm-snapshot target. The bio
spans a snapshot chunk boundary and so it is split into two bios by device
mapper.

2) Device mapper creates the first sub-bio and sends it to the snapshot
driver.

3) The function snapshot_map calls track_chunk (that allocates a structure
dm_snap_tracked_chunk and adds it to tracked_chunk_hash) and then it
remaps the bio to the underlying linear target and exits with
DM_MAPIO_REMAPPED.

4) The remapped bio is submitted with generic_make_request, but it isn't
processed - it is added to current->bio_list instead.

5) Meanwhile, process B executes pending_complete for the affected chunk,
it takes down_write(&s->lock) and then loops in
__check_for_conflicting_io, waiting for dm_snap_tracked_chunk created in
step 3) to be released.

6) Process A continues, it creates a new bio for the rest of the original
bio.

7) snapshot_map is called for this new bio, it waits on
down_write(&s->lock) that is held in step 5).

The resulting deadlock:
* bio added to current->bio_list at step 4) waits until the function in
  step 7) finishes
* the function in step 7) waits until s->lock held in step 5) is released
* the process in step 5) waits until the bio queued in step 4) finishes
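The three waits above form a cycle in the wait-for graph, which is exactly the deadlock condition. A small illustrative sketch (a hypothetical encoding for this document, not kernel code) that checks whether a party sits on such a cycle:

```c
/* waits_on[i] is the index of the party that party i is blocked on,
 * or -1 if party i can make progress. For the scenario above:
 * 0 = the bio queued in step 4, 1 = snapshot_map in step 7,
 * 2 = process B in step 5. */
static int on_cycle(const int *waits_on, int n, int start)
{
	int cur = start, steps;

	for (steps = 0; steps < n; steps++) {
		cur = waits_on[cur];
		if (cur < 0)       /* chain reaches a runnable party: no deadlock */
			return 0;
		if (cur == start)  /* back where we started: deadlock cycle */
			return 1;
	}
	return 0;
}
```

With the cyclic dependencies {0 waits on 1, 1 waits on 2, 2 waits on 0} the check reports a deadlock; if the queued bio is made independently runnable (the effect this patch aims for), the cycle disappears.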

The general problem is that queuing bios on current->bio_list introduces
additional lock dependencies. If a device mapper target sends a bio to
some block device, it assumes that the bio only takes locks of the target
block device or devices that are below the target device. However, if the
bio is added to the queue on current->bio_list, it creates an artificial locking
dependency on locks taken by other bios that are on current->bio_list. In
the above scenario, this artificial locking dependency results in
deadlock.

Kent Overstreet already created a workqueue for every bio set, and there is
code that tries to resolve some low-memory deadlocks by redirecting bios
queued on current->bio_list to the workqueue if the system is low on
memory. However, other deadlocks (as described above) may happen without
any low-memory condition.

This patch generalizes Kent's concept: it redirects bios on
current->bio_list to the bio_set's workqueue on every schedule call.
Consequently, when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueues and they can
complete without waiting for the mutex to become available.

Bios allocated with bio_kmalloc do not have a bio_set, so they are not
redirected; however, bio_kmalloc shouldn't be used by stacking drivers (it
is currently used by raid1.c and raid10.c; we need to change them to use a
bio_set).
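As a rough user-space sketch of this redirection (the types and the rescue list are simplified assumptions; the real patch punts set-backed bios to a per-bio_set rescue workqueue):

```c
#include <stddef.h>

struct bio_list { struct bio *head, *tail; };
struct bio_set  { struct bio_list rescue_list; };
struct bio      { struct bio *bi_next; struct bio_set *bi_pool; };

static void bio_list_init(struct bio_list *l) { l->head = l->tail = NULL; }

static void bio_list_add(struct bio_list *l, struct bio *b)
{
	b->bi_next = NULL;
	if (l->tail)
		l->tail->bi_next = b;
	else
		l->head = b;
	l->tail = b;
}

static struct bio *bio_list_pop(struct bio_list *l)
{
	struct bio *b = l->head;

	if (b) {
		l->head = b->bi_next;
		if (!l->head)
			l->tail = NULL;
	}
	return b;
}

/* Pop everything off the per-process list; bios that own a bio_set are
 * moved to that set's rescue list (drained by a workqueue in the real
 * kernel), while bio_kmalloc'ed bios, which have no bio_set, stay queued. */
static void blk_flush_bio_list(struct bio_list *current_list)
{
	struct bio_list list = *current_list;
	struct bio *bio;

	bio_list_init(current_list);
	while ((bio = bio_list_pop(&list))) {
		if (bio->bi_pool)
			bio_list_add(&bio->bi_pool->rescue_list, bio);
		else
			bio_list_add(current_list, bio); /* no bio_set: left queued */
	}
}
```

Calling this before the process sleeps lets the punted bios complete independently, breaking the artificial dependency between queued bios and the lock the process is about to block on.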


Note to stable kernel maintainers: before backporting this patch, you also
need to backport df2cb6daa4cbc34406bc4b1ac9b9335df1083a72.

Signed-off-by: Mikulas Patocka 
Cc: sta...@vger.kernel.org

---
 fs/bio.c   |   84 ++---
 include/linux/blkdev.h |7 +++-
 kernel/sched/core.c|7 
 3 files changed, 37 insertions(+), 61 deletions(-)

Index: linux-3.15-rc5/fs/bio.c
===
--- linux-3.15-rc5.orig/fs/bio.c2014-05-26 19:02:47.0 +0200
+++ linux-3.15-rc5/fs/bio.c 2014-05-27 00:00:13.0 +0200
@@ -342,35 +342,34 @@ static void bio_alloc_rescue(struct work
}
 }
 
-static void punt_bios_to_rescuer(struct bio_set *bs)
+/**
+ * blk_flush_bio_list
+ *
+ * Pop bios queued on current->bio_list and submit each of them to
+ * their rescue workqueue.
+ *
+ * If the bio doesn't have a bio_set, we leave it on current->bio_list.
+ * However, stacking drivers should use bio_set, so this shouldn't be
+ * an issue.
+ */
+void blk_flush_bio_list(void)
 {
-   struct bio_list punt, nopunt;
struct bio *bio;
+   struct bio_list list = *current->bio_list;
+   bio_list_init(current->bio_list);
 
-   /*
-* In order to guarantee forward progress we must

Re: [PATCH] block: flush queued bios when the process blocks

2014-05-29 Thread Mikulas Patocka


On Tue, 27 May 2014, Jens Axboe wrote:

> On 2014-05-27 10:26, Mikulas Patocka wrote:
> > On Tue, 27 May 2014, Jens Axboe wrote:
> > 
> > > On 2014-05-27 09:23, Mikulas Patocka wrote:
> > > 
> > > > The patch adds bio list flushing to the scheduler just beside plug
> > > > flushing.
> > > 
> > > ... which is exactly why I'm commenting. It'd be great to avoid yet one
> > > more
> > > scheduler hook for this sort of thing.
> > > 
> > > --
> > > Jens Axboe
> > 
> > One could create something like schedule notifier chain, but I'm not sure
> > if it is worth the complexity because of just two users. If more users
> > come in the future, it could be generalized.
> 
> Except such a thing already exists, there are unplug callback chains. All I'm
> asking is that you look into how feasible it would be to use something like
> that, instead of reinventing the wheel.
> 
> -- 
> Jens Axboe


You can use this patch as an example that moves current->bio_list to
struct plug, but I don't recommend putting it in the kernel - this patch
still has some issues (some lvm raid tests fail) - and at the -rc7 stage we
should really be fixing bugs and not rearchitecting the code.


You should commit my original patch because it is small and it generated 
no regressions for me. Think about stable kernels and enterprise 
distributions - the smaller the patch is, the easier it is to backport.


Mikulas


---
 block/blk-core.c   |   19 ---
 drivers/md/dm-bufio.c  |2 +-
 drivers/md/raid1.c |6 +++---
 drivers/md/raid10.c|6 +++---
 fs/bio.c   |   21 +
 include/linux/blkdev.h |7 +--
 include/linux/sched.h  |4 
 kernel/sched/core.c|7 ---
 8 files changed, 33 insertions(+), 39 deletions(-)

Index: linux-3.15-rc5/block/blk-core.c
===
--- linux-3.15-rc5.orig/block/blk-core.c2014-05-29 23:06:29.0 
+0200
+++ linux-3.15-rc5/block/blk-core.c 2014-05-30 00:30:41.0 +0200
@@ -1828,7 +1828,7 @@ end_io:
  */
 void generic_make_request(struct bio *bio)
 {
-   struct bio_list bio_list_on_stack;
+   struct blk_plug plug;
 
if (!generic_make_request_checks(bio))
return;
@@ -1858,8 +1858,8 @@ void generic_make_request(struct bio *bi
 * it is non-NULL, then a make_request is active, and new requests
 * should be added at the tail
 */
-   if (current->bio_list) {
-   bio_list_add(current->bio_list, bio);
+   if (current->plug) {
+   bio_list_add(&current->plug->bio_list, bio);
return;
}
 
@@ -1877,17 +1877,18 @@ void generic_make_request(struct bio *bi
 * of the top of the list (no pretending) and so remove it from
 * bio_list, and call into ->make_request() again.
 */
+   blk_start_plug(&plug);
+   current->plug->in_generic_make_request = 1;
BUG_ON(bio->bi_next);
-   bio_list_init(&bio_list_on_stack);
-   current->bio_list = &bio_list_on_stack;
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
q->make_request_fn(q, bio);
 
-   bio = bio_list_pop(current->bio_list);
+   bio = bio_list_pop(>plug->bio_list);
} while (bio);
-   current->bio_list = NULL; /* deactivate */
+   current->plug->in_generic_make_request = 0;
+   blk_finish_plug(&plug);
 }
 EXPORT_SYMBOL(generic_make_request);
 
@@ -2965,6 +2966,8 @@ void blk_start_plug(struct blk_plug *plu
INIT_LIST_HEAD(&plug->list);
INIT_LIST_HEAD(&plug->mq_list);
INIT_LIST_HEAD(&plug->cb_list);
+   bio_list_init(&plug->bio_list);
+   plug->in_generic_make_request = 0;
 
/*
 * If this is a nested plug, don't actually assign it. It will be
@@ -3060,6 +3063,8 @@ void blk_flush_plug_list(struct blk_plug
 
BUG_ON(plug->magic != PLUG_MAGIC);
 
+   blk_flush_bio_list(plug);
+
flush_plug_callbacks(plug, from_schedule);
 
if (!list_empty(&plug->mq_list))
Index: linux-3.15-rc5/include/linux/blkdev.h
===
--- linux-3.15-rc5.orig/include/linux/blkdev.h  2014-05-29 23:05:46.0 
+0200
+++ linux-3.15-rc5/include/linux/blkdev.h   2014-05-30 00:30:54.0 
+0200
@@ -1061,6 +1061,8 @@ struct blk_plug {
struct list_head list; /* requests */
struct list_head mq_list; /* blk-mq requests */
struct list_head cb_list; /* md requires an unplug callback */
+   struct bio_list bio_list; /* list of queued bios */
+   int in_generic_make_request;
 };
 #define BLK_MAX_REQUEST_COUNT 16
 
@@ -1100,10 +1102,11 @@ static inline bool blk_needs_flush_plug(
return plug &&
(!list_empty(&plug->list) ||
 !list_empty(&plug->mq_list) ||
-!list_empty(&plug->cb_list));
+!list_empty(&plug->cb_list) ||
+!bio_list_empty(&plug->bio_list));
 }

Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 03:56:00PM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 27 May 2014, Jens Axboe wrote:
> 
> > On 2014-05-27 10:26, Mikulas Patocka wrote:
> > > On Tue, 27 May 2014, Jens Axboe wrote:
> > > 
> > > > On 2014-05-27 09:23, Mikulas Patocka wrote:
> > > > 
> > > > > The patch adds bio list flushing to the scheduler just beside plug
> > > > > flushing.
> > > > 
> > > > ... which is exactly why I'm commenting. It'd be great to avoid yet one
> > > > more
> > > > scheduler hook for this sort of thing.
> > > > 
> > > > --
> > > > Jens Axboe
> > > 
> > > One could create something like schedule notifier chain, but I'm not sure
> > > if it is worth the complexity because of just two users. If more users
> > > come in the future, it could be generalized.
> > 
> > Except such a thing already exists, there are unplug callback chains. All 
> > I'm
> > asking is that you look into how feasible it would be to use something like
> > that, instead of reinventing the wheel.
> > 
> > -- 
> > Jens Axboe
> 
> Do you mean moving current->bio_list to struct blk_plug and calling 
> blk_start_plug/blk_finish_plug around generic_make_request?
> 
> It would be possible on a condition that we can redirect all bios to a 
> workqueue (i.e. eliminate bio_kmalloc and always use bio_alloc_bioset).
> 
> What are performance implications of this - does it make sense to have 
> blk_start_plug/blk_finish_plug around every call to generic_make_request? 
> - that means that all i/o requests will be added to a plug and then 
> unplugged.

We've already got blk_start_plug() calls around IO submission at higher points
in the stack. (I actually have seen it show up in profiles though, it probably
would be worth inlining and slimming down a bit).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 11:14:15AM -0700, Christoph Hellwig wrote:
> On Tue, May 27, 2014 at 11:42:26AM -0600, Jens Axboe wrote:
> > Except such a thing already exists, there are unplug callback
> > chains. All I'm asking is that you look into how feasible it would
> > be to use something like that, instead of reinventing the wheel.
> 
> I suspect the code from MD could be lifted quite easily.
> 
> It would be nicer to move it up to the block layer entirely, but when I
> tried that a while ago I ran into issues that I unfortunately don't
> remember.

How's md do it?


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka


On Tue, 27 May 2014, Jens Axboe wrote:

> On 2014-05-27 10:26, Mikulas Patocka wrote:
> > On Tue, 27 May 2014, Jens Axboe wrote:
> > 
> > > On 2014-05-27 09:23, Mikulas Patocka wrote:
> > > 
> > > > The patch adds bio list flushing to the scheduler just beside plug
> > > > flushing.
> > > 
> > > ... which is exactly why I'm commenting. It'd be great to avoid yet one
> > > more
> > > scheduler hook for this sort of thing.
> > > 
> > > --
> > > Jens Axboe
> > 
> > One could create something like schedule notifier chain, but I'm not sure
> > if it is worth the complexity because of just two users. If more users
> > come in the future, it could be generalized.
> 
> Except such a thing already exists, there are unplug callback chains. All I'm
> asking is that you look into how feasible it would be to use something like
> that, instead of reinventing the wheel.
> 
> -- 
> Jens Axboe

Do you mean moving current->bio_list to struct blk_plug and calling 
blk_start_plug/blk_finish_plug around generic_make_request?

It would be possible on a condition that we can redirect all bios to a 
workqueue (i.e. eliminate bio_kmalloc and always use bio_alloc_bioset).

What are performance implications of this - does it make sense to have 
blk_start_plug/blk_finish_plug around every call to generic_make_request? 
- that means that all i/o requests will be added to a plug and then 
unplugged.

Mikulas


Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Christoph Hellwig
On Tue, May 27, 2014 at 11:42:26AM -0600, Jens Axboe wrote:
> Except such a thing already exists, there are unplug callback
> chains. All I'm asking is that you look into how feasible it would
> be to use something like that, instead of reinventing the wheel.

I suspect the code from MD could be lifted quite easily.

It would be nicer to move it up to the block layer entirely, but when I
tried that a while ago I ran into issues that I unfortunately don't
remember.


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 8:08 AM, Jens Axboe  wrote:
> This really begs the question of why we just don't use the per-process plugs
> for this. We already have scheduler hooks in place to flush those at the
> appropriate time. Why are we reinventing something for essentially the same
> thing?

Yes! Unifying the two plugging mechanisms has been on my todo/wishlist for ages.


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 10:26, Mikulas Patocka wrote:

On Tue, 27 May 2014, Jens Axboe wrote:


On 2014-05-27 09:23, Mikulas Patocka wrote:


The patch adds bio list flushing to the scheduler just beside plug
flushing.


... which is exactly why I'm commenting. It'd be great to avoid yet one more
scheduler hook for this sort of thing.

--
Jens Axboe


One could create something like schedule notifier chain, but I'm not sure
if it is worth the complexity because of just two users. If more users
come in the future, it could be generalized.


Except such a thing already exists, there are unplug callback chains. 
All I'm asking is that you look into how feasible it would be to use 
something like that, instead of reinventing the wheel.


--
Jens Axboe



Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka
On Tue, 27 May 2014, Jens Axboe wrote:

> On 2014-05-27 09:23, Mikulas Patocka wrote:
> 
> > The patch adds bio list flushing to the scheduler just beside plug
> > flushing.
> 
> ... which is exactly why I'm commenting. It'd be great to avoid yet one more
> scheduler hook for this sort of thing.
> 
> -- 
> Jens Axboe

One could create something like schedule notifier chain, but I'm not sure 
if it is worth the complexity because of just two users. If more users 
come in the future, it could be generalized.

Mikulas


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 09:23, Mikulas Patocka wrote:



On Tue, 27 May 2014, Jens Axboe wrote:


On 2014-05-27 09:03, Mikulas Patocka wrote:

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.


This really begs the question of why we just don't use the per-process plugs
for this. We already have scheduler hooks in place to flush those at the
appropriate time. Why are we reinventing something for essentially the same
thing?

--
Jens Axboe


Plugs work with requests, this patch works with bios. They are different
structures, so you can't use one infrastructure to process them.


Yes... I realize the list and plugs are for requests. But there's 
nothing preventing a non-rq hook, we have uses like that too. And it 
could easily be extended to handle bio lists, too.



The patch adds bio list flushing to the scheduler just beside plug
flushing.


... which is exactly why I'm commenting. It'd be great to avoid yet one 
more scheduler hook for this sort of thing.


--
Jens Axboe



Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka


On Tue, 27 May 2014, Jens Axboe wrote:

> On 2014-05-27 09:03, Mikulas Patocka wrote:
> > The block layer uses a per-process bio list to avoid recursion in
> > generic_make_request. When generic_make_request is called recursively, the
> > bio is added to current->bio_list and the function returns immediately.
> > The top-level instance of generic_make_request takes bios from
> > current->bio_list and processes them.
> 
> This really begs the question of why we just don't use the per-process plugs
> for this. We already have scheduler hooks in place to flush those at the
> appropriate time. Why are we reinventing something for essentially the same
> thing?
> 
> -- 
> Jens Axboe

Plugs work with requests, this patch works with bios. They are different 
structures, so you can't use one infrastructure to process them.

The patch adds bio list flushing to the scheduler just beside plug
flushing.

Mikulas


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 09:03, Mikulas Patocka wrote:

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.


This really begs the question of why we just don't use the per-process 
plugs for this. We already have scheduler hooks in place to flush those 
at the appropriate time. Why are we reinventing something for 
essentially the same thing?


--
Jens Axboe



[PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka
The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.

This bio queuing can result in deadlocks. The following deadlock was
observed:

1) Process A sends a one-page read bio to the dm-snapshot target. The bio
spans a snapshot chunk boundary and so it is split into two bios by device
mapper.

2) Device mapper creates the first sub-bio and sends it to the snapshot
driver.

3) The function snapshot_map calls track_chunk (that allocates a structure
dm_snap_tracked_chunk and adds it to tracked_chunk_hash) and then it
remaps the bio to the underlying linear target and exits with
DM_MAPIO_REMAPPED.

4) The remapped bio is submitted with generic_make_request, but it isn't
processed - it is added to current->bio_list instead.

5) Meanwhile, process B executes pending_complete for the affected chunk,
it takes down_write(&s->lock) and then loops in
__check_for_conflicting_io, waiting for dm_snap_tracked_chunk created in
step 3) to be released.

6) Process A continues, it creates a new bio for the rest of the original
bio.

7) snapshot_map is called for this new bio, it waits on
down_write(&s->lock) that is held in step 5).

The resulting deadlock:
* bio added to current->bio_list at step 4) waits until the function in
  step 7) finishes
* the function in step 7) waits until s->lock held in step 5) is released
* the process in step 5) waits until the bio queued in step 4) finishes

The general problem is that queuing bios on current->bio_list introduces
additional lock dependencies. If a device mapper target sends a bio to
some block device, it assumes that the bio only takes locks of the target
block device or devices below the target device. However, if the bio is
queued on current->bio_list, this creates an artificial locking dependency
on locks taken by other bios that are on current->bio_list. In the above
scenario, this artificial locking dependency results in deadlock.

Kent Overstreet already created a workqueue for every bio set, and there is
code that tries to resolve some low-memory deadlocks by redirecting bios
queued on current->bio_list to that workqueue when the system is low on
memory. However, other deadlocks (as described above) may happen without
any low-memory condition.

This patch generalizes Kent's concept: it redirects bios on
current->bio_list to the bio_set's workqueue on every schedule call.
Consequently, when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueues and can
complete without waiting for the mutex to become available.

Bios allocated with bio_kmalloc do not have a bio_set, so they are not
redirected; however, bio_kmalloc shouldn't be used by stacking drivers (it
is currently used by raid1.c and raid10.c, which need to be changed to use
a bio_set).


Note to stable kernel maintainers: before backporting this patch, you also
need to backport df2cb6daa4cbc34406bc4b1ac9b9335df1083a72.

Signed-off-by: Mikulas Patocka 
Cc: sta...@vger.kernel.org

---
 fs/bio.c   |   84 ++---
 include/linux/blkdev.h |7 +++-
 kernel/sched/core.c|7 
 3 files changed, 37 insertions(+), 61 deletions(-)

Index: linux-3.15-rc5/fs/bio.c
===
--- linux-3.15-rc5.orig/fs/bio.c2014-05-26 19:02:47.0 +0200
+++ linux-3.15-rc5/fs/bio.c 2014-05-27 00:00:13.0 +0200
@@ -342,35 +342,34 @@ static void bio_alloc_rescue(struct work
}
 }
 
-static void punt_bios_to_rescuer(struct bio_set *bs)
+/**
+ * blk_flush_bio_list
+ *
+ * Pop bios queued on current->bio_list and submit each of them to
+ * their rescue workqueue.
+ *
+ * If the bio doesn't have a bio_set, we leave it on current->bio_list.
+ * However, stacking drivers should use bio_set, so this shouldn't be
+ * an issue.
+ */
+void blk_flush_bio_list(void)
 {
-   struct bio_list punt, nopunt;
struct bio *bio;
+   struct bio_list list = *current->bio_list;
+   bio_list_init(current->bio_list);
 
-   /*
-* In order to guarantee forward progress we must punt only bios that
-* were allocated from this bio_set; otherwise, if there was a bio on
-* there for a stacking driver higher up in the stack, processing it
-* could require allocating bios from this bio_set, and doing that from
-* our own rescuer would be bad.
-*
-* Since bio lists are singly linked, pop them all instead of trying to
-* remove from the middle of the list:
-*/
-
-   bio_list_init(&punt);
-   bio_list_init(&nopunt);
-
-   while ((bio = bio_list_pop(current->bio_list)))
-   bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
-
-   

Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 09:03, Mikulas Patocka wrote:

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.


This really begs the question of why we just don't use the per-process 
plugs for this. We already have scheduler hooks in place to flush those 
at the appropriate time. Why are we reinventing something for 
essentially the same thing?


--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka


On Tue, 27 May 2014, Jens Axboe wrote:

 On 2014-05-27 09:03, Mikulas Patocka wrote:
  The block layer uses a per-process bio list to avoid recursion in
  generic_make_request. When generic_make_request is called recursively, the
  bio is added to current->bio_list and the function returns immediately.
  The top-level instance of generic_make_request takes bios from
  current->bio_list and processes them.
 
 This really begs the question of why we just don't use the per-process plugs
 for this. We already have scheduler hooks in place to flush those at the
 appropriate time. Why are we reinventing something for essentially the same
 thing?
 
 -- 
 Jens Axboe

Plugs work with requests, this patch works with bios. They are different 
structures, so you can't use one infrastructure to process both.

The patch adds bio list flushing to the scheduler just beside the plug 
flushing.

Mikulas


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 09:23, Mikulas Patocka wrote:



On Tue, 27 May 2014, Jens Axboe wrote:


On 2014-05-27 09:03, Mikulas Patocka wrote:

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.


This really begs the question of why we just don't use the per-process plugs
for this. We already have scheduler hooks in place to flush those at the
appropriate time. Why are we reinventing something for essentially the same
thing?

--
Jens Axboe


Plugs work with requests, this patch works with bios. They are different
structures, so you can't use one infrastructure to process them.


Yes... I realize the list and plugs are for requests. But there's 
nothing preventing a non-rq hook, we have uses like that too. And it 
could easily be extended to handle bio lists, too.



The patch adds bio list flushing to the scheduler just beside the plug
flushing.


... which is exactly why I'm commenting. It'd be great to avoid yet one 
more scheduler hook for this sort of thing.


--
Jens Axboe



Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka
On Tue, 27 May 2014, Jens Axboe wrote:

 On 2014-05-27 09:23, Mikulas Patocka wrote:
 
  The patch adds bio list flushing to the scheduler just beside the plug
  flushing.
 
 ... which is exactly why I'm commenting. It'd be great to avoid yet one more
 scheduler hook for this sort of thing.
 
 -- 
 Jens Axboe

One could create something like a schedule notifier chain, but I'm not sure 
it is worth the complexity for just two users. If more users come along in 
the future, it could be generalized.

Mikulas


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Jens Axboe

On 2014-05-27 10:26, Mikulas Patocka wrote:

On Tue, 27 May 2014, Jens Axboe wrote:


On 2014-05-27 09:23, Mikulas Patocka wrote:


The patch adds bio list flushing to the scheduler just beside the plug
flushing.


... which is exactly why I'm commenting. It'd be great to avoid yet one more
scheduler hook for this sort of thing.

--
Jens Axboe


One could create something like a schedule notifier chain, but I'm not sure
it is worth the complexity for just two users. If more users come along in
the future, it could be generalized.


Except such a thing already exists, there are unplug callback chains. 
All I'm asking is that you look into how feasible it would be to use 
something like that, instead of reinventing the wheel.


--
Jens Axboe

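For reference, the unplug callback chains Jens mentions are what md already hooks into via blk_check_plugged(). The following kernel-side sketch is illustrative only, not a buildable patch, and the my_* names are invented; it parks bios on a per-plug callback whose handler runs when the plug is flushed, including from the scheduler when the task blocks:

```c
/* Sketch: queue bios on a per-plug callback; my_unplug() runs at flush
 * time (from_schedule is true when the flush comes from the scheduler).
 * Assumes the callback memory returned by blk_check_plugged() starts
 * zeroed, so the embedded bio_list is initially empty. */
struct my_plug_cb {
	struct blk_plug_cb cb;		/* the chained callback node */
	struct bio_list bios;
};

static void my_unplug(struct blk_plug_cb *cb, bool from_schedule)
{
	struct my_plug_cb *mp = container_of(cb, struct my_plug_cb, cb);
	struct bio *bio;

	while ((bio = bio_list_pop(&mp->bios)))
		generic_make_request(bio);
	kfree(mp);
}

static void my_submit(struct bio *bio)
{
	struct blk_plug_cb *cb;

	/* attach (or find) our callback on the current task's plug */
	cb = blk_check_plugged(my_unplug, NULL, sizeof(struct my_plug_cb));
	if (cb)
		bio_list_add(&container_of(cb, struct my_plug_cb, cb)->bios,
			     bio);	/* deferred to unplug time */
	else
		generic_make_request(bio);	/* no plug active: submit now */
}
```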


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 8:08 AM, Jens Axboe ax...@kernel.dk wrote:
 This really begs the question of why we just don't use the per-process plugs
 for this. We already have scheduler hooks in place to flush those at the
 appropriate time. Why are we reinventing something for essentially the same
 thing?

Yes! Unifying the two plugging mechanisms has been on my todo/wishlist for ages.


Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Christoph Hellwig
On Tue, May 27, 2014 at 11:42:26AM -0600, Jens Axboe wrote:
 Except such a thing already exists, there are unplug callback
 chains. All I'm asking is that you look into how feasible it would
 be to use something like that, instead of reinventing the wheel.

I suspect the code from MD could be lifted quite easily.

It would be nicer to move it up to the block layer entirely, but when I
tried that a while ago I ran into issues that I unfortunately don't
remember.


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Mikulas Patocka


On Tue, 27 May 2014, Jens Axboe wrote:

 On 2014-05-27 10:26, Mikulas Patocka wrote:
  On Tue, 27 May 2014, Jens Axboe wrote:
  
   On 2014-05-27 09:23, Mikulas Patocka wrote:
   
The patch adds bio list flushing to the scheduler just beside the plug
flushing.
   
   ... which is exactly why I'm commenting. It'd be great to avoid yet one
   more
   scheduler hook for this sort of thing.
   
   --
   Jens Axboe
  
  One could create something like a schedule notifier chain, but I'm not sure
  it is worth the complexity for just two users. If more users come along in
  the future, it could be generalized.
 
 Except such a thing already exists, there are unplug callback chains. All I'm
 asking is that you look into how feasible it would be to use something like
 that, instead of reinventing the wheel.
 
 -- 
 Jens Axboe

Do you mean moving current->bio_list to struct blk_plug and calling 
blk_start_plug/blk_finish_plug around generic_make_request?

It would be possible on the condition that we can redirect all bios to a 
workqueue (i.e. eliminate bio_kmalloc and always use bio_alloc_bioset).

What are the performance implications of this - does it make sense to have 
blk_start_plug/blk_finish_plug around every call to generic_make_request? 
That means that all i/o requests will be added to a plug and then 
unplugged.

Mikulas
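For context, the pattern Mikulas is asking about would look roughly like the following kernel-side fragment. This is a sketch only, not compiled; whether plug flush semantics fit bios is exactly the open question here:

```c
struct blk_plug plug;

blk_start_plug(&plug);		/* submissions below accumulate on the plug */
generic_make_request(bio);	/* would be queued rather than dispatched   */
blk_finish_plug(&plug);		/* flushed here, or from the scheduler if   */
				/* the task blocks in between               */
```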


Re: [dm-devel] [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 11:14:15AM -0700, Christoph Hellwig wrote:
 On Tue, May 27, 2014 at 11:42:26AM -0600, Jens Axboe wrote:
  Except such a thing already exists, there are unplug callback
  chains. All I'm asking is that you look into how feasible it would
  be to use something like that, instead of reinventing the wheel.
 
 I suspect the code from MD could be lifted quite easily.
 
 It would be nicer to move it up to the block layer entirely, but when I
 tried that a while ago I ran into issues that I unfortunately don't
 remember.

How's md do it?


Re: [PATCH] block: flush queued bios when the process blocks

2014-05-27 Thread Kent Overstreet
On Tue, May 27, 2014 at 03:56:00PM -0400, Mikulas Patocka wrote:
 
 
 On Tue, 27 May 2014, Jens Axboe wrote:
 
  On 2014-05-27 10:26, Mikulas Patocka wrote:
   On Tue, 27 May 2014, Jens Axboe wrote:
   
On 2014-05-27 09:23, Mikulas Patocka wrote:

 The patch adds bio list flushing to the scheduler just beside the plug
 flushing.

... which is exactly why I'm commenting. It'd be great to avoid yet one
more
scheduler hook for this sort of thing.

--
Jens Axboe
   
   One could create something like a schedule notifier chain, but I'm not sure
   it is worth the complexity for just two users. If more users come along in
   the future, it could be generalized.
  
  Except such a thing already exists, there are unplug callback chains. All 
  I'm
  asking is that you look into how feasible it would be to use something like
  that, instead of reinventing the wheel.
  
  -- 
  Jens Axboe
 
 Do you mean moving current->bio_list to struct blk_plug and calling 
 blk_start_plug/blk_finish_plug around generic_make_request?
 
 It would be possible on the condition that we can redirect all bios to a 
 workqueue (i.e. eliminate bio_kmalloc and always use bio_alloc_bioset).
 
 What are the performance implications of this - does it make sense to have 
 blk_start_plug/blk_finish_plug around every call to generic_make_request? 
 That means that all i/o requests will be added to a plug and then 
 unplugged.

We've already got blk_start_plug() calls around IO submission at higher
points in the stack. (I actually have seen it show up in profiles, though;
it probably would be worth inlining and slimming down a bit.)