Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi Mike,


On 06/26/2017 09:43 PM, Mike Snitzer wrote:

On Mon, Jun 26 2017 at  7:49am -0400,
Zdenek Kabelac  wrote:

[...snip...]
Jon (cc'd) tested it and found that the original issue wasn't
reproducible.  But that the commit in question was the source of crashes
(likely due to needing the fix from this thread).

We can revisit for 4.13 -- but I'll defer to Heinz.

Eric, please share your results from testing the latest 4.12-rc7.

Testing on 4.12-rc7 doesn't crash the kernel.

Sorry for the noise; please ignore this patch. It's still unclear to me
why backporting these fixes causes problems.
I will update this thread if I find something.

Regards,
Eric



Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel





Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi,

On 06/26/2017 11:27 PM, Eric Ren wrote:

Hi Mike,


On 06/26/2017 10:37 PM, Mike Snitzer wrote:

On Mon, Jun 26 2017 at  9:47am -0400,
Eric Ren  wrote:
[...snip...]

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good.

Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev
has failed will crash the kernel with the discard operation from mkfs.ext4.

However, mkfs.ext4 can succeed on a healthy mirrored device. This is
the thing I don't understand: no matter whether the mirrored device is
good or not, there is always a duplicate bio completion before this
fix, so write_callback() will be called twice and the crash will occur
on the second write_callback():

No, there is only a duplicate bio completion if the error path is taken
(e.g. underlying device doesn't support discard).

Hmm, when "op == REQ_OP_DISCARD", please see comments in do_region():

"""
static void do_region(int op, int op_flags, unsigned region,
  struct dm_io_region *where, struct dpages *dp,
  struct io *io)
{
...
 if (op == REQ_OP_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
...
if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
 op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
atomic_inc(&io->count);             ===> [1]
dec_count(io, region, -EOPNOTSUPP); ===> [2]
return;
}
"""

[1] ref count fixed by the patch "dm io: ...";
[2] we won't get here if "special_cmd_max_sectors != 0", which is true
when both sides of the mirror are good.

So only when a mirror device fails is "max_discard_sectors" on its queue
0, and thus this error path will be taken, right?


Yes, I checked this as below:

- print info
"""
@@ -310,8 +310,10 @@ static void do_region(int op, int op_flags, 
unsigned region,

/*
 * Reject unsupported discard and write same requests.
 */
-   if (op == REQ_OP_DISCARD)
+   if (op == REQ_OP_DISCARD) {
special_cmd_max_sectors = q->limits.max_discard_sectors;
+ DMERR("do_region: op=DISCARD, q->limits.max_discard_sectors=%d\n", 
special_cmd_max_sectors);

+   }
"""
- log messages when running mkfs on a mirror device whose primary leg has failed:
[  169.672880] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=0   ===> the failed leg
[  169.672885] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=8388607   ===> the good leg


Thanks,
Eric





"""
static void write_callback(unsigned long error, void *context)
{
 unsigned i;
 struct bio *bio = (struct bio *) context;
 struct mirror_set *ms;
 int should_wake = 0;
 unsigned long flags;

 ms = bio_get_m(bio)->ms;    ===> NULL pointer at the duplicate completion
 bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the
kernel, but that's not true when the mirrored device is good. I hope
someone who happens to know the reason can give some hints ;-P

If the mirror is healthy then only one completion is returned to
dm-mirror (via write_callback).  The problem was that the error path
wasn't managing the reference count as needed, whereas dm-io's normal
discard IO path does.


Yes, I can understand this.

Thanks a lot,
Eric



Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel




Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Mike Snitzer
On Mon, Jun 26 2017 at  4:36pm -0400,
Johannes Bauer  wrote:

> On 26.06.2017 21:56, Mike Snitzer wrote:
> 
> >> Interesting, I did *not* change to writethrough. However, there
> >> shouldn't have been any I/O on the device (it was not accessed by
> >> anything after I switched to the cleaner policy).
> [...]
> >> Anyways, I'll try to replicate my scenario again because I'm actually
> >> quite sure that I did everything correctly (I did it a few times).
> > 
> > Except you didn't first switch to writethrough -- which is _not_
> > correct.
> 
> Absolutely, very good to know. So even without any I/O being requested,
> dm-cache is allowed to "hold back" pages as long as the dm-cache device
> is in writeback mode?

s/pages/blocks/

The "dmsetup status" output for a DM cache device is showing dirty
accounting is in terms of cache blocks.

> Would this also explain why the "dmsetup wait" hung indefinitely?

You need to read the dmsetup man page; "dmsetup wait" has _nothing_ to do
with waiting for IO to complete.  It is about DM events: without
specifying an event_nr you're just waiting for the device's event
counter to increment (which may never happen if you aren't doing
anything that would trigger an event).  See:

"   wait [--noflush] device_name [event_nr]
  Sleeps until the event counter for device_name exceeds
  event_nr.  Use -v to see the event number returned.  To
  wait until the next event is triggered, use info to find
  the last event number.  With --noflush, the thin target
  (from version 1.3.0) doesn't commit any
  outstanding changes to disk before reporting its statistics."

> I do think I followed a tutorial that I found on the net regarding this.
> Scary that such a crucial fact is missing there. The fact that dirty
> pages are reported as zero just gives the impression that everything is
> coherent, when in fact it's not.

I'll concede that it is weird that you're seeing a different md5sum for
the origin vs the cache (that is in writeback mode yet reports 0 dirty
blocks).

But I think there is some important detail that would explain it; sadly
I'd need to dig in and reproduce on a testbed to identify it.  Maybe Joe
will be able to offer a quick answer?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Johannes Bauer
On 26.06.2017 21:56, Mike Snitzer wrote:

>> Interesting, I did *not* change to writethrough. However, there
>> shouldn't have been any I/O on the device (it was not accessed by
>> anything after I switched to the cleaner policy).
[...]
>> Anyways, I'll try to replicate my scenario again because I'm actually
>> quite sure that I did everything correctly (I did it a few times).
> 
> Except you didn't first switch to writethrough -- which is _not_
> correct.

Absolutely, very good to know. So even without any I/O being requested,
dm-cache is allowed to "hold back" pages as long as the dm-cache device
is in writeback mode? Would this also explain why the "dmsetup wait"
hung indefinitely?

I do think I followed a tutorial that I found on the net regarding this.
Scary that such a crucial fact is missing there. The fact that dirty
pages are reported as zero just gives the impression that everything is
coherent, when in fact it's not.

Regardless, I find this and your explanation extremely interesting and
want to thank you for clearing this up. Very fascinating topic indeed.

Best regards,
Johannes


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Mike Snitzer
On Mon, Jun 26 2017 at  3:08pm -0400,
Johannes Bauer  wrote:

> Hi Joe,
> 
> On 26.06.2017 17:58, Joe Thornber wrote:
> > On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote:
> >> On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote:
> >>> So I seem to have a very basic misunderstanding of what the cleaner
> >>> policy/dirty pages mean. Is there a way to force the cache to flush
> >>> entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job.
> >>
> >> I'll try and reproduce your scenario and get back to you.
> > 
> > Here's a similar scenario that I've added to the dm test suite:
> > 
> > https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768
> > 
> > It goes through all the steps you need to use to decommission a cache
> > with dirty blocks.  Namely:
> > 
> > - switch to writethrough mode (so new io can't create new dirty blocks)
> > - switch to the cleaner policy
> > - wait for clean
> > - remove the cache device before accessing the origin directly
> 
> Interesting, I did *not* change to writethrough. However, there
> shouldn't have been any I/O on the device (it was not accessed by
> anything after I switched to the cleaner policy).
> 
> On the advice of Zdenek Kabelac, who messaged me off-list, I therefore
> have indeed been convinced that using dm-cache "by hand" maybe is too
> dangerous (i.e., I had not accounted for "repairing" of a cache and am
> really still unsure what LVM does that) and that lvmcache is a better
> choice -- even though Ubuntu supports it really crappily for root devices:
> https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1423796 a shame,
> it's been like this for over two years.
> 
> Anyways, I'll try to replicate my scenario again because I'm actually
> quite sure that I did everything correctly (I did it a few times).

Except you didn't first switch to writethrough -- which is _not_
correct.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Johannes Bauer
Hi Joe,

On 26.06.2017 17:58, Joe Thornber wrote:
> On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote:
>> On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote:
>>> So I seem to have a very basic misunderstanding of what the cleaner
>>> policy/dirty pages mean. Is there a way to force the cache to flush
>>> entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job.
>>
>> I'll try and reproduce your scenario and get back to you.
> 
> Here's a similar scenario that I've added to the dm test suite:
> 
> https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768
> 
> It goes through all the steps you need to use to decommission a cache
> with dirty blocks.  Namely:
> 
> - switch to writethrough mode (so new io can't create new dirty blocks)
> - switch to the cleaner policy
> - wait for clean
> - remove the cache device before accessing the origin directly

Interesting, I did *not* change to writethrough. However, there
shouldn't have been any I/O on the device (it was not accessed by
anything after I switched to the cleaner policy).

On the advice of Zdenek Kabelac, who messaged me off-list, I have
therefore been convinced that using dm-cache "by hand" maybe is too
dangerous (i.e., I had not accounted for "repairing" of a cache and am
really still unsure what LVM does about that) and that lvmcache is a better
choice -- even though Ubuntu supports it really crappily for root devices:
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1423796 a shame,
it's been like this for over two years.

Anyways, I'll try to replicate my scenario again because I'm actually
quite sure that I did everything correctly (I did it a few times).

Thanks for your help,
Johannes

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm thin: do not queue freed thin mapping for next stage processing

2017-06-26 Thread Mike Snitzer
On Fri, Jun 23 2017 at  2:53pm -0400,
Vallish Vaidyeshwara  wrote:

> process_prepared_discard_passdown_pt1() should cleanup
> dm_thin_new_mapping in cases of error.
> 
> dm_pool_inc_data_range() can fail trying to get a block reference:
> 
> metadata operation 'dm_pool_inc_data_range' failed: error = -61
> 
> When dm_pool_inc_data_range() fails, dm thin aborts current metadata
> transaction and marks pool as PM_READ_ONLY. Memory for thin mapping
> is released as well. However, current thin mapping will be queued
> onto next stage as part of queue_passdown_pt2() or passdown_endio().
> This dangling thin mapping memory when processed and accessed in
> next stage will lead to device mapper crashing.
> 
> Code flow without fix:
> -> process_prepared_discard_passdown_pt1(m)
>-> dm_thin_remove_range()
>-> discard passdown
>   --> passdown_endio(m) queues m onto next stage
>-> dm_pool_inc_data_range() fails, frees memory m
> but does not remove it from next stage queue
> 
> -> process_prepared_discard_passdown_pt2(m)
>-> processes freed memory m and crashes
> 
> One such stack:
> 
> Call Trace:
> [] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
> [] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
> [] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
> [] process_prepared+0x81/0xa0 [dm_thin_pool]
> [] do_worker+0xc5/0x820 [dm_thin_pool]
> [] ? __schedule+0x244/0x680
> [] ? pwq_activate_delayed_work+0x42/0xb0
> [] process_one_work+0x153/0x3f0
> [] worker_thread+0x12b/0x4b0
> [] ? rescuer_thread+0x350/0x350
> [] kthread+0xca/0xe0
> [] ? kthread_park+0x60/0x60
> [] ret_from_fork+0x25/0x30
> 
> The fix is to first take the block ref count for discarded block and
> then do a passdown discard of this block. If block ref count fails,
> then bail out aborting current metadata transaction, mark pool as
> PM_READ_ONLY and also free current thin mapping memory (existing error
> handling code) without queueing this thin mapping onto next stage of
> processing. If block ref count succeeds, then passdown discard of this
> block. Discard callback of passdown_endio() will queue this thin mapping
> onto next stage of processing.
> 
> Code flow with fix:
> -> process_prepared_discard_passdown_pt1(m)
>-> dm_thin_remove_range()
>-> dm_pool_inc_data_range()
>   --> if fails, free memory m and bail out
>-> discard passdown
>   --> passdown_endio(m) queues m onto next stage
> 
> Cc: stable  # v4.9+
> Reviewed-by: Eduardo Valentin 
> Reviewed-by: Cristian Gafton 
> Reviewed-by: Anchal Agarwal 
> Signed-off-by: Vallish Vaidyeshwara 
> ---
>  drivers/md/dm-thin.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 17ad50d..28808e5 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1094,6 +1094,19 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
>   return;
>   }
>  
> + /*
> +  * Increment the unmapped blocks.  This prevents a race between the
> +  * passdown io and reallocation of freed blocks.
> +  */
> + r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> + if (r) {
> + metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> + bio_io_error(m->bio);
> + cell_defer_no_holder(tc, m->cell);
> + mempool_free(m, pool->mapping_pool);
> + return;
> + }
> +
>   discard_parent = bio_alloc(GFP_NOIO, 1);
>   if (!discard_parent) {
>   DMWARN("%s: unable to allocate top level discard bio for 
> passdown. Skipping passdown.",
> @@ -1114,19 +1127,6 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
>   end_discard(&op, r);
>   }
>   }
> -
> - /*
> -  * Increment the unmapped blocks.  This prevents a race between the
> -  * passdown io and reallocation of freed blocks.
> -  */
> - r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> - if (r) {
> - metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> - bio_io_error(m->bio);
> - cell_defer_no_holder(tc, m->cell);
> - mempool_free(m, pool->mapping_pool);
> - return;
> - }
>  }
>  
>  static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping *m)

This looks like a good fix.

But I'll wait for Joe's confirmation before staging for 4.12 final
inclusion later this week.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH] libmultipath: update 3PARdata builtin config

2017-06-26 Thread Benjamin Marzinski
This updated config comes from HP.

Signed-off-by: Benjamin Marzinski 
---
 libmultipath/hwtable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 390d143..54bdcfc 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -49,6 +49,8 @@ static struct hwentry default_hw[] = {
.hwhandler = "1 alua",
.prio_name = PRIO_ALUA,
.no_path_retry = 18,
+   .fast_io_fail  = 10,
+   .dev_loss  = MAX_DEV_LOSS_TMO,
},
{
/* RA8000 / ESA12000 */
-- 
2.7.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Joe Thornber
On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote:
> On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote:
> > So I seem to have a very basic misunderstanding of what the cleaner
> > policy/dirty pages mean. Is there a way to force the cache to flush
> > entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job.
> 
> I'll try and reproduce your scenario and get back to you.

Here's a similar scenario that I've added to the dm test suite:

https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768

It goes through all the steps you need to use to decommission a cache
with dirty blocks.  Namely:

- switch to writethrough mode (so new io can't create new dirty blocks)
- switch to the cleaner policy
- wait for clean
- remove the cache device before accessing the origin directly

I've run it with a recent kernel, and one from about a year ago and it
passes fine.

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi Mike,


On 06/26/2017 10:37 PM, Mike Snitzer wrote:

On Mon, Jun 26 2017 at  9:47am -0400,
Eric Ren  wrote:
[...snip...]

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good.

Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev
has failed will crash the kernel with the discard operation from mkfs.ext4.
However, mkfs.ext4 can succeed on a healthy mirrored device. This is
the thing I don't understand: no matter whether the mirrored device is
good or not, there is always a duplicate bio completion before this
fix, so write_callback() will be called twice and the crash will occur
on the second write_callback():

No, there is only a duplicate bio completion if the error path is taken
(e.g. underlying device doesn't support discard).

Hmm, when "op == REQ_OP_DISCARD", please see comments in do_region():

"""
static void do_region(int op, int op_flags, unsigned region,
  struct dm_io_region *where, struct dpages *dp,
  struct io *io)
{
...
 if (op == REQ_OP_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
...
if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
 op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
atomic_inc(&io->count);             ===> [1]
dec_count(io, region, -EOPNOTSUPP); ===> [2]
return;
}
"""

[1] ref count fixed by the patch "dm io: ...";
[2] we won't get here if "special_cmd_max_sectors != 0", which is true
when both sides of the mirror are good.

So only when a mirror device fails is "max_discard_sectors" on its queue
0, and thus this error path will be taken, right?




"""
static void write_callback(unsigned long error, void *context)
{
 unsigned i;
 struct bio *bio = (struct bio *) context;
 struct mirror_set *ms;
 int should_wake = 0;
 unsigned long flags;

 ms = bio_get_m(bio)->ms;    ===> NULL pointer at the duplicate completion
 bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the
kernel, but that's not true when the mirrored device is good. I hope
someone who happens to know the reason can give some hints ;-P

If the mirror is healthy then only one completion is returned to
dm-mirror (via write_callback).  The problem was that the error path
wasn't managing the reference count as needed, whereas dm-io's normal
discard IO path does.


Yes, I can understand this.

Thanks a lot,
Eric



Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel





Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Mike Snitzer
On Mon, Jun 26 2017 at  9:47am -0400,
Eric Ren  wrote:

> Hi,
> 
> On 06/26/2017 06:55 PM, Eric Ren wrote:
> >Hi Zdenek,
> >
> >
> >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote:
> >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a):
> >>>
> >>[... snip...]
> >>
> >>Hi
> >>
> >>Which kernel version is this ?
> >>
> >>I'd thought we've already fixed this BZ for old mirrors:
> >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382
> >>
> >>There similar BZ for md-raid based mirrors (--type raid1)
> >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099
> >My base kernel version is 4.4.68, but with this 2 latest fixes applied:
> >
> >"""
> >Revert "dm mirror: use all available legs on multiple failures"
> >dm io: fix duplicate bio completion due to missing ref count
> 
> I'm confused about this "dm io..." fix. The fix itself is good.
> 
> Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev
> has failed will crash the kernel with the discard operation from mkfs.ext4.
> However, mkfs.ext4 can succeed on a healthy mirrored device. This is
> the thing I don't understand: no matter whether the mirrored device is
> good or not, there is always a duplicate bio completion before this
> fix, so write_callback() will be called twice and the crash will occur
> on the second write_callback():

No, there is only a duplicate bio completion if the error path is taken
(e.g. underlying device doesn't support discard).

> """
> static void write_callback(unsigned long error, void *context)
> {
> unsigned i;
> struct bio *bio = (struct bio *) context;
> struct mirror_set *ms;
> int should_wake = 0;
> unsigned long flags;
> 
> ms = bio_get_m(bio)->ms;    ===> NULL pointer at the duplicate completion
> bio_set_m(bio, NULL);
> """
> 
> Without this fix, I expected the DISCARD IO would always crash the
> kernel, but that's not true when the mirrored device is good. I hope
> someone who happens to know the reason can give some hints ;-P

If the mirror is healthy then only one completion is returned to
dm-mirror (via write_callback).  The problem was that the error path
wasn't managing the reference count as needed, whereas dm-io's normal
discard IO path does.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi Mike,

On 06/26/2017 09:43 PM, Mike Snitzer wrote:

On Mon, Jun 26 2017 at  7:49am -0400,
Zdenek Kabelac  wrote:


[...snip...]
Ohh  - I've -rc6 - while this  'revert' patch went to 4.12-rc7.

I'm now starting to wonder why?

It's been a real fix for a real issue - and 'revert' message states
there is no such problem ??

I'm confused

Mike  - have you tried the sequence from BZ  ?

Jon (cc'd) tested it and found that the original issue wasn't
reproducible.  But that the commit in question was the source of crashes
(likely due to needing the fix from this thread).

We can revisit for 4.13 -- but I'll defer to Heinz.

Eric, please share your results from testing the latest 4.12-rc7.

Sure, but I will do it tomorrow, different timezone ;-)

Maybe Jon can also try my reproducer on the latest kernel, so we
can double-confirm with each other.

Thanks,
Eric



Mike



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi,

On 06/26/2017 06:55 PM, Eric Ren wrote:

Hi Zdenek,


On 06/26/2017 05:46 PM, Zdenek Kabelac wrote:

Dne 26.6.2017 v 11:08 Eric Ren napsal(a):



[... snip...]

Hi

Which kernel version is this ?

I'd thought we've already fixed this BZ for old mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1382382

There's a similar BZ for md-raid based mirrors (--type raid1)
https://bugzilla.redhat.com/show_bug.cgi?id=1416099

My base kernel version is 4.4.68, but with these 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count


I'm confused about this "dm io..." fix. The fix itself is good.

Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev
has failed will crash the kernel with the discard operation from mkfs.ext4.
However, mkfs.ext4 can succeed on a healthy mirrored device. This is
the thing I don't understand: no matter whether the mirrored device is
good or not, there is always a duplicate bio completion before this
fix, so write_callback() will be called twice and the crash will occur
on the second write_callback():
"""
static void write_callback(unsigned long error, void *context)
{
unsigned i;
struct bio *bio = (struct bio *) context;
struct mirror_set *ms;
int should_wake = 0;
unsigned long flags;

ms = bio_get_m(bio)->ms;    ===> NULL pointer at the duplicate completion
bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the
kernel, but that's not true when the mirrored device is good. I hope
someone who happens to know the reason can give some hints ;-P


Regards,
Eric


"""

Yes, I've been working on this dm-mirror crash issue for a couple of days,
but only saw the fixes above when I rebased to send my fixes out, hah.

But with these 2 fixes, the reproducer can still crash the kernel. Anyway,
I can test with the upstream kernel again ;-P

Regards,
Eric




Regards

Zdenej



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel





Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Mike Snitzer
On Mon, Jun 26 2017 at  7:49am -0400,
Zdenek Kabelac  wrote:

> Dne 26.6.2017 v 12:55 Eric Ren napsal(a):
> >Hi Zdenek,
> >
> >
> >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote:
> >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a):
> >>>
> >>[... snip...]
> >>
> >>Hi
> >>
> >>Which kernel version is this ?
> >>
> >>I'd thought we've already fixed this BZ for old mirrors:
> >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382
> >>
> >>There similar BZ for md-raid based mirrors (--type raid1)
> >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099
> >My base kernel version is 4.4.68, but with this 2 latest fixes applied:
> >
> >"""
> >Revert "dm mirror: use all available legs on multiple failures"
> 
> Ohh  - I've -rc6 - while this  'revert' patch went to 4.12-rc7.
> 
> I'm now starting to wonder why?
> 
> It's been a real fix for a real issue - and 'revert' message states
> there is no such problem ??
> 
> I'm confused
> 
> Mike  - have you tried the sequence from BZ  ?

Jon (cc'd) tested it and found that the original issue wasn't
reproducible.  But that the commit in question was the source of crashes
(likely due to needing the fix from this thread).

We can revisit for 4.13 -- but I'll defer to Heinz.

Eric, please share your results from testing the latest 4.12-rc7.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Zdenek Kabelac

Dne 26.6.2017 v 12:55 Eric Ren napsal(a):

Hi Zdenek,


On 06/26/2017 05:46 PM, Zdenek Kabelac wrote:

Dne 26.6.2017 v 11:08 Eric Ren napsal(a):



[... snip...]

Hi

Which kernel version is this ?

I'd thought we've already fixed this BZ for old mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1382382

There similar BZ for md-raid based mirrors (--type raid1)
https://bugzilla.redhat.com/show_bug.cgi?id=1416099

My base kernel version is 4.4.68, but with this 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"


Ohh  - I've -rc6 - while this  'revert' patch went to 4.12-rc7.

I'm now starting to wonder why?

It's been a real fix for a real issue - and 'revert' message states
there is no such problem ??

I'm confused

Mike  - have you tried the sequence from BZ  ?

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm-cache coherence issue

2017-06-26 Thread Joe Thornber
On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote:
> So I seem to have a very basic misunderstanding of what the cleaner
> policy/dirty pages mean. Is there a way to force the cache to flush
> entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job.

Your understanding is correct.  There is a tool in the latest thinp
tools release called cache_writeback that does offline decommissioning
of a cache.

I'll try and reproduce your scenario and get back to you.

- Joe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi Zdenek,


On 06/26/2017 05:46 PM, Zdenek Kabelac wrote:

Dne 26.6.2017 v 11:08 Eric Ren napsal(a):



[... snip...]

Hi

Which kernel version is this ?

I'd thought we've already fixed this BZ for old mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1382382

There's a similar BZ for md-raid based mirrors (--type raid1)
https://bugzilla.redhat.com/show_bug.cgi?id=1416099

My base kernel version is 4.4.68, but with these 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count
"""

Yes, I've been working on this dm-mirror crash issue for a couple of days,
but only saw the fixes above when I rebased to send my fixes out, hah.

But with these 2 fixes, the reproducer can still crash the kernel. Anyway,
I can test with the upstream kernel again ;-P

Regards,
Eric




Regards

Zdenej



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren

Hi,

On 06/26/2017 05:14 PM, Johannes Thumshirn wrote:

On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote:

When the primary mirror device fails, activating a mirrored
LV will crash the kernel. It can be reproduced 100% with the
scripts below:

"""
dd if=/dev/zero of=file.raw bs=1G count=1
loopdev=$(losetup -f)
losetup $loopdev file.raw
dmsetup create pv1 --table "0 102399 linear $loopdev 0"
dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
  vgtest /dev/mapper/pv1 /dev/mapper/pv2
vgchange -an vgtest
echo 0 1000 error | dmsetup load /dev/mapper/pv1
dmsetup resume /dev/mapper/pv1
vgchange -ay vgtest
"""

So, as you have a reproducer, can you eventually write a blktests [1]
regression test for it?

No problem, will do later on.

Thanks,
Eric




Thanks,
Johannes

[1] https://github.com/osandov/blktests





Re: [dm-devel] move bounce limits settings into the drivers

2017-06-26 Thread Christoph Hellwig
ping?

On Mon, Jun 19, 2017 at 09:26:18AM +0200, Christoph Hellwig wrote:
> Currently we still default to a bounce all highmem setting for block
> drivers.  This series defaults to no bouncing and instead adds call
> to blk_queue_bounce_limit to those drivers that need it.   It also
> has a few cleanups in that area.
---end quoted text---



Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Zdenek Kabelac

On 26.6.2017 at 11:08, Eric Ren wrote:

When the primary mirror device fails, activating a mirrored
LV will crash the kernel. It can be reproduced 100% with the
scripts below:

"""
dd if=/dev/zero of=file.raw bs=1G count=1
loopdev=$(losetup -f)
losetup $loopdev file.raw
dmsetup create pv1 --table "0 102399 linear $loopdev 0"
dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
  vgtest /dev/mapper/pv1 /dev/mapper/pv2
vgchange -an vgtest
echo 0 1000 error | dmsetup load /dev/mapper/pv1
dmsetup resume /dev/mapper/pv1
vgchange -ay vgtest
" > The call trace:
"""
[  287.008629] device-mapper: raid1: Unable to read primary mirror during 
recovery
[  287.008632] device-mapper: raid1: Primary mirror (254:10) failed while 
out-of-sync: Reads may fail.
...
[  287.012480] BUG: unable to handle kernel NULL pointer dereference at 
0019
[  287.012515] IP: [] mirror_end_io+0x7f/0x130 [dm_mirror]
...
[  291.994645] Call Trace:
[  291.994671]  [] clone_endio+0x35/0xe0 [dm_mod]
[  291.994675]  [] do_reads+0x17d/0x1d0 [dm_mirror]
[  291.994680]  [] do_mirror+0xec/0x250 [dm_mirror]
[  291.994687]  [] process_one_work+0x14e/0x410
[  291.994691]  [] worker_thread+0x116/0x490
[  291.994694]  [] kthread+0xc7/0xe0
"""

Fix it by setting "details.bi_bdev" to NULL in the error path before
calling into mirror_end_io(), which will then fail the I/O properly.



Hi

Which kernel version is this?

I thought we'd already fixed this BZ for old mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1382382

There is a similar BZ for md-raid based mirrors (--type raid1):
https://bugzilla.redhat.com/show_bug.cgi?id=1416099


Regards

Zdenej



Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Johannes Thumshirn
On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote:
> When the primary mirror device fails, activating a mirrored
> LV will crash the kernel. It can be reproduced 100% with the
> scripts below:
> 
> """
> dd if=/dev/zero of=file.raw bs=1G count=1
> loopdev=$(losetup -f)
> losetup $loopdev file.raw
> dmsetup create pv1 --table "0 102399 linear $loopdev 0"
> dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
> vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
> lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
>  vgtest /dev/mapper/pv1 /dev/mapper/pv2
> vgchange -an vgtest
> echo 0 1000 error | dmsetup load /dev/mapper/pv1
> dmsetup resume /dev/mapper/pv1
> vgchange -ay vgtest
> """

So, as you have a reproducer, can you eventually write a blktests [1]
regression test for it?


Thanks,
Johannes

[1] https://github.com/osandov/blktests

-- 
Johannes Thumshirn                  Storage
jthumsh...@suse.de                  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850



[dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference

2017-06-26 Thread Eric Ren
When the primary mirror device fails, activating a mirrored
LV will crash the kernel. It can be reproduced 100% with the
scripts below:

"""
dd if=/dev/zero of=file.raw bs=1G count=1
loopdev=$(losetup -f)
losetup $loopdev file.raw
dmsetup create pv1 --table "0 102399 linear $loopdev 0"
dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
 vgtest /dev/mapper/pv1 /dev/mapper/pv2
vgchange -an vgtest
echo 0 1000 error | dmsetup load /dev/mapper/pv1
dmsetup resume /dev/mapper/pv1
vgchange -ay vgtest
"""

The call trace:
"""
[  287.008629] device-mapper: raid1: Unable to read primary mirror during recovery
[  287.008632] device-mapper: raid1: Primary mirror (254:10) failed while out-of-sync: Reads may fail.
...
[  287.012480] BUG: unable to handle kernel NULL pointer dereference at 0019
[  287.012515] IP: [] mirror_end_io+0x7f/0x130 [dm_mirror]
...
[  291.994645] Call Trace:
[  291.994671]  [] clone_endio+0x35/0xe0 [dm_mod]
[  291.994675]  [] do_reads+0x17d/0x1d0 [dm_mirror]
[  291.994680]  [] do_mirror+0xec/0x250 [dm_mirror]
[  291.994687]  [] process_one_work+0x14e/0x410
[  291.994691]  [] worker_thread+0x116/0x490
[  291.994694]  [] kthread+0xc7/0xe0
"""

Fix it by setting "details.bi_bdev" to NULL in the error path before
calling into mirror_end_io(), which will then fail the I/O properly.

Reported-by: Jerry Tang 
Signed-off-by: Eric Ren 
---
 drivers/md/dm-raid1.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 4da8858..696e308 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -568,6 +568,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
region_t region;
struct bio *bio;
struct mirror *m;
+   struct dm_raid1_bio_record *bio_record;
 
while ((bio = bio_list_pop(reads))) {
region = dm_rh_bio_to_region(ms->rh, bio);
@@ -583,8 +584,16 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
 
if (likely(m))
read_async_bio(m, bio);
-   else
+   else {
+   /*
+* In mirror_end_io(), we will fail the IO properly
+* if details.bi_bdev is NULL.
+*/
+   bio_record = dm_per_bio_data(bio,
+   sizeof(struct dm_raid1_bio_record));
+   bio_record->details.bi_bdev = NULL;
bio_io_error(bio);
+   }
}
 }
 
-- 
2.10.2



[dm-devel] [PATCH] dm ioctl: constify ioctl lookup table

2017-06-26 Thread Eric Biggers
From: Eric Biggers 

Constify the lookup table for device-mapper ioctls so that it is placed
in .rodata.

Signed-off-by: Eric Biggers 
---
 drivers/md/dm-ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 41852ae287a5..8e9a1eff25d3 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1604,7 +1604,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
  *---*/
 static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags)
 {
-   static struct {
+   static const struct {
int cmd;
int flags;
ioctl_fn fn;
-- 
2.13.1.611.g7e3b11ae1-goog



[dm-devel] [PATCH] dm: constify argument arrays

2017-06-26 Thread Eric Biggers
From: Eric Biggers 

The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.

(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)

Signed-off-by: Eric Biggers 
---
 drivers/md/dm-cache-target.c  |  4 ++--
 drivers/md/dm-crypt.c |  2 +-
 drivers/md/dm-flakey.c|  4 ++--
 drivers/md/dm-integrity.c |  2 +-
 drivers/md/dm-mpath.c | 10 +-
 drivers/md/dm-switch.c|  2 +-
 drivers/md/dm-table.c |  7 ---
 drivers/md/dm-thin.c  |  2 +-
 drivers/md/dm-verity-target.c |  2 +-
 include/linux/device-mapper.h |  4 ++--
 10 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index d682a0511381..484fbe0327b4 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2305,7 +2305,7 @@ static void init_features(struct cache_features *cf)
 static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
  char **error)
 {
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 2, "Invalid number of cache feature arguments"},
};
 
@@ -2347,7 +2347,7 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 static int parse_policy(struct cache_args *ca, struct dm_arg_set *as,
char **error)
 {
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 1024, "Invalid number of policy arguments"},
};
 
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ebf9e72d479b..95c5faf05391 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2514,7 +2514,7 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 {
struct crypt_config *cc = ti->private;
struct dm_arg_set as;
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 6, "Invalid number of feature args"},
};
unsigned int opt_params, val;
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 13305a182611..56cc3fba3eaa 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -51,7 +51,7 @@ static int parse_features(struct dm_arg_set *as, struct flakey_c *fc,
unsigned argc;
const char *arg_name;
 
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 6, "Invalid number of feature args"},
{1, UINT_MAX, "Invalid corrupt bio byte"},
{0, 255, "Invalid corrupt value to write into bio byte 
(0-255)"},
@@ -178,7 +178,7 @@ static int parse_features(struct dm_arg_set *as, struct flakey_c *fc,
  */
 static int flakey_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, UINT_MAX, "Invalid up interval"},
{0, UINT_MAX, "Invalid down interval"},
};
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 7910bfe50da4..866714c1dee1 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -2763,7 +2763,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
int r;
unsigned extra_args;
struct dm_arg_set as;
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 9, "Invalid number of feature args"},
};
unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 3df056b73b66..93f2a5bca918 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -691,7 +691,7 @@ static int parse_path_selector(struct dm_arg_set *as, struct priority_group *pg,
struct path_selector_type *pst;
unsigned ps_argc;
 
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{0, 1024, "invalid number of path selector args"},
};
 
@@ -815,7 +815,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 static struct priority_group *parse_priority_group(struct dm_arg_set *as,
   struct multipath *m)
 {
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {
{1, 1024, "invalid number of paths"},
{0, 1024, "invalid number of selector args"}
};
@@ -891,7 +891,7 @@ static int parse_hw_handler(struct dm_arg_set *as, struct multipath *m)
int ret;
struct dm_target *ti = m->ti;
 
-   static struct dm_arg _args[] = {
+   static const struct dm_arg _args[] = {

Re: [dm-devel] [PATCH v4 09/11] libmultipath: retain_attached_hw_handler obsolete with 4.3+

2017-06-26 Thread Martin Wilck
On Fri, 2017-06-23 at 19:25 +0200, Xose Vazquez Perez wrote:
> On 06/22/2017 04:59 PM, Martin Wilck wrote:
> 
> > Kernels 4.3 and newer (commit 1bab0de0 "dm-mpath, scsi_dh: don't
> > let dm detach device handlers") imply "retain_attached_hw_handler
> > yes".
> > 
> > Clarify this in the propsel code, log messages, and documentation.
> > 
> > Signed-off-by: Martin Wilck 
> > Reviewed-by: Hannes Reinecke 
> > ---
> >  libmultipath/configure.c   |  3 ++-
> >  libmultipath/dmparser.c|  3 ++-
> >  libmultipath/propsel.c |  7 ++-
> >  libmultipath/util.c| 36
> > 
> >  libmultipath/util.h|  2 ++
> >  multipath/multipath.conf.5 | 15 +++
> >  6 files changed, 59 insertions(+), 7 deletions(-)
> 
> [...]
> > --- a/libmultipath/propsel.c
> > +++ b/libmultipath/propsel.c
> > @@ -628,7 +628,12 @@ int select_retain_hwhandler(struct config
> > *conf, struct multipath *mp)
> >  
> > if (!VERSION_GE(conf->version, minv_dm_retain)) {
> > mp->retain_hwhandler = RETAIN_HWHANDLER_OFF;
> > -   origin = "(setting: WARNING, requires kernel
> > version >= 1.5.0)";
> > +   origin = "(setting: WARNING, requires kernel dm-
> > mpath version >= 1.5.0)";
> 
> It would be more informative to replace the dm-mpath version with the
> kernel version. No one cares about subsystem versions.

I disagree. This code should also work for vendor kernels which may
e.g. contain patches to update dm-mpath without updating the main
kernel (utsname) version.

The reason I used get_linux_version_code() for the new check my patch
introduced was that unfortunately, the dm-mpath version has not been
changed when the "retain_attached_hwhandler" feature was removed in
4.3. The next dm-mpath version change (to 1.10) happened in 4.4.
Thus I couldn't use the dm-mpath version and had to fallback to
utsname.

Thinking about it, the new check should probably be (dm-mpath version
>= 1.10 OR kernel version >= 4.3). IMO that can be handled in an
incremental patch.

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


[dm-devel] [PATCH] dm thin: do not queue freed thin mapping for next stage processing

2017-06-26 Thread Vallish Vaidyeshwara
process_prepared_discard_passdown_pt1() should cleanup
dm_thin_new_mapping in cases of error.

dm_pool_inc_data_range() can fail trying to get a block reference:

metadata operation 'dm_pool_inc_data_range' failed: error = -61

When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
transaction and marks the pool as PM_READ_ONLY. The memory for the thin
mapping is released as well. However, the current thin mapping is still
queued onto the next stage via queue_passdown_pt2() or passdown_endio().
When this dangling thin mapping is processed and accessed in the next
stage, device-mapper crashes.

Code flow without fix:
-> process_prepared_discard_passdown_pt1(m)
   -> dm_thin_remove_range()
   -> discard passdown
  --> passdown_endio(m) queues m onto next stage
   -> dm_pool_inc_data_range() fails, frees memory m
but does not remove it from next stage queue

-> process_prepared_discard_passdown_pt2(m)
   -> processes freed memory m and crashes

One such stack:

Call Trace:
[] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
[] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
[] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
[] process_prepared+0x81/0xa0 [dm_thin_pool]
[] do_worker+0xc5/0x820 [dm_thin_pool]
[] ? __schedule+0x244/0x680
[] ? pwq_activate_delayed_work+0x42/0xb0
[] process_one_work+0x153/0x3f0
[] worker_thread+0x12b/0x4b0
[] ? rescuer_thread+0x350/0x350
[] kthread+0xca/0xe0
[] ? kthread_park+0x60/0x60
[] ret_from_fork+0x25/0x30

The fix is to first take the block ref count for the discarded block and
only then do the passdown discard of that block. If taking the block ref
count fails, bail out: abort the current metadata transaction, mark the
pool as PM_READ_ONLY and free the current thin mapping memory (existing
error handling code) without queueing it onto the next stage of
processing. If taking the block ref count succeeds, proceed with the
passdown discard; the discard callback passdown_endio() will then queue
the thin mapping onto the next stage of processing.
Code flow with fix:
-> process_prepared_discard_passdown_pt1(m)
   -> dm_thin_remove_range()
   -> dm_pool_inc_data_range()
  --> if fails, free memory m and bail out
   -> discard passdown
  --> passdown_endio(m) queues m onto next stage

Cc: stable  # v4.9+
Reviewed-by: Eduardo Valentin 
Reviewed-by: Cristian Gafton 
Reviewed-by: Anchal Agarwal 
Signed-off-by: Vallish Vaidyeshwara 
---
 drivers/md/dm-thin.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 17ad50d..28808e5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1094,6 +1094,19 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
return;
}
 
+   /*
+* Increment the unmapped blocks.  This prevents a race between the
+* passdown io and reallocation of freed blocks.
+*/
+   r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
+   if (r) {
+   metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
+   bio_io_error(m->bio);
+   cell_defer_no_holder(tc, m->cell);
+   mempool_free(m, pool->mapping_pool);
+   return;
+   }
+
discard_parent = bio_alloc(GFP_NOIO, 1);
if (!discard_parent) {
DMWARN("%s: unable to allocate top level discard bio for 
passdown. Skipping passdown.",
@@ -1114,19 +1127,6 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
end_discard(&op, r);
}
}
-
-   /*
-* Increment the unmapped blocks.  This prevents a race between the
-* passdown io and reallocation of freed blocks.
-*/
-   r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
-   if (r) {
-   metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
-   bio_io_error(m->bio);
-   cell_defer_no_holder(tc, m->cell);
-   mempool_free(m, pool->mapping_pool);
-   return;
-   }
 }
 
static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping *m)
-- 
2.7.3.AMZN



Re: [dm-devel] [PATCH] multipath-tools: beautify path_latency.c code

2017-06-26 Thread Bart Van Assche
On Thu, 2017-06-22 at 18:46 +0200, Xose Vazquez Perez wrote:
> Mainly running scripts/Lindent, from kernel dir, to replace indent spaces
> by tabs.

Hello Xose,

Why do you think this kind of change is useful? Are you aware that
whitespace-only changes are very annoying to anyone else who is preparing
changes on the same file, because such changes result in a huge number of
annoying rebase conflicts?

Bart.



Re: [dm-devel] [PATCH v6 0/2] IV Generation algorithms for dm-crypt

2017-06-26 Thread Eric Biggers
On Fri, Jun 23, 2017 at 04:13:41PM +0800, Herbert Xu wrote:
> Binoy Jayan  wrote:
> > ===
> > dm-crypt optimization for larger block sizes
> > ===
> > 
> > Currently, the iv generation algorithms are implemented in dm-crypt.c. The 
> > goal
> > is to move these algorithms from the dm layer to the kernel crypto layer by
> > implementing them as template ciphers so they can be used in relation with
> > algorithms like aes, and with multiple modes like cbc, ecb etc. As part of 
> > this
> > patchset, the iv-generation code is moved from the dm layer to the crypto 
> > layer,
> > and the dm layer is adapted to send a whole 'bio' (as defined in the block layer)
> > at a time. Each bio contains the in memory representation of physically
> > contiguous disk blocks. Since the bio itself may not be contiguous in main
> > memory, the dm layer sets up a chained scatterlist of these blocks split 
> > into
> > physically contiguous segments in memory so that DMA can be performed.
> 
> There is currently a patch-set for fscrypt to add essiv support.  It
> would be interesting to know whether your implementation of essiv
> can also be used in that patchset.  That would confirm that we're on
> the right track.
> 

You can find the fscrypt patch at https://patchwork.kernel.org/patch/9795327/

Note that it's encrypting 4096-byte blocks, not 512-byte.  Also, it's using
AES-256 for the ESSIV tfm (since it uses a SHA-256 hash) but AES-128 for the
"real" encryption.  It's possible this is a mistake and it should be AES-128 for
both.  (If it is, it needs to be fixed before it's released in 4.13.)

Eric
