Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
Hi Mike, On 06/26/2017 09:43 PM, Mike Snitzer wrote: On Mon, Jun 26 2017 at 7:49am -0400, Zdenek Kabelac wrote: [...snip...] Jon (cc'd) tested it and found that the original issue wasn't reproducible. But that the commit in question was the source of crashes (likely due to needing the fix from this thread). We can revisit for 4.13 -- but I'll defer to Heinz. Eric, please share your results from testing the latest 4.12-rc7.

Testing on 4.12-rc7 doesn't crash the kernel. Sorry for the noise, please ignore this patch. It's still unclear to me why backporting these fixes causes problems; I will update this thread if I find something.

Regards, Eric

Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
Hi, On 06/26/2017 11:27 PM, Eric Ren wrote: Hi Mike, On 06/26/2017 10:37 PM, Mike Snitzer wrote: On Mon, Jun 26 2017 at 9:47am -0400, Eric Ren wrote: [...snip...] """ Revert "dm mirror: use all available legs on multiple failures" dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good. Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev has failed will crash the kernel with the discard operation from mkfs.ext4. However, mkfs.ext4 can succeed on a healthy mirrored device. This is the thing I don't understand, because whether the mirrored device is good or not, there's always a duplicate bio completion before this fix, thus write_callback() will be called twice, and the crash will occur on the second write_callback():

No, there is only a duplicate bio completion if the error path is taken (e.g. underlying device doesn't support discard).

Hmm, when "op == REQ_OP_DISCARD", please see the comments in do_region():

"""
static void do_region(int op, int op_flags, unsigned region,
		      struct dm_io_region *where, struct dpages *dp,
		      struct io *io)
{
	...
	if (op == REQ_OP_DISCARD)
		special_cmd_max_sectors = q->limits.max_discard_sectors;
	...
	if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
	     op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
		atomic_inc(&io->count);             ===> [1]
		dec_count(io, region, -EOPNOTSUPP); ===> [2]
		return;
	}
"""

[1] ref count fixed by patch "dm io: ..."; [2] we won't come here if "special_cmd_max_sectors != 0", which is true when both sides of the mirror are good. So only when a mirror device fails, "max_discard_sectors" on its queue is 0, thus this error path will be taken, right?

Yes, I checked this as below:

- print info
"""
@@ -310,8 +310,10 @@ static void do_region(int op, int op_flags, unsigned region,
 	/*
 	 * Reject unsupported discard and write same requests.
 	 */
-	if (op == REQ_OP_DISCARD)
+	if (op == REQ_OP_DISCARD) {
 		special_cmd_max_sectors = q->limits.max_discard_sectors;
+		DMERR("do_region: op=DISCARD, q->limits.max_discard_sectors=%d\n", special_cmd_max_sectors);
+	}
"""

- log messages when running mkfs on a mirror device whose primary leg has failed:
[ 169.672880] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=0 ===> for the failed leg
[ 169.672885] device-mapper: io: do_region: op=DISCARD, q->limits.max_discard_sectors=8388607 ===> for the good leg

Thanks, Eric

"""
static void write_callback(unsigned long error, void *context)
{
	unsigned i;
	struct bio *bio = (struct bio *) context;
	struct mirror_set *ms;
	int should_wake = 0;
	unsigned long flags;

	ms = bio_get_m(bio)->ms;   ===> NULL pointer at the duplicate completion
	bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the kernel, but that's not true when the mirrored device is good. Hopefully someone who happens to know the reason can give some hints ;-P

If the mirror is healthy then only one completion is returned to dm-mirror (via write_callback). The problem was the error path wasn't managing the reference count as needed, whereas dm-io's normal discard IO path does.

Yes, I can understand this. Thanks a lot, Eric

Mike
Re: [dm-devel] dm-cache coherence issue
On Mon, Jun 26 2017 at 4:36pm -0400, Johannes Bauer wrote: > On 26.06.2017 21:56, Mike Snitzer wrote: > > >> Interesting, I did *not* change to writethrough. However, there > >> shouldn't have been any I/O on the device (it was not accessed by > >> anything after I switched to the cleaner policy). > [...] > >> Anyways, I'll try to replicate my scenario again because I'm actually > >> quite sure that I did everything correctly (I did it a few times). > > > > Except you didn't first switch to writethrough -- which is _not_ > > correct. > > Absolutely, very good to know. So even without any I/O being requested, > dm-cache is allowed to "hold back" pages as long as the dm-cache device > is in writeback mode?

s/pages/blocks/ The "dmsetup status" output for a DM cache device shows its dirty accounting in terms of cache blocks.

> Would this also explain why the "dmsetup wait" hung indefinitely?

You need to read the dmsetup man page: "dmsetup wait" has _nothing_ to do with waiting for IO to complete. It is about DM events; without specifying an event_nr you're just waiting for the device's event counter to increment (which may never happen if you aren't doing anything that'd trigger an event). See:

" wait [--noflush] device_name [event_nr] Sleeps until the event counter for device_name exceeds event_nr. Use -v to see the event number returned. To wait until the next event is triggered, use info to find the last event number. With --noflush, the thin target (from version 1.3.0) doesn't commit any outstanding changes to disk before reporting its statistics."

> I do think I followed a tutorial that I found on the net regarding this. > Scary that such a crucial fact is missing there. The fact that dirty > pages are reported as zero just gives the impression that everything is > coherent, when in fact it's not.

I'll concede that it is weird that you're seeing a different md5sum for the origin vs the cache (that is in writeback mode yet reports 0 dirty blocks).
But I think there is some important detail that would explain it; sadly I'd need to dig in and reproduce on a testbed to identify it. Maybe Joe will be able to offer a quick answer?
Re: [dm-devel] dm-cache coherence issue
On 26.06.2017 21:56, Mike Snitzer wrote: >> Interesting, I did *not* change to writethrough. However, there >> shouldn't have been any I/O on the device (it was not accessed by >> anything after I switched to the cleaner policy). [...] >> Anyways, I'll try to replicate my scenario again because I'm actually >> quite sure that I did everything correctly (I did it a few times). > > Except you didn't first switch to writethrough -- which is _not_ > correct.

Absolutely, very good to know. So even without any I/O being requested, dm-cache is allowed to "hold back" pages as long as the dm-cache device is in writeback mode? Would this also explain why the "dmsetup wait" hung indefinitely?

I do think I followed a tutorial that I found on the net regarding this. Scary that such a crucial fact is missing there. The fact that dirty pages are reported as zero just gives the impression that everything is coherent, when in fact it's not.

Regardless, I find this and your explanation extremely interesting and want to thank you for clearing this up. Very fascinating topic indeed.

Best regards, Johannes
Re: [dm-devel] dm-cache coherence issue
On Mon, Jun 26 2017 at 3:08pm -0400, Johannes Bauer wrote: > Hi Joe, > > On 26.06.2017 17:58, Joe Thornber wrote: > > On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote: > >> On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote: > >>> So I seem to have a very basic misunderstanding of what the cleaner > >>> policy/dirty pages mean. Is there a way to force the cache to flush > >>> entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job. > >> > >> I'll try and reproduce your scenario and get back to you. > > > > Here's a similar scenario that I've added to the dm test suite: > > > > https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768 > > > > It goes through all the steps you need to use to decommission a cache > > with dirty blocks. Namely: > > > > - switch to writethrough mode (so new io can't create new dirty blocks) > > - switch to the cleaner policy > > - wait for clean > > - remove the cache device before accessing the origin directly > > Interesting, I did *not* change to writethrough. However, there > shouldn't have been any I/O on the device (it was not accessed by > anything after I switched to the cleaner policy). > > On the advice of Zdenek Kabelac, who messaged me off-list, I therefore > have indeed been convinced that using dm-cache "by hand" maybe is too > dangerous (i.e., I had not accounted for "repairing" of a cache and am > really still unsure what LVM does that) and that lvmcache is a better > choice -- even though Ubuntu supports it really crappily for root devices: > https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1423796 a shame, > it's been like this for over two years. > > Anyways, I'll try to replicate my scenario again because I'm actually > quite sure that I did everything correctly (I did it a few times). Except you didn't first switch to writethrough -- which is _not_ correct. 
Re: [dm-devel] dm-cache coherence issue
Hi Joe, On 26.06.2017 17:58, Joe Thornber wrote: > On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote: >> On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote: >>> So I seem to have a very basic misunderstanding of what the cleaner >>> policy/dirty pages mean. Is there a way to force the cache to flush >>> entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job. >> >> I'll try and reproduce your scenario and get back to you. > > Here's a similar scenario that I've added to the dm test suite: > > https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768 > > It goes through all the steps you need to use to decommission a cache > with dirty blocks. Namely: > > - switch to writethrough mode (so new io can't create new dirty blocks) > - switch to the cleaner policy > - wait for clean > - remove the cache device before accessing the origin directly

Interesting, I did *not* change to writethrough. However, there shouldn't have been any I/O on the device (it was not accessed by anything after I switched to the cleaner policy).

On the advice of Zdenek Kabelac, who messaged me off-list, I therefore have indeed been convinced that using dm-cache "by hand" may be too dangerous (i.e., I had not accounted for "repairing" of a cache and am really still unsure what LVM does about that) and that lvmcache is a better choice -- even though Ubuntu supports it really crappily for root devices: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1423796 a shame, it's been like this for over two years.

Anyways, I'll try to replicate my scenario again because I'm actually quite sure that I did everything correctly (I did it a few times).

Thanks for your help, Johannes
Re: [dm-devel] dm thin: do not queue freed thin mapping for next stage processing
On Fri, Jun 23 2017 at 2:53pm -0400, Vallish Vaidyeshwara wrote: > process_prepared_discard_passdown_pt1() should cleanup > dm_thin_new_mapping in cases of error. > > dm_pool_inc_data_range() can fail trying to get a block reference: > > metadata operation 'dm_pool_inc_data_range' failed: error = -61 > > When dm_pool_inc_data_range() fails, dm thin aborts current metadata > transaction and marks pool as PM_READ_ONLY. Memory for thin mapping > is released as well. However, current thin mapping will be queued > onto next stage as part of queue_passdown_pt2() or passdown_endio(). > This dangling thin mapping memory when processed and accessed in > next stage will lead to device mapper crashing. > > Code flow without fix: > -> process_prepared_discard_passdown_pt1(m) >-> dm_thin_remove_range() >-> discard passdown > --> passdown_endio(m) queues m onto next stage >-> dm_pool_inc_data_range() fails, frees memory m > but does not remove it from next stage queue > > -> process_prepared_discard_passdown_pt2(m) >-> processes freed memory m and crashes > > One such stack: > > Call Trace: > [] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison] > [] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool] > [] process_prepared_discard_passdown_pt2+0x4b/0x90 > [dm_thin_pool] > [] process_prepared+0x81/0xa0 [dm_thin_pool] > [] do_worker+0xc5/0x820 [dm_thin_pool] > [] ? __schedule+0x244/0x680 > [] ? pwq_activate_delayed_work+0x42/0xb0 > [] process_one_work+0x153/0x3f0 > [] worker_thread+0x12b/0x4b0 > [] ? rescuer_thread+0x350/0x350 > [] kthread+0xca/0xe0 > [] ? kthread_park+0x60/0x60 > [] ret_from_fork+0x25/0x30 > > The fix is to first take the block ref count for discarded block and > then do a passdown discard of this block. If block ref count fails, > then bail out aborting current metadata transaction, mark pool as > PM_READ_ONLY and also free current thin mapping memory (existing error > handling code) without queueing this thin mapping onto next stage of > processing. 
If block ref count succeeds, then passdown discard of this > block. Discard callback of passdown_endio() will queue this thin mapping > onto next stage of processing. > > Code flow with fix: > -> process_prepared_discard_passdown_pt1(m) >-> dm_thin_remove_range() >-> dm_pool_inc_data_range() > --> if fails, free memory m and bail out >-> discard passdown > --> passdown_endio(m) queues m onto next stage > > Cc: stable # v4.9+ > Reviewed-by: Eduardo Valentin > Reviewed-by: Cristian Gafton > Reviewed-by: Anchal Agarwal > Signed-off-by: Vallish Vaidyeshwara > --- > drivers/md/dm-thin.c | 26 +- > 1 file changed, 13 insertions(+), 13 deletions(-) > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > index 17ad50d..28808e5 100644 > --- a/drivers/md/dm-thin.c > +++ b/drivers/md/dm-thin.c > @@ -1094,6 +1094,19 @@ static void > process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m) > return; > } > > + /* > + * Increment the unmapped blocks. This prevents a race between the > + * passdown io and reallocation of freed blocks. > + */ > + r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end); > + if (r) { > + metadata_operation_failed(pool, "dm_pool_inc_data_range", r); > + bio_io_error(m->bio); > + cell_defer_no_holder(tc, m->cell); > + mempool_free(m, pool->mapping_pool); > + return; > + } > + > discard_parent = bio_alloc(GFP_NOIO, 1); > if (!discard_parent) { > DMWARN("%s: unable to allocate top level discard bio for > passdown. Skipping passdown.", > @@ -1114,19 +1127,6 @@ static void > process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m) > end_discard(&op, r); > } > } > - > - /* > - * Increment the unmapped blocks. This prevents a race between the > - * passdown io and reallocation of freed blocks. 
> - */ > - r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end); > - if (r) { > - metadata_operation_failed(pool, "dm_pool_inc_data_range", r); > - bio_io_error(m->bio); > - cell_defer_no_holder(tc, m->cell); > - mempool_free(m, pool->mapping_pool); > - return; > - } > } > > static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping > *m)

This looks like a good fix. But I'll wait for Joe's confirmation before staging for 4.12 final inclusion later this week. Thanks, Mike
[dm-devel] [PATCH] libmultipath: update 3PARdata builtin config
This updated config comes from hp.

Signed-off-by: Benjamin Marzinski
---
 libmultipath/hwtable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 390d143..54bdcfc 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -49,6 +49,8 @@ static struct hwentry default_hw[] = {
 		.hwhandler     = "1 alua",
 		.prio_name     = PRIO_ALUA,
 		.no_path_retry = 18,
+		.fast_io_fail  = 10,
+		.dev_loss      = MAX_DEV_LOSS_TMO,
 	},
 	{
 		/* RA8000 / ESA12000 */
--
2.7.4
Re: [dm-devel] dm-cache coherence issue
On Mon, Jun 26, 2017 at 12:33:42PM +0100, Joe Thornber wrote: > On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote: > > So I seem to have a very basic misunderstanding of what the cleaner > > policy/dirty pages mean. Is there a way to force the cache to flush > > entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job. > > I'll try and reproduce your scenario and get back to you.

Here's a similar scenario that I've added to the dm test suite:

https://github.com/jthornber/device-mapper-test-suite/commit/457e889b0c4d510609c0d7464af07f2ebee20768

It goes through all the steps you need to use to decommission a cache with dirty blocks. Namely:

- switch to writethrough mode (so new io can't create new dirty blocks)
- switch to the cleaner policy
- wait for clean
- remove the cache device before accessing the origin directly

I've run it with a recent kernel, and one from about a year ago, and it passes fine.

- Joe
Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
Hi Mike, On 06/26/2017 10:37 PM, Mike Snitzer wrote: On Mon, Jun 26 2017 at 9:47am -0400, Eric Ren wrote: [...snip...] """ Revert "dm mirror: use all available legs on multiple failures" dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good. Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev has failed will crash the kernel with the discard operation from mkfs.ext4. However, mkfs.ext4 can succeed on a healthy mirrored device. This is the thing I don't understand, because whether the mirrored device is good or not, there's always a duplicate bio completion before this fix, thus write_callback() will be called twice, and the crash will occur on the second write_callback():

No, there is only a duplicate bio completion if the error path is taken (e.g. underlying device doesn't support discard).

Hmm, when "op == REQ_OP_DISCARD", please see the comments in do_region():

"""
static void do_region(int op, int op_flags, unsigned region,
		      struct dm_io_region *where, struct dpages *dp,
		      struct io *io)
{
	...
	if (op == REQ_OP_DISCARD)
		special_cmd_max_sectors = q->limits.max_discard_sectors;
	...
	if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
	     op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
		atomic_inc(&io->count);             ===> [1]
		dec_count(io, region, -EOPNOTSUPP); ===> [2]
		return;
	}
"""

[1] ref count fixed by patch "dm io: ..."; [2] we won't come here if "special_cmd_max_sectors != 0", which is true when both sides of the mirror are good. So only when a mirror device fails, "max_discard_sectors" on its queue is 0, thus this error path will be taken, right?

"""
static void write_callback(unsigned long error, void *context)
{
	unsigned i;
	struct bio *bio = (struct bio *) context;
	struct mirror_set *ms;
	int should_wake = 0;
	unsigned long flags;

	ms = bio_get_m(bio)->ms;   ===> NULL pointer at the duplicate completion
	bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the kernel, but that's not true when the mirrored device is good. Hopefully someone who happens to know the reason can give some hints ;-P

If the mirror is healthy then only one completion is returned to dm-mirror (via write_callback). The problem was the error path wasn't managing the reference count as needed, whereas dm-io's normal discard IO path does.

Yes, I can understand this. Thanks a lot, Eric

Mike
Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
On Mon, Jun 26 2017 at 9:47am -0400, Eric Ren wrote: > Hi, > > On 06/26/2017 06:55 PM, Eric Ren wrote: > >Hi Zdenek, > > > > > >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: > >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a): > >>> > >>[... snip...] > >> > >>Hi > >> > >>Which kernel version is this ? > >> > >>I'd thought we've already fixed this BZ for old mirrors: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382 > >> > >>There similar BZ for md-raid based mirrors (--type raid1) > >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > >My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > > >""" > >Revert "dm mirror: use all available legs on multiple failures" > >dm io: fix duplicate bio completion due to missing ref count > > I have a confusion about this "dm io..." fix. The fix itself is good. > > Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev > has failed, will crash the kernel with the discard operation from mkfs.ext4. > However, mkfs.ext4 can succeed on a healthy mirrored device. This > is the thing I don't understand, because no matter the mirrored device is > good or not, there's always a duplicate bio completion before having this > this fix, thus write_callback() will be called twice, crashing will > occur on the > second write_callback(): No, there is only a duplicate bio completion if the error path is taken (e.g. underlying device doesn't support discard). > """ > static void write_callback(unsigned long error, void *context) > { > unsigned i; > struct bio *bio = (struct bio *) context; > struct mirror_set *ms; > int should_wake = 0; > unsigned long flags; > > ms = bio_get_m(bio)->ms;> NULL pointer at the > duplicate completion > bio_set_m(bio, NULL); > """ > > If no this fix, I expected the DISCARD IO would always crash the > kernel, but it's not true when > the mirrored device is good. 
Hope someone who happens to know the reason > can give some hints ;-P

If the mirror is healthy then only one completion is returned to dm-mirror (via write_callback). The problem was the error path wasn't managing the reference count as needed, whereas dm-io's normal discard IO path does.

Mike
Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
Hi Mike, On 06/26/2017 09:43 PM, Mike Snitzer wrote: On Mon, Jun 26 2017 at 7:49am -0400, Zdenek Kabelac wrote: [...snip...] Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. I'm now starting to wonder why? It's been a real fix for a real issue - and the 'revert' message states there is no such problem ?? I'm confused Mike - have you tried the sequence from BZ ? Jon (cc'd) tested it and found that the original issue wasn't reproducible. But that the commit in question was the source of crashes (likely due to needing the fix from this thread). We can revisit for 4.13 -- but I'll defer to Heinz. Eric, please share your results from testing the latest 4.12-rc7.

Sure, but I will do it tomorrow -- different timezone ;-) Maybe Jon can also try my reproducer on the latest kernel, so we can double-confirm with each other.

Thanks, Eric

Mike
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
Hi, On 06/26/2017 06:55 PM, Eric Ren wrote: Hi Zdenek, On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: Dne 26.6.2017 v 11:08 Eric Ren napsal(a): [... snip...] Hi Which kernel version is this ? I'd thought we've already fixed this BZ for old mirrors: https://bugzilla.redhat.com/show_bug.cgi?id=1382382 There's a similar BZ for md-raid based mirrors (--type raid1): https://bugzilla.redhat.com/show_bug.cgi?id=1416099

My base kernel version is 4.4.68, but with these 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count

I'm confused about this "dm io..." fix. The fix itself is good. Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev has failed will crash the kernel with the discard operation from mkfs.ext4. However, mkfs.ext4 can succeed on a healthy mirrored device. This is the thing I don't understand, because whether the mirrored device is good or not, there's always a duplicate bio completion before this fix, thus write_callback() will be called twice, and the crash will occur on the second write_callback():

"""
static void write_callback(unsigned long error, void *context)
{
	unsigned i;
	struct bio *bio = (struct bio *) context;
	struct mirror_set *ms;
	int should_wake = 0;
	unsigned long flags;

	ms = bio_get_m(bio)->ms;   ===> NULL pointer at the duplicate completion
	bio_set_m(bio, NULL);
"""

Without this fix, I expected the DISCARD IO would always crash the kernel, but that's not true when the mirrored device is good. Hopefully someone who happens to know the reason can give some hints ;-P

Regards, Eric
"""

Yes, I've been working on the dm-mirror crash issue for a couple of days, but only saw the fixes above when I rebased to send my fixes out, hah. But with these 2 fixes, the reproducer can still crash the kernel. Anyway, I can test with the upstream kernel again ;-P

Regards, Eric

Regards Zdenek
Re: [dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
On Mon, Jun 26 2017 at 7:49am -0400, Zdenek Kabelac wrote: > Dne 26.6.2017 v 12:55 Eric Ren napsal(a): > >Hi Zdenek, > > > > > >On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: > >>Dne 26.6.2017 v 11:08 Eric Ren napsal(a): > >>> > >>[... snip...] > >> > >>Hi > >> > >>Which kernel version is this ? > >> > >>I'd thought we've already fixed this BZ for old mirrors: > >>https://bugzilla.redhat.com/show_bug.cgi?id=1382382 > >> > >>There similar BZ for md-raid based mirrors (--type raid1) > >>https://bugzilla.redhat.com/show_bug.cgi?id=1416099 > >My base kernel version is 4.4.68, but with this 2 latest fixes applied: > > > >""" > >Revert "dm mirror: use all available legs on multiple failures" > > Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. > > I'm now starting to wonder why? > > It's been a real fix for a real issue - and 'revert' message states > there is no such problem ?? > > I'm confused > > Mike - have you tried the sequence from BZ ? Jon (cc'd) tested it and found that the original issue wasn't reproducible. But that the commit in question was the source of crashes (likely due to needing the fix from this thread). We can revisit for 4.13 -- but I'll defer to Heinz. Eric, please share your results from testing the latest 4.12-rc7. Mike
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
Dne 26.6.2017 v 12:55 Eric Ren napsal(a): Hi Zdenek, On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: Dne 26.6.2017 v 11:08 Eric Ren napsal(a): [... snip...] Hi Which kernel version is this ? I'd thought we've already fixed this BZ for old mirrors: https://bugzilla.redhat.com/show_bug.cgi?id=1382382 There similar BZ for md-raid based mirrors (--type raid1) https://bugzilla.redhat.com/show_bug.cgi?id=1416099 My base kernel version is 4.4.68, but with this 2 latest fixes applied: """ Revert "dm mirror: use all available legs on multiple failures"

Ohh - I've -rc6 - while this 'revert' patch went to 4.12-rc7. I'm now starting to wonder why? It's been a real fix for a real issue - and the 'revert' message states there is no such problem ?? I'm confused

Mike - have you tried the sequence from BZ ?

Zdenek
Re: [dm-devel] dm-cache coherence issue
On Sat, Jun 24, 2017 at 03:56:54PM +0200, Johannes Bauer wrote: > So I seem to have a very basic misunderstanding of what the cleaner > policy/dirty pages mean. Is there a way to force the cache to flush > entirely? Apparently, "dmsetup wait" and/or "sync" don't do the job. Your understanding is correct. There is a tool in the latest thinp tools release called cache_writeback that does offline decommissioning of a cache. I'll try and reproduce your scenario and get back to you. - Joe
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
Hi Zdenek, On 06/26/2017 05:46 PM, Zdenek Kabelac wrote: Dne 26.6.2017 v 11:08 Eric Ren napsal(a): [... snip...] Hi Which kernel version is this ? I'd thought we've already fixed this BZ for old mirrors: https://bugzilla.redhat.com/show_bug.cgi?id=1382382 There's a similar BZ for md-raid based mirrors (--type raid1): https://bugzilla.redhat.com/show_bug.cgi?id=1416099

My base kernel version is 4.4.68, but with these 2 latest fixes applied:

"""
Revert "dm mirror: use all available legs on multiple failures"
dm io: fix duplicate bio completion due to missing ref count
"""

Yes, I've been working on the dm-mirror crash issue for a couple of days, but only saw the fixes above when I rebased to send my fixes out, hah. But with these 2 fixes, the reproducer can still crash the kernel. Anyway, I can test with the upstream kernel again ;-P

Regards, Eric

Regards Zdenek
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
Hi, On 06/26/2017 05:14 PM, Johannes Thumshirn wrote: On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote: When the primary mirror device fails, activating a mirrored LV will crash the kernel. It can be reproduced 100% with the scripts below: """ dd if=/dev/zero of=file.raw bs=1G count=1 loopdev=$(losetup -f) losetup $loopdev file.raw dmsetup create pv1 --table "0 102399 linear $loopdev 0" dmsetup create pv2 --table "0 102399 linear $loopdev 102400" vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2 lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \ vgtest /dev/mapper/pv1 /dev/mapper/pv2 vgchange -an vgtest echo 0 1000 error | dmsetup load /dev/mapper/pv1 dmsetup resume /dev/mapper/pv1 vgchange -ay vgtest """ So as you have a reproducer, can you eventually write a blktests [1] regression test for it?

No problem, will do later on.

Thanks, Eric

Thanks, Johannes [1] https://github.com/osandov/blktests
Re: [dm-devel] move bounce limits settings into the drivers
ping? On Mon, Jun 19, 2017 at 09:26:18AM +0200, Christoph Hellwig wrote: > Currently we still default to a bounce all highmem setting for block > drivers. This series defaults to no bouncing and instead adds call > to blk_queue_bounce_limit to those drivers that need it. It also > has a few cleanups in that area. ---end quoted text---
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
Dne 26.6.2017 v 11:08 Eric Ren napsal(a):
> When the primary mirror device fails, activating a mirrored LV will
> crash the kernel. It can be reproduced 100% with the script below:
>
> """
> dd if=/dev/zero of=file.raw bs=1G count=1
> loopdev=$(losetup -f)
> losetup $loopdev file.raw
> dmsetup create pv1 --table "0 102399 linear $loopdev 0"
> dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
> vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
> lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
>         vgtest /dev/mapper/pv1 /dev/mapper/pv2
> vgchange -an vgtest
> echo 0 1000 error | dmsetup load /dev/mapper/pv1
> dmsetup resume /dev/mapper/pv1
> vgchange -ay vgtest
> """
>
> The call trace:
> """
> [  287.008629] device-mapper: raid1: Unable to read primary mirror during recovery
> [  287.008632] device-mapper: raid1: Primary mirror (254:10) failed while out-of-sync: Reads may fail.
> ...
> [  287.012480] BUG: unable to handle kernel NULL pointer dereference at 0019
> [  287.012515] IP: [] mirror_end_io+0x7f/0x130 [dm_mirror]
> ...
> [  291.994645] Call Trace:
> [  291.994671] [] clone_endio+0x35/0xe0 [dm_mod]
> [  291.994675] [] do_reads+0x17d/0x1d0 [dm_mirror]
> [  291.994680] [] do_mirror+0xec/0x250 [dm_mirror]
> [  291.994687] [] process_one_work+0x14e/0x410
> [  291.994691] [] worker_thread+0x116/0x490
> [  291.994694] [] kthread+0xc7/0xe0
> """
>
> Fix it by setting "details.bi_bdev" to NULL in the error path before
> calling into mirror_end_io(), which will then fail the IO properly.

Hi

Which kernel version is this? I'd thought we've already fixed this BZ for old mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1382382

There is a similar BZ for md-raid based mirrors (--type raid1):
https://bugzilla.redhat.com/show_bug.cgi?id=1416099

Regards
Zdenek
Re: [dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
On Mon, Jun 26, 2017 at 05:08:48PM +0800, Eric Ren wrote:
> When the primary mirror device fails, activating a mirrored
> LV will crash the kernel. It can be reproduced 100% with the
> script below:
>
> """
> dd if=/dev/zero of=file.raw bs=1G count=1
> loopdev=$(losetup -f)
> losetup $loopdev file.raw
> dmsetup create pv1 --table "0 102399 linear $loopdev 0"
> dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
> vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
> lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
>         vgtest /dev/mapper/pv1 /dev/mapper/pv2
> vgchange -an vgtest
> echo 0 1000 error | dmsetup load /dev/mapper/pv1
> dmsetup resume /dev/mapper/pv1
> vgchange -ay vgtest
> """

Since you have a reproducer, can you eventually write a blktests [1]
regression test for it?

Thanks,
        Johannes

[1] https://github.com/osandov/blktests

--
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[dm-devel] [PATCH] dm mirror: fix crash caused by NULL-pointer dereference
When the primary mirror device fails, activating a mirrored LV will
crash the kernel. It can be reproduced 100% with the script below:

"""
dd if=/dev/zero of=file.raw bs=1G count=1
loopdev=$(losetup -f)
losetup $loopdev file.raw
dmsetup create pv1 --table "0 102399 linear $loopdev 0"
dmsetup create pv2 --table "0 102399 linear $loopdev 102400"
vgcreate vgtest /dev/mapper/pv1 /dev/mapper/pv2
lvcreate -l1 --type mirror -m1 -n mirror12 --mirrorlog core \
        vgtest /dev/mapper/pv1 /dev/mapper/pv2
vgchange -an vgtest
echo 0 1000 error | dmsetup load /dev/mapper/pv1
dmsetup resume /dev/mapper/pv1
vgchange -ay vgtest
"""

The call trace:
"""
[  287.008629] device-mapper: raid1: Unable to read primary mirror during recovery
[  287.008632] device-mapper: raid1: Primary mirror (254:10) failed while out-of-sync: Reads may fail.
...
[  287.012480] BUG: unable to handle kernel NULL pointer dereference at 0019
[  287.012515] IP: [] mirror_end_io+0x7f/0x130 [dm_mirror]
...
[  291.994645] Call Trace:
[  291.994671] [] clone_endio+0x35/0xe0 [dm_mod]
[  291.994675] [] do_reads+0x17d/0x1d0 [dm_mirror]
[  291.994680] [] do_mirror+0xec/0x250 [dm_mirror]
[  291.994687] [] process_one_work+0x14e/0x410
[  291.994691] [] worker_thread+0x116/0x490
[  291.994694] [] kthread+0xc7/0xe0
"""

Fix it by setting "details.bi_bdev" to NULL in the error path before
calling into mirror_end_io(), which will then fail the IO properly.
Reported-by: Jerry Tang
Signed-off-by: Eric Ren
---
 drivers/md/dm-raid1.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 4da8858..696e308 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -568,6 +568,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
        region_t region;
        struct bio *bio;
        struct mirror *m;
+       struct dm_raid1_bio_record *bio_record;

        while ((bio = bio_list_pop(reads))) {
                region = dm_rh_bio_to_region(ms->rh, bio);
@@ -583,8 +584,16 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)

                if (likely(m))
                        read_async_bio(m, bio);
-               else
+               else {
+                       /*
+                        * In mirror_end_io(), we will fail the IO properly
+                        * if details.bi_bdev is NULL.
+                        */
+                       bio_record = dm_per_bio_data(bio,
+                                       sizeof(struct dm_raid1_bio_record));
+                       bio_record->details.bi_bdev = NULL;
                        bio_io_error(bio);
+               }
        }
 }
--
2.10.2
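The behaviour the patch relies on — mirror_end_io() failing the I/O cleanly when the recorded device pointer is NULL — can be modelled in miniature. This is an illustrative toy, not the kernel code; the struct and function names are assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define EIO 5

/*
 * Toy model of the dm-raid1 fix: the per-bio record's device pointer
 * doubles as a sentinel.  NULL means "no mirror leg was ever chosen
 * for this bio", so the end-io path must fail the I/O instead of
 * dereferencing a stale pointer (the original NULL-pointer crash).
 */
struct bio_record {
	void *bdev;		/* stands in for details.bi_bdev */
};

static int toy_mirror_end_io(struct bio_record *rec, int error)
{
	if (error) {
		if (rec->bdev == NULL)
			return -EIO;	/* no leg was bound: fail, don't retry */
		/* ... a real handler would requeue to another mirror leg ... */
	}
	return error;
}
```

The sentinel write (`bio_record->details.bi_bdev = NULL` in the patch) is what guarantees the error branch above is taken instead of a dereference.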
[dm-devel] [PATCH] dm ioctl: constify ioctl lookup table
From: Eric Biggers

Constify the lookup table for device-mapper ioctls so that it is placed
in .rodata.

Signed-off-by: Eric Biggers
---
 drivers/md/dm-ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 41852ae287a5..8e9a1eff25d3 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1604,7 +1604,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
  *---*/
 static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags)
 {
-       static struct {
+       static const struct {
                int cmd;
                int flags;
                ioctl_fn fn;
--
2.13.1.611.g7e3b11ae1-goog
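The pattern is easy to demonstrate outside the kernel. The following standalone sketch (illustrative names, not dm-ioctl's real table) shows a `static const` lookup table that the compiler can place in .rodata, so an accidental write faults instead of silently corrupting the dispatch table:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ioctl_fn)(void);

static int do_version(void) { return 42; }

/* 'static const' => read-only data section (.rodata) */
static const struct {
	int cmd;
	int flags;
	ioctl_fn fn;
} _ioctls[] = {
	{ 0, 0, do_version },
};

/* linear scan, same shape as lookup_ioctl() in the patch */
static ioctl_fn lookup_ioctl(int cmd)
{
	size_t i;

	for (i = 0; i < sizeof(_ioctls) / sizeof(_ioctls[0]); i++)
		if (_ioctls[i].cmd == cmd)
			return _ioctls[i].fn;
	return NULL;
}
```

Because nothing ever writes to the table after initialization, the `const` qualifier is purely a hardening/placement change with no behavioral difference.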
[dm-devel] [PATCH] dm: constify argument arrays
From: Eric Biggers

The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.

(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)

Signed-off-by: Eric Biggers
---
 drivers/md/dm-cache-target.c  |  4 ++--
 drivers/md/dm-crypt.c         |  2 +-
 drivers/md/dm-flakey.c        |  4 ++--
 drivers/md/dm-integrity.c     |  2 +-
 drivers/md/dm-mpath.c         | 10 +++++-----
 drivers/md/dm-switch.c        |  2 +-
 drivers/md/dm-table.c         |  7 ++++---
 drivers/md/dm-thin.c          |  2 +-
 drivers/md/dm-verity-target.c |  2 +-
 include/linux/device-mapper.h |  4 ++--
 10 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index d682a0511381..484fbe0327b4 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2305,7 +2305,7 @@ static void init_features(struct cache_features *cf)
 static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
                          char **error)
 {
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 2, "Invalid number of cache feature arguments"},
        };
@@ -2347,7 +2347,7 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 static int parse_policy(struct cache_args *ca, struct dm_arg_set *as,
                        char **error)
 {
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 1024, "Invalid number of policy arguments"},
        };
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ebf9e72d479b..95c5faf05391 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2514,7 +2514,7 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 {
        struct crypt_config *cc = ti->private;
        struct dm_arg_set as;
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 6, "Invalid number of feature args"},
        };
        unsigned int opt_params, val;
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 13305a182611..56cc3fba3eaa 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -51,7 +51,7 @@ static int parse_features(struct dm_arg_set *as, struct flakey_c *fc,
        unsigned argc;
        const char *arg_name;

-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 6, "Invalid number of feature args"},
                {1, UINT_MAX, "Invalid corrupt bio byte"},
                {0, 255, "Invalid corrupt value to write into bio byte (0-255)"},
@@ -178,7 +178,7 @@ static int parse_features(struct dm_arg_set *as, struct flakey_c *fc,
  */
 static int flakey_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, UINT_MAX, "Invalid up interval"},
                {0, UINT_MAX, "Invalid down interval"},
        };
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 7910bfe50da4..866714c1dee1 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -2763,7 +2763,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
        int r;
        unsigned extra_args;
        struct dm_arg_set as;
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 9, "Invalid number of feature args"},
        };
        unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 3df056b73b66..93f2a5bca918 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -691,7 +691,7 @@ static int parse_path_selector(struct dm_arg_set *as, struct priority_group *pg,
        struct path_selector_type *pst;
        unsigned ps_argc;

-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {0, 1024, "invalid number of path selector args"},
        };
@@ -815,7 +815,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 static struct priority_group *parse_priority_group(struct dm_arg_set *as,
                                                   struct multipath *m)
 {
-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
                {1, 1024, "invalid number of paths"},
                {0, 1024, "invalid number of selector args"}
        };
@@ -891,7 +891,7 @@ static int parse_hw_handler(struct dm_arg_set *as, struct multipath *m)
        int ret;
        struct dm_target *ti = m->ti;

-       static struct dm_arg _args[] = {
+       static const struct dm_arg _args[] = {
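For context, a `struct dm_arg` table is only ever read by the validation helper, which is what makes constifying it safe. The following simplified sketch models that consumer; the names and logic are illustrative, not dm-table.c's actual dm_read_arg():

```c
#include <assert.h>
#include <stddef.h>

/*
 * A dm_arg-style entry: a read-only {min, max, error} triple used to
 * bounds-check one numeric argument.  Because the validator only reads
 * from the table, the tables can be 'static const' and live in .rodata.
 */
struct dm_arg {
	unsigned min;
	unsigned max;
	const char *error;
};

static int read_arg(const struct dm_arg *arg, unsigned value,
		    const char **error)
{
	if (value < arg->min || value > arg->max) {
		*error = arg->error;	/* hand back the table's message */
		return -1;
	}
	return 0;
}
```

Taking the table by `const` pointer documents (and lets the compiler enforce) the read-only contract the patch relies on.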
Re: [dm-devel] [PATCH v4 09/11] libmultipath: retain_attached_hw_handler obsolete with 4.3+
On Fri, 2017-06-23 at 19:25 +0200, Xose Vazquez Perez wrote:
> On 06/22/2017 04:59 PM, Martin Wilck wrote:
>
> > Kernels 4.3 and newer (commit 1bab0de0 "dm-mpath, scsi_dh: don't
> > let dm detach device handlers") imply "retain_attached_hw_handler
> > yes".
> >
> > Clarify this in the propsel code, log messages, and documentation.
> >
> > Signed-off-by: Martin Wilck
> > Reviewed-by: Hannes Reinecke
> > ---
> >  libmultipath/configure.c   |  3 ++-
> >  libmultipath/dmparser.c    |  3 ++-
> >  libmultipath/propsel.c     |  7 ++-
> >  libmultipath/util.c        | 36 ++++++++++++++++++++++++++++++++++++
> >  libmultipath/util.h        |  2 ++
> >  multipath/multipath.conf.5 | 15 +++++++++++++++
> >  6 files changed, 59 insertions(+), 7 deletions(-)
> > [...]
> > --- a/libmultipath/propsel.c
> > +++ b/libmultipath/propsel.c
> > @@ -628,7 +628,12 @@ int select_retain_hwhandler(struct config *conf, struct multipath *mp)
> >
> >  	if (!VERSION_GE(conf->version, minv_dm_retain)) {
> >  		mp->retain_hwhandler = RETAIN_HWHANDLER_OFF;
> > -		origin = "(setting: WARNING, requires kernel version >= 1.5.0)";
> > +		origin = "(setting: WARNING, requires kernel dm-mpath version >= 1.5.0)";
>
> It would be more informative to replace the dm-mpath version with the
> kernel version. No one cares about subsystem versions.

I disagree. This code should also work for vendor kernels, which may e.g.
contain patches to update dm-mpath without updating the main kernel
(utsname) version.

The reason I used get_linux_version_code() for the new check my patch
introduced was that, unfortunately, the dm-mpath version was not changed
when the "retain_attached_hwhandler" feature was removed in 4.3. The next
dm-mpath version change (to 1.10) happened in 4.4. Thus I couldn't use
the dm-mpath version and had to fall back to utsname.

Thinking about it, the new check should probably be (dm-mpath version
>= 1.10 OR kernel version >= 4.3). IMO that can be handled in an
incremental patch.

Regards,
Martin

--
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
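The combined check Martin proposes — (dm-mpath version >= 1.10 OR kernel version >= 4.3) — could be sketched roughly as follows. This is a standalone sketch, not libmultipath's actual code; the struct layout and helper names are assumptions:

```c
#include <assert.h>

/* hypothetical three-part version, comparable lexicographically */
struct ver {
	int major, minor, patch;
};

static int ver_ge(struct ver a, struct ver b)
{
	if (a.major != b.major)
		return a.major > b.major;
	if (a.minor != b.minor)
		return a.minor > b.minor;
	return a.patch >= b.patch;
}

/*
 * retain_attached_hw_handler is implied when EITHER the dm-mpath
 * target is new enough (>= 1.10, first bumped in kernel 4.4) OR the
 * running kernel is >= 4.3 (where the feature removal landed without
 * a target version bump) -- covering vendor kernels that backport
 * dm-mpath updates without changing utsname.
 */
static int retain_hwhandler_implied(struct ver dm_mpath, struct ver kernel)
{
	const struct ver min_tgt = { 1, 10, 0 };
	const struct ver min_kernel = { 4, 3, 0 };

	return ver_ge(dm_mpath, min_tgt) || ver_ge(kernel, min_kernel);
}
```

The OR is the point: either signal alone is sufficient evidence that the kernel no longer lets dm detach device handlers.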
[dm-devel] [PATCH] dm thin: do not queue freed thin mapping for next stage processing
process_prepared_discard_passdown_pt1() should clean up the
dm_thin_new_mapping in cases of error.

dm_pool_inc_data_range() can fail trying to get a block reference:

	metadata operation 'dm_pool_inc_data_range' failed: error = -61

When dm_pool_inc_data_range() fails, dm-thin aborts the current metadata
transaction and marks the pool as PM_READ_ONLY. Memory for the thin
mapping is released as well. However, the current thin mapping will still
be queued onto the next stage as part of queue_passdown_pt2() or
passdown_endio(). This dangling thin mapping memory, when processed and
accessed in the next stage, will lead to device-mapper crashing.

Code flow without fix:
-> process_prepared_discard_passdown_pt1(m)
   -> dm_thin_remove_range()
   -> discard passdown --> passdown_endio(m) queues m onto next stage
   -> dm_pool_inc_data_range() fails, frees memory m but does not
      remove it from next stage queue
-> process_prepared_discard_passdown_pt2(m)
   -> processes freed memory m and crashes

One such stack:

Call Trace:
[] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
[] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
[] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
[] process_prepared+0x81/0xa0 [dm_thin_pool]
[] do_worker+0xc5/0x820 [dm_thin_pool]
[] ? __schedule+0x244/0x680
[] ? pwq_activate_delayed_work+0x42/0xb0
[] process_one_work+0x153/0x3f0
[] worker_thread+0x12b/0x4b0
[] ? rescuer_thread+0x350/0x350
[] kthread+0xca/0xe0
[] ? kthread_park+0x60/0x60
[] ret_from_fork+0x25/0x30

The fix is to first take the block ref count for the discarded block and
then do a passdown discard of this block. If the block ref count fails,
then bail out, aborting the current metadata transaction, marking the
pool as PM_READ_ONLY and also freeing the current thin mapping memory
(existing error handling code) without queueing this thin mapping onto
the next stage of processing. If the block ref count succeeds, then do
the passdown discard of this block: the discard callback of
passdown_endio() will queue this thin mapping onto the next stage of
processing.
Code flow with fix:
-> process_prepared_discard_passdown_pt1(m)
   -> dm_thin_remove_range()
   -> dm_pool_inc_data_range() --> if fails, free memory m and bail out
   -> discard passdown --> passdown_endio(m) queues m onto next stage

Cc: stable # v4.9+
Reviewed-by: Eduardo Valentin
Reviewed-by: Cristian Gafton
Reviewed-by: Anchal Agarwal
Signed-off-by: Vallish Vaidyeshwara
---
 drivers/md/dm-thin.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 17ad50d..28808e5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1094,6 +1094,19 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
                return;
        }

+       /*
+        * Increment the unmapped blocks. This prevents a race between the
+        * passdown io and reallocation of freed blocks.
+        */
+       r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
+       if (r) {
+               metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
+               bio_io_error(m->bio);
+               cell_defer_no_holder(tc, m->cell);
+               mempool_free(m, pool->mapping_pool);
+               return;
+       }
+
        discard_parent = bio_alloc(GFP_NOIO, 1);
        if (!discard_parent) {
                DMWARN("%s: unable to allocate top level discard bio for passdown. Skipping passdown.",
@@ -1114,19 +1127,6 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
                        end_discard(&op, r);
                }
        }
-
-       /*
-        * Increment the unmapped blocks. This prevents a race between the
-        * passdown io and reallocation of freed blocks.
-        */
-       r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
-       if (r) {
-               metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
-               bio_io_error(m->bio);
-               cell_defer_no_holder(tc, m->cell);
-               mempool_free(m, pool->mapping_pool);
-               return;
-       }
 }

 static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping *m)
--
2.7.3.AMZN
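The reordering can be modelled in miniature: the invariant is that a failed reference-count increment must free the mapping and return before anything can queue it for pt2. This toy sketch (illustrative states and names, not the kernel code) captures that control flow:

```c
#include <assert.h>

/* terminal states for the mapping 'm' after pt1 runs */
enum m_state {
	M_FREED,	/* error path: mapping freed, never queued */
	M_QUEUED_PT2,	/* success path: passdown_endio() queues it */
};

/*
 * Toy model of the fixed pt1 ordering: the block reference is taken
 * BEFORE the passdown discard is issued, so a failure frees m while
 * nothing can still be holding or queueing it.
 */
static enum m_state process_pt1(int inc_data_range_ok)
{
	if (!inc_data_range_ok) {
		/* metadata_operation_failed() + mempool_free(m): bail out */
		return M_FREED;
	}
	/* passdown discard issued; its endio queues m for pt2 */
	return M_QUEUED_PT2;
}
```

With the pre-fix ordering the two outcomes could overlap (freed AND queued), which is exactly the use-after-free the patch removes.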
Re: [dm-devel] [PATCH] multipath-tools: beautify path_latency.c code
On Thu, 2017-06-22 at 18:46 +0200, Xose Vazquez Perez wrote:
> Mainly running scripts/Lindent, from the kernel dir, to replace indent
> spaces by tabs.

Hello Xose,

Why do you think this kind of change is useful? Are you aware that
whitespace-only changes are very annoying to anyone else who is preparing
changes on the same file, because such changes result in a huge number of
annoying rebase conflicts?

Bart.
Re: [dm-devel] [PATCH v6 0/2] IV Generation algorithms for dm-crypt
On Fri, Jun 23, 2017 at 04:13:41PM +0800, Herbert Xu wrote:
> Binoy Jayan wrote:
> > ===
> > dm-crypt optimization for larger block sizes
> > ===
> >
> > Currently, the iv generation algorithms are implemented in dm-crypt.c.
> > The goal is to move these algorithms from the dm layer to the kernel
> > crypto layer by implementing them as template ciphers, so they can be
> > used in relation with algorithms like aes, and with multiple modes
> > like cbc, ecb etc. As part of this patchset, the iv-generation code is
> > moved from the dm layer to the crypto layer, and the dm layer is
> > adapted to send a whole 'bio' (as defined in the block layer) at a
> > time. Each bio contains the in-memory representation of physically
> > contiguous disk blocks. Since the bio itself may not be contiguous in
> > main memory, the dm layer sets up a chained scatterlist of these
> > blocks, split into physically contiguous segments in memory, so that
> > DMA can be performed.
>
> There is currently a patch-set for fscrypt to add essiv support. It
> would be interesting to know whether your implementation of essiv
> can also be used in that patchset. That would confirm that we're on
> the right track.

You can find the fscrypt patch at https://patchwork.kernel.org/patch/9795327/

Note that it's encrypting 4096-byte blocks, not 512-byte. Also, it's
using AES-256 for the ESSIV tfm (since it uses a SHA-256 hash) but
AES-128 for the "real" encryption. It's possible this is a mistake and
it should be AES-128 for both. (If it is, it needs to be fixed before
it's released in 4.13.)

Eric