[dm-devel] multipath - unable to use multiple active paths at once, and deprecated example in docs
My use case doesn't lend itself well to multipathd, so I'm trying to implement multipathing with device mapper directly. My table is (kernel 4.19.79):

0 1562378240 multipath 4 queue_if_no_path retain_attached_hw_handler queue_mode bio 0 1 1 queue-length 0 4 1 253:11 1 253:8 1 253:9 1 253:10 1

What I've found with this setup is that, aside from the first path in the group, none of the other paths receive IO/bios. The only "real" path is 253:11; the rest of them are dm error targets. Even so, the status of this multipath target is:

0 1562378240 multipath 2 0 0 0 1 1 A 0 4 1 253:11 A 0 309 253:8 A 0 0 253:9 A 0 0 253:10 A 0 0

So 253:11 has a queue of 309, while the rest of the devices have a queue of zero and show an active status, indicating no IO has reached the underlying dm error targets that would cause the 2nd, 3rd, and 4th paths to fail.

Before diving much deeper into the relevant kernel code, I figured I'd check whether there's any obvious reason this should not work the way I expect (with individual paths balanced within the group). I realize that Documentation/device-mapper/dm-queue-length.txt is also outdated (it makes suggestions that are deprecated), but that documentation still implies this table would balance the load. Here is the table from those docs:

test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128

(A repeat_count > 1 is deprecated since those docs were written.)

My only assumption is that the multipath features, in particular queue_mode bio, prevent this from behaving properly. If that is the case, why can this not be achieved with bios? It is not a limitation of raid1, which will load-balance read IO across device mapper targets. I also believe queue_mode bio is the only viable option for me, since this multipath device sits on top of device mapper targets.

The documentation implies that the queue depth should be roughly the same on every device, and that after every IO a new path is chosen for the next IO based on the lowest entry. The code looks like it does this as described, but maybe there's some condition preventing it from doing so (while still counting the queue). Is there anything I can do to get this target to behave as I assumed it would from Documentation/device-mapper/dm-queue-length.txt? Also, for what it's worth, round-robin behaves the same way as queue-length.

Thank you for your time!

- Drew

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
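For reference, my mental model of the queue-length selector is the following (a toy sketch of the documented algorithm, not the kernel code; the class name is made up):

```python
# Toy model of dm-queue-length path selection: each IO goes to the
# path with the fewest in-flight IOs; ties go to table order.

class QueueLengthSelector:
    def __init__(self, paths):
        self.in_flight = {p: 0 for p in paths}  # per-path queue depth

    def select(self):
        # dicts preserve insertion order, so min() breaks ties by
        # the order paths appear in the table
        return min(self.in_flight, key=self.in_flight.get)

    def start_io(self, path):
        self.in_flight[path] += 1

    def end_io(self, path):
        self.in_flight[path] -= 1

sel = QueueLengthSelector(["253:11", "253:8", "253:9", "253:10"])
issued = []
for _ in range(8):
    p = sel.select()
    sel.start_io(p)   # nothing completes, so queues only grow
    issued.append(p)
print(issued)
```

If no IO ever completes, the lowest-queue rule degenerates to round-robin across all four paths - which is the balancing I expected to see, rather than everything landing on 253:11.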
Re: [dm-devel] dm-thin: Several Questions on dm-thin performance.
On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:
> The first question is what's the purpose of data cell? In thin_bio_map(),
> normal bio will be packed as a virtual cell and data cell. I can understand
> that virtual cell is used to prevent discard bio and non-discard bio
> targeting the same block from being processed at the same time. I find it
> was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
> fix race between simultaneous io and discards to same block), but I'm still
> confused about the use of data cell.

As you are aware there are two address spaces for the locks. The 'virtual' one refers to cells in the logical address space of the thin devices, and the 'data' one refers to the underlying data device. There are certain conditions where we unfortunately need to hold both of these (eg, to prevent a data block being reprovisioned before an io to it has completed).

> The second question is the impact of virtual cell and data cell on IO
> performance. If $data_block_size is large for example 1G, in multithread fio
> test, most bio will be buffered in cell->bios list and then be processed by
> worker thread asynchronously, even when there's no discard bio. Thus the
> original parallel IO is processed by worker thread serially now. As the
> number of fio test threads increase, the single worker thread can easily get
> CPU 100%, and thus become the bottleneck of the performance since dm-thin
> workqueue is ordered unbound.

Yep, this is a big issue. Take a look at dm-bio-prison-v2.h; this is the new interface that we need to move dm-thin across to use (dm-cache already uses it). It allows concurrent holders of a cell (ie, read locks), so we'll be able to remap much more io without handing it off to a worker thread. Once this is done I want to add an extra field to cells that will cache the mapping; this way, if you acquire a cell that is already held, you can avoid the expensive btree lookup.

Together these changes should make a huge difference to the performance. If you've got some spare coding cycles I'd love some help with this ;)

- Joe
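To sketch what I mean by read locks plus a cached mapping (a toy model only - the names and structure here are invented, not the actual dm-bio-prison-v2 interface):

```python
# Toy model: a cell may have many concurrent read holders, and the
# first holder caches the virtual->data mapping so later holders
# skip the expensive btree lookup.

class Cell:
    def __init__(self):
        self.readers = 0
        self.cached_mapping = None

btree_lookups = 0

def btree_lookup(virt_block):
    global btree_lookups
    btree_lookups += 1          # stands in for the costly metadata walk
    return virt_block + 1000    # pretend data block

def get_read_lock(cells, virt_block):
    cell = cells.setdefault(virt_block, Cell())
    cell.readers += 1           # concurrent holders are allowed
    if cell.cached_mapping is None:
        cell.cached_mapping = btree_lookup(virt_block)
    return cell.cached_mapping

cells = {}
results = [get_read_lock(cells, 7) for _ in range(4)]
print(results, btree_lookups)
```

Four concurrent reads of the same block share the cell and cost a single lookup, instead of each being deferred to the worker thread.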
[dm-devel] [git pull] device mapper changes for 5.5
Hi Linus,

The following changes since commit a99d8080aaf358d5d23581244e5da23b35e340b9:

  Linux 5.4-rc6 (2019-11-03 14:07:26 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-5.5/dm-changes

for you to fetch changes up to f612b2132db529feac4f965f28a1b9258ea7c22b:

  Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" (2019-11-20 17:27:39 -0500)

Please pull, thanks!
Mike

- Fix DM core to disallow stacking request-based DM on partitions.

- Fix DM raid target to properly resync raidset even if bitmap needed
  additional pages.

- Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
  IO and crypt workqueues.

- Fix DM integrity metadata layout that was aligned on 128K boundary
  rather than the intended 4K boundary (removes 124K of wasted space
  for each metadata block).

- Improve the DM thin, cache and clone targets to use spin_lock_irq
  rather than spin_lock_irqsave where possible.

- Fix DM thin single thread performance that was lost due to needless
  workqueue wakeups.

- Fix DM zoned target performance that was lost due to excessive
  backing device checks.

- Add ability to trigger write failure with the DM dust test target.

- Fix whitespace indentation in drivers/md/Kconfig.

- Various small fixes and cleanups (e.g. use struct_size, fix
  uninitialized variable, variable renames, etc).

Bryan Gurney (3):
      dm dust: change result vars to r
      dm dust: change ret to r in dust_map_read and dust_map
      dm dust: add limited write failure mode

Dmitry Fomichev (1):
      dm zoned: reduce overhead of backing device checks

Gustavo A. R. Silva (1):
      dm stripe: use struct_size() in kmalloc()

Heinz Mauelshagen (4):
      dm raid: change rs_set_dev_and_array_sectors API and callers
      dm raid: to ensure resynchronization, perform raid set grow in preresume
      dm raid: simplify rs_setup_recovery call chain
      dm raid: streamline rs_get_progress() and its raid_status() caller side

Jeffle Xu (1):
      dm thin: wakeup worker only when deferred bios exist

Krzysztof Kozlowski (1):
      dm: Fix Kconfig indentation

Maged Mokhtar (1):
      dm writecache: handle REQ_FUA

Mike Snitzer (2):
      dm table: do not allow request-based DM to stack on partitions
      Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"

Mikulas Patocka (6):
      dm writecache: fix uninitialized variable warning
      dm clone: replace spin_lock_irqsave with spin_lock_irq
      dm thin: replace spin_lock_irqsave with spin_lock_irq
      dm bio prison: replace spin_lock_irqsave with spin_lock_irq
      dm cache: replace spin_lock_irqsave with spin_lock_irq
      dm integrity: fix excessive alignment of metadata runs

Nathan Chancellor (1):
      dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout

Nikos Tsironis (1):
      dm clone: add bucket_lock_irq/bucket_unlock_irq helpers

 .../admin-guide/device-mapper/dm-integrity.rst |   5 +
 .../admin-guide/device-mapper/dm-raid.rst      |   2 +
 drivers/md/Kconfig                             |  54 +++
 drivers/md/dm-bio-prison-v1.c                  |  27 ++--
 drivers/md/dm-bio-prison-v2.c                  |  26 ++--
 drivers/md/dm-cache-target.c                   |  77 --
 drivers/md/dm-clone-metadata.c                 |  29 ++--
 drivers/md/dm-clone-metadata.h                 |   4 +-
 drivers/md/dm-clone-target.c                   |  62
 drivers/md/dm-crypt.c                          |   9 +-
 drivers/md/dm-dust.c                           |  97
 drivers/md/dm-integrity.c                      |  28 +++-
 drivers/md/dm-raid.c                           | 164 +++--
 drivers/md/dm-stripe.c                         |  15 +-
 drivers/md/dm-table.c                          |  27 +---
 drivers/md/dm-thin.c                           | 118 +++
 drivers/md/dm-writecache.c                     |   5 +-
 drivers/md/dm-zoned-metadata.c                 |  29 ++--
 drivers/md/dm-zoned-reclaim.c                  |   8 +-
 drivers/md/dm-zoned-target.c                   |  54 +--
 drivers/md/dm-zoned.h                          |   2 +
 include/linux/device-mapper.h                  |   3 -
 22 files changed, 433 insertions(+), 412 deletions(-)
Re: [dm-devel] ignore/update integrity checksums
On Fri, 22 Nov 2019, Erich Eckner wrote:

> Hi,
>
> I have multiple disks with LUKS+integrity created by
>
> cryptsetup luksFormat /dev/sde --key-file /mnt/key/key --integrity hmac-sha256
>
> which are part of a raid6. Details of the device:
>
> /dev/mapper/leg0 is active.
>   type:    LUKS2
>   cipher:  aes-xts-plain64
>   keysize: 768 bits
>   key location: keyring
>   integrity: hmac(sha256)
>   integrity keysize: 256 bits
>   device:  /dev/sdb
>   sector size:  512
>   offset:  0 sectors
>   size:    11031354576 sectors
>   mode:    read/write
>
> Recently, I rebooted this box and apparently I missed cleanly syncing the
> disks, so they now report integrity errors when mdadm probes (during
> assemble) for the raid superblock:
>
> device-mapper: crypt: dm-1: INTEGRITY AEAD ERROR, sector 11031354368
>
> There was no write activity on the raid before the reboot except for a
> running
>
> mdadm /dev/md0 --replace /dev/dm-0 --with /dev/dm-1
>
> which of course might have written a lot to all superblocks. Since I
> believe the superblocks should be mostly in sync (except for event
> counters?): Is there a way to ignore or re-calculate the integrity
> checks?
>
> Also: What is the correct way to ensure that data has been synced to the
> disk(s) before switching off power? (If that matters, there is a raid
> controller underneath: "06:00.0 RAID bus controller: Hewlett-Packard
> Company Smart Array G6 controllers (rev 01)" - but it does not actually
> handle the raid, it only feeds the disks through to the os.) I can
> execute any command after closing the luks-integrity device; my question
> aims at: what should I execute?
>
> regards,
> Erich

Just a follow-up experiment with the broken disks: I noticed /sys/block/dm-0/integrity/read_verify and similar, which should control verification upon read and updating upon write, according to https://github.com/ibuildthecloud/ubuntu-kernel/blob/master/Documentation/block/data-integrity.txt#L169

However, changing /sys/block/dm-0/integrity/read_verify to 0 (it was at 1 before) does not change the behaviour: `mdadm --examine` still generates read errors and cannot find its superblock for the corresponding crypt device.

Oh, I just see that I forgot all the details of my system in the first email - sorry! - here they come: This box is running arch linux with up-to-date packages.

# uname -a
Linux backup 5.3.12-arch1-1 #1 SMP PREEMPT Wed, 20 Nov 2019 19:45:16 + x86_64 GNU/Linux
# pacman -Q cryptsetup mdadm
cryptsetup 2.2.2-1
mdadm 4.1-2

regards,
Erich
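For my own understanding of why an out-of-sync sector shows up as an AEAD error, the per-sector authentication is conceptually like this (a rough sketch; the real dm-crypt AEAD construction, key derivation and on-disk tag layout differ):

```python
import hashlib
import hmac

KEY = b"\x01" * 32  # stand-in for the 256-bit integrity key

def sector_tag(sector_no, data):
    # bind the sector number into the MAC so sectors can't be swapped
    msg = sector_no.to_bytes(8, "little") + data
    return hmac.new(KEY, msg, hashlib.sha256).digest()

data = b"\x00" * 512
stored = sector_tag(11031354368, data)

# clean read: the recomputed tag matches the stored one
clean_ok = hmac.compare_digest(sector_tag(11031354368, data), stored)

# torn/unsynced write: data updated but tag not (or vice versa), so the
# tags mismatch -- which dm-crypt reports as INTEGRITY AEAD ERROR
torn = b"\xff" + data[1:]
torn_ok = hmac.compare_digest(sector_tag(11031354368, torn), stored)
print(clean_ok, torn_ok)
```

The sysfs read_verify knob belongs to the block layer's DIF/DIX integrity profile, so it may simply not influence this dm-crypt-internal check - but that is my guess, not a confirmed explanation.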
[dm-devel] ignore/update integrity checksums
Hi,

I have multiple disks with LUKS+integrity created by

cryptsetup luksFormat /dev/sde --key-file /mnt/key/key --integrity hmac-sha256

which are part of a raid6. Details of the device:

/dev/mapper/leg0 is active.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 768 bits
  key location: keyring
  integrity: hmac(sha256)
  integrity keysize: 256 bits
  device:  /dev/sdb
  sector size:  512
  offset:  0 sectors
  size:    11031354576 sectors
  mode:    read/write

Recently, I rebooted this box and apparently I missed cleanly syncing the disks, so they now report integrity errors when mdadm probes (during assemble) for the raid superblock:

device-mapper: crypt: dm-1: INTEGRITY AEAD ERROR, sector 11031354368

There was no write activity on the raid before the reboot except for a running

mdadm /dev/md0 --replace /dev/dm-0 --with /dev/dm-1

which of course might have written a lot to all superblocks. Since I believe the superblocks should be mostly in sync (except for event counters?): Is there a way to ignore or re-calculate the integrity checks?

Also: What is the correct way to ensure that data has been synced to the disk(s) before switching off power? (If that matters, there is a raid controller underneath: "06:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)" - but it does not actually handle the raid, it only feeds the disks through to the os.) I can execute any command after closing the luks-integrity device; my question aims at: what should I execute?

regards,
Erich
[dm-devel] [PATCH AUTOSEL 4.19 100/219] dm raid: fix false -EBUSY when handling check/repair message
From: Heinz Mauelshagen

[ Upstream commit 74694bcbdf7e28a5ad548cdda9ac56d30be00d13 ]

Sending a check/repair message infrequently leads to -EBUSY instead of
properly identifying an active resync. This occurs because
raid_message() is testing recovery bits in a racy way. Fix by calling
decipher_sync_action() from raid_message() to properly identify the
idle state of the RAID device.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin
---
 drivers/md/dm-raid.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index b78a8a4d061ca..416998523d455 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3690,8 +3690,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 			md_reap_sync_thread(mddev);
 		}
-	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
 		return -EBUSY;
 	else if (!strcasecmp(argv[0], "resync"))
 		; /* MD_RECOVERY_NEEDED set below */
-- 
2.20.1
[dm-devel] [PATCH AUTOSEL 4.14 046/127] dm flakey: Properly corrupt multi-page bios.
From: Sweet Tea

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in a
bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte no
matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin
---
 drivers/md/dm-flakey.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 0c1ef63c3461b..b1b68e01b889c 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -282,20 +282,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
-- 
2.20.1
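The segment walk the patch introduces can be illustrated in plain Python (bytearrays standing in for biovec pages; corrupt_nth_byte is a made-up helper for illustration, not kernel code):

```python
def corrupt_nth_byte(segments, n, value):
    """Overwrite the Nth byte (1-based) of a multi-segment buffer,
    whichever segment it falls in -- mirroring the patched logic."""
    idx = n - 1
    for seg in segments:
        if len(seg) > idx:       # the byte lives in this segment
            seg[idx] = value
            return True
        idx -= len(seg)          # skip past this segment
    return False                 # index beyond the end of the data

# two 4-byte "pages"; byte 6 falls in the second one, which the old
# code (looking only at the first biovec) would have missed
bio = [bytearray(b"AAAA"), bytearray(b"BBBB")]
hit = corrupt_nth_byte(bio, 6, 0xFF)
print(hit, bio)
```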
[dm-devel] dm-thin: Several Questions on dm-thin performance.
Hi guys,

I have several questions on dm-thin from testing and evaluating its IO performance. I would be grateful if someone could spend a little time on them.

The first question is: what is the purpose of the data cell? In thin_bio_map(), a normal bio will be packed as a virtual cell and a data cell. I can understand that the virtual cell is used to prevent a discard bio and a non-discard bio targeting the same block from being processed at the same time. I find it was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin: fix race between simultaneous io and discards to same block), but I'm still confused about the use of the data cell.

The second question is the impact of the virtual cell and data cell on IO performance. If $data_block_size is large, for example 1G, then in a multithread fio test most bios will be buffered in the cell->bios list and then processed by the worker thread asynchronously, even when there is no discard bio. Thus the originally parallel IO is now processed serially by the worker thread. As the number of fio test threads increases, the single worker thread can easily reach 100% CPU and thus become the performance bottleneck, since the dm-thin workqueue is ordered unbound.

Using an nvme SSD and fio (direct=1, ioengine=libaio, iodepth=128, numjobs=4, rw=read, bs=4k), the bandwidth on bare nvme is 1589MiB/s. The bandwidth on the thin device is only 1274MiB/s, while the four fio threads run at 200% CPU and the single worker thread is always running at 100% CPU. perf of the worker thread shows that process_bio() consumes 86% of the time.

Regards
Jeffle Xu
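The buffering effect described above can be modelled crudely (a hypothetical sketch of the cell behaviour, not dm-thin's actual code paths):

```python
from collections import defaultdict, deque

DATA_BLOCK_SIZE = 1 << 21   # 1 GiB expressed in 512-byte sectors

cells = defaultdict(deque)  # virtual block -> bios parked in the cell

def thin_bio_map(sector):
    block = sector // DATA_BLOCK_SIZE
    cells[block].append(sector)   # deferred to the single worker
    return block

# four "fio threads" issuing IO within the same 1G block: every bio
# lands in one cell, so one worker thread drains them serially
for sector in (0, 8, 16, 24):
    thin_bio_map(sector)

print(len(cells), len(cells[0]))
```

With a large $data_block_size, concurrent submitters collapse onto a handful of cells, which is why the worker thread becomes the bottleneck.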
[dm-devel] [PATCH AUTOSEL 4.19 098/219] dm flakey: Properly corrupt multi-page bios.
From: Sweet Tea

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in a
bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte no
matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin
---
 drivers/md/dm-flakey.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b86d2439ffc76..2fcf62fb2844f 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -287,20 +287,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
-- 
2.20.1
[dm-devel] [PATCH AUTOSEL 4.9 32/91] dm flakey: Properly corrupt multi-page bios.
From: Sweet Tea

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in a
bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte no
matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin
---
 drivers/md/dm-flakey.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 3643cba713518..742c1fa870dae 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -258,20 +258,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
-- 
2.20.1