[dm-devel] multipath - unable to use multiple active paths at once, and deprecated example in docs

2019-11-22 Thread Drew Hastings
My use case doesn't lend itself well to multipathd, so I'm trying to
implement multipathing with device mapper directly.

My table is (kernel 4.19.79):

0 1562378240 multipath 4 queue_if_no_path retain_attached_hw_handler
queue_mode bio 0 1 1 queue-length 0 4 1 253:11 1 253:8 1 253:9 1 253:10 1

What I've found with this setup is that, aside from the first path in the
group, none of the other paths receive IO/bios.

The only "real" path is 253:11, the rest of them are to dm error targets.
Still though, you can see the status of this multipath target is:

0 1562378240 multipath 2 0 0 0 1 1 A 0 4 1 253:11 A 0 309 253:8 A 0 0 253:9
A 0 0 253:10 A 0 0

So 253:11 has a queue of 309, while the rest of the devices have a queue of
zero and still show an active status, indicating that no IO has hit the
underlying dm error targets (which would have caused the 2nd, 3rd, and 4th
paths to fail).

Before diving much deeper into the relevant kernel code, I figured I'd
check to see if there's any obvious reason this should not work the way I
expect (where individual paths are balanced within the group).

I realize that Documentation/device-mapper/dm-queue-length.txt is also
outdated (it makes suggestions that are now deprecated), but it still implies
that this table would balance the load. Here is the table from those docs:

test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
(a repeat_count greater than 1 is deprecated since those docs were written)

My only guess is that the multipath features, in particular queue_mode bio,
prevent this from behaving properly. If that is the case, why can this not be
achieved with bios? It is not a limitation of raid1, which will load-balance
read IO across device mapper targets. I also believe queue_mode bio is the
only feature option that is viable for me, since this multipath device sits
on top of other device mapper targets.

The documentation implies that the queue length should stay roughly equal
across the paths, and that after every IO a new path is chosen for the next
one based on the lowest queue length. The code looks like it does this as
described, but maybe some condition prevents it from doing so (while still
counting the queue). Is there anything I can do to get this target to behave
the way Documentation/device-mapper/dm-queue-length.txt led me to expect?
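
For clarity, here is the behaviour I'm expecting, written as a stand-alone C
sketch of a lowest-queue-length selector. This is only my simplified mental
model of what that documentation describes, not the actual dm-queue-length
code; the struct and function names are invented, and the device names are
just the ones from my table.

/* Toy model of a "queue-length" selector: pick the path with the fewest
 * in-flight IOs, and bump its counter when an IO is started.
 * Build with: cc -o qlsim qlsim.c
 */
#include <stdio.h>

struct path {
	const char *name;
	int qlen;		/* in-flight IO count */
};

static struct path paths[] = {
	{ "253:11", 0 }, { "253:8", 0 }, { "253:9", 0 }, { "253:10", 0 },
};
#define NR_PATHS (sizeof(paths) / sizeof(paths[0]))

/* Choose the path with the smallest queue length. */
static struct path *select_path(void)
{
	struct path *best = &paths[0];

	for (unsigned int i = 1; i < NR_PATHS; i++)
		if (paths[i].qlen < best->qlen)
			best = &paths[i];
	return best;
}

int main(void)
{
	/* Submit 8 IOs without completing any; a lowest-queue-length
	 * selector should spread them two per path. */
	for (int io = 0; io < 8; io++) {
		struct path *p = select_path();

		p->qlen++;	/* what the start_io accounting would do */
		printf("io %d -> %s (qlen now %d)\n", io, p->name, p->qlen);
	}
	return 0;
}

With nothing completing, eight submissions should end up two per path - which
is not what I'm observing.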

Also, for what it's worth, round-robin behaves the same way as queue-length.

Thank you for your time!
- Drew

Re: [dm-devel] dm-thin: Several Questions on dm-thin performance.

2019-11-22 Thread Joe Thornber
On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:

> The first question is what's the purpose of data cell? In thin_bio_map(),
> normal bio will be packed as a virtual cell and data cell. I can understand
> that virtual cell is used to prevent discard bio and non-discard bio
> targeting the same block from being processed at the same time. I find it
> was added in commit     e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
> fix race between simultaneous io and discards to same block), but I'm still
> confused about the use of data cell.

As you are aware, there are two address spaces for the locks.  The 'virtual'
one refers to cells in the logical address space of the thin devices, and the
'data' one refers to the underlying data device.  There are certain conditions
where we unfortunately need to hold both of these (eg, to prevent a data block
being reprovisioned before an io to it has completed).
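
As a toy illustration of why both get held (a userspace sketch of two keyed
lock spaces, not the real bio-prison API; the keys and numbers are made up):

/* Toy model: a 'virtual' lock space keyed by logical thin-device block
 * and a 'data' lock space keyed by pool data block.  Provisioning takes
 * a key in both spaces, so anything else that wants the same data block
 * (eg a discard) has to be deferred until the provisioning io completes.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_LOCKS 64

struct lock_space {
	unsigned long held[MAX_LOCKS];
	int nr;
};

/* Take 'key' in space 's'; return false if someone already holds it. */
static bool lock_key(struct lock_space *s, unsigned long key)
{
	for (int i = 0; i < s->nr; i++)
		if (s->held[i] == key)
			return false;
	s->held[s->nr++] = key;
	return true;
}

static struct lock_space virt_locks, data_locks;

int main(void)
{
	unsigned long virt_block = 42, data_block = 7;

	/* Provisioning path: hold the virtual key and the data key. */
	if (lock_key(&virt_locks, virt_block) &&
	    lock_key(&data_locks, data_block))
		printf("provisioning virt %lu -> data %lu under both locks\n",
		       virt_block, data_block);

	/* A later discard aimed at data block 7 must wait. */
	if (!lock_key(&data_locks, data_block))
		printf("discard of data block %lu deferred\n", data_block);
	return 0;
}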

> The second question is the impact of virtual cell and data cell on IO
> performance. If $data_block_size is large for example 1G, in multithread fio
> test, most bio will be buffered in cell->bios list and then be processed by
> worker thread asynchronously, even when there's no discard bio. Thus the
> original parallel IO is processed by worker thread serially now. As the
> number of fio test threads increase, the single worker thread can easily get
> CPU 100%, and thus become the bottleneck of the performance since dm-thin
> workqueue is ordered unbound.

Yep, this is a big issue.  Take a look at dm-bio-prison-v2.h: this is the new
interface that we need to move dm-thin across to (dm-cache already uses it).
It allows concurrent holders of a cell (ie, read locks), so we'll be able to
remap much more io without handing it off to a worker thread.  Once this is
done I want to add an extra field to cells that caches the mapping; that way,
if you acquire a cell that is already held, you can avoid the expensive btree
lookup.  Together these changes should make a huge difference to the
performance.
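
Very roughly, the shape I have in mind looks like this userspace sketch (not
the dm-bio-prison-v2 API itself; the names are invented): a cell admits
concurrent holders, and whoever gets there first caches the mapping for
everyone else.

/* Sketch of a shared ('read lock') cell that caches the mapping found
 * by its first holder, so later holders skip the expensive btree lookup.
 */
#include <stdbool.h>
#include <stdio.h>

struct cell {
	unsigned long key;		/* virtual block being locked */
	int holders;			/* concurrent shared holders */
	bool mapped;			/* has someone cached the mapping? */
	unsigned long data_block;	/* cached result of the lookup */
};

/* Stand-in for the expensive metadata btree lookup. */
static unsigned long btree_lookup(unsigned long key)
{
	printf("expensive btree lookup for block %lu\n", key);
	return key + 1000;		/* pretend mapping */
}

static unsigned long cell_get_mapping(struct cell *c)
{
	c->holders++;			/* shared hold, nobody is excluded */
	if (!c->mapped) {
		c->data_block = btree_lookup(c->key);
		c->mapped = true;
	}
	return c->data_block;		/* later holders reuse the cache */
}

int main(void)
{
	struct cell c = { .key = 42 };

	for (int i = 0; i < 4; i++) {
		unsigned long data = cell_get_mapping(&c);

		printf("reader %d remaps 42 -> %lu (holders=%d)\n",
		       i, data, c.holders);
	}
	return 0;
}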

If you've got some spare coding cycles I'd love some help with this ;)

- Joe




[dm-devel] [git pull] device mapper changes for 5.5

2019-11-22 Thread Mike Snitzer
Hi Linus,

The following changes since commit a99d8080aaf358d5d23581244e5da23b35e340b9:

  Linux 5.4-rc6 (2019-11-03 14:07:26 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-5.5/dm-changes

for you to fetch changes up to f612b2132db529feac4f965f28a1b9258ea7c22b:

  Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" (2019-11-20 
17:27:39 -0500)

Please pull, thanks!
Mike


- Fix DM core to disallow stacking request-based DM on partitions.

- Fix DM raid target to properly resync raidset even if bitmap needed
  additional pages.

- Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
  IO and crypt workqueues.

- Fix DM integrity metadata layout that was aligned on 128K boundary
  rather than the intended 4K boundary (removes 124K of wasted space for
  each metadata block).

- Improve the DM thin, cache and clone targets to use spin_lock_irq
  rather than spin_lock_irqsave where possible.

- Fix DM thin single thread performance that was lost due to needless
  workqueue wakeups.

- Fix DM zoned target performance that was lost due to excessive backing
  device checks.

- Add ability to trigger write failure with the DM dust test target.

- Fix whitespace indentation in drivers/md/Kconfig.

- Various small fixes and cleanups (e.g. use struct_size(), fix an
  uninitialized variable, variable renames, etc).


Bryan Gurney (3):
  dm dust: change result vars to r
  dm dust: change ret to r in dust_map_read and dust_map
  dm dust: add limited write failure mode

Dmitry Fomichev (1):
  dm zoned: reduce overhead of backing device checks

Gustavo A. R. Silva (1):
  dm stripe: use struct_size() in kmalloc()

Heinz Mauelshagen (4):
  dm raid: change rs_set_dev_and_array_sectors API and callers
  dm raid: to ensure resynchronization, perform raid set grow in preresume
  dm raid: simplify rs_setup_recovery call chain
  dm raid: streamline rs_get_progress() and its raid_status() caller side

Jeffle Xu (1):
  dm thin: wakeup worker only when deferred bios exist

Krzysztof Kozlowski (1):
  dm: Fix Kconfig indentation

Maged Mokhtar (1):
  dm writecache: handle REQ_FUA

Mike Snitzer (2):
  dm table: do not allow request-based DM to stack on partitions
  Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"

Mikulas Patocka (6):
  dm writecache: fix uninitialized variable warning
  dm clone: replace spin_lock_irqsave with spin_lock_irq
  dm thin: replace spin_lock_irqsave with spin_lock_irq
  dm bio prison: replace spin_lock_irqsave with spin_lock_irq
  dm cache: replace spin_lock_irqsave with spin_lock_irq
  dm integrity: fix excessive alignment of metadata runs

Nathan Chancellor (1):
  dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout

Nikos Tsironis (1):
  dm clone: add bucket_lock_irq/bucket_unlock_irq helpers

 .../admin-guide/device-mapper/dm-integrity.rst |   5 +
 .../admin-guide/device-mapper/dm-raid.rst  |   2 +
 drivers/md/Kconfig |  54 +++
 drivers/md/dm-bio-prison-v1.c  |  27 ++--
 drivers/md/dm-bio-prison-v2.c  |  26 ++--
 drivers/md/dm-cache-target.c   |  77 --
 drivers/md/dm-clone-metadata.c |  29 ++--
 drivers/md/dm-clone-metadata.h |   4 +-
 drivers/md/dm-clone-target.c   |  62 
 drivers/md/dm-crypt.c  |   9 +-
 drivers/md/dm-dust.c   |  97 
 drivers/md/dm-integrity.c  |  28 +++-
 drivers/md/dm-raid.c   | 164 +++--
 drivers/md/dm-stripe.c |  15 +-
 drivers/md/dm-table.c  |  27 +---
 drivers/md/dm-thin.c   | 118 +++
 drivers/md/dm-writecache.c |   5 +-
 drivers/md/dm-zoned-metadata.c |  29 ++--
 drivers/md/dm-zoned-reclaim.c  |   8 +-
 drivers/md/dm-zoned-target.c   |  54 +--
 drivers/md/dm-zoned.h  |   2 +
 include/linux/device-mapper.h  |   3 -
 22 files changed, 433 insertions(+), 412 deletions(-)




Re: [dm-devel] ignore/update integrity checksums

2019-11-22 Thread Erich Eckner


Just a follow-up with some experimentation on the broken disks:

I noticed /sys/block/dm-0/integrity/read_verify and similar attributes, which
should control verification on read and tag updating on write, according to:

https://github.com/ibuildthecloud/ubuntu-kernel/blob/master/Documentation/block/data-integrity.txt#L169

However, changing /sys/block/dm-0/integrity/read_verify to 0 (it was at 1
before) does not change the behaviour: `mdadm --examine` still generates
read errors and cannot find its superblock on the corresponding crypt
device.


Oh, I just realized that I forgot all the details of my system in the first
email - sorry! - here they are:


This box is running Arch Linux with up-to-date packages.

# uname -a
Linux backup 5.3.12-arch1-1 #1 SMP PREEMPT Wed, 20 Nov 2019 19:45:16 + 
x86_64 GNU/Linux


# pacman -Q cryptsetup mdadm
cryptsetup 2.2.2-1
mdadm 4.1-2

regards,
Erich







[dm-devel] ignore/update integrity checksums

2019-11-22 Thread Erich Eckner


Hi,

I have multiple disks with LUKS+integrity created by

cryptsetup luksFormat /dev/sde --key-file /mnt/key/key --integrity hmac-sha256

which are part of a raid6. Details of the device:

/dev/mapper/leg0 is active.
  type:LUKS2
  cipher:  aes-xts-plain64
  keysize: 768 bits
  key location: keyring
  integrity: hmac(sha256)
  integrity keysize: 256 bits
  device:  /dev/sdb
  sector size:  512
  offset:  0 sectors
  size:11031354576 sectors
  mode:read/write


Recently I rebooted this box and apparently failed to cleanly sync the
disks, so they now report integrity errors when mdadm probes (during
assembly) for the raid superblock:


device-mapper: crypt: dm-1: INTEGRITY AEAD ERROR, sector 11031354368

There was no write activity on the raid before the reboot except for a 
running


mdadm /dev/md0 --replace /dev/dm-0 --with /dev/dm-1

which of course might have written a lot to all superblocks.

Since I believe the superblocks should be mostly in sync (except for
event counters?): is there a way to ignore or recalculate the integrity
checksums?


Also: what is the correct way to ensure that data has been synced to the
disk(s) before switching off power? (If it matters, there is a raid
controller underneath - "06:00.0 RAID bus controller: Hewlett-Packard
Company Smart Array G6 controllers (rev 01)" - but it does not actually
handle the raid; it only passes the disks through to the OS.)
I can execute any command after closing the luks-integrity device; my
question is: what should I execute?


regards,
Erich







[dm-devel] [PATCH AUTOSEL 4.19 100/219] dm raid: fix false -EBUSY when handling check/repair message

2019-11-22 Thread Sasha Levin
From: Heinz Mauelshagen 

[ Upstream commit 74694bcbdf7e28a5ad548cdda9ac56d30be00d13 ]

Sending a check/repair message infrequently leads to -EBUSY instead of
properly identifying an active resync.  This occurs because
raid_message() is testing recovery bits in a racy way.

Fix by calling decipher_sync_action() from raid_message() to properly
identify the idle state of the RAID device.

Signed-off-by: Heinz Mauelshagen 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin 
---
 drivers/md/dm-raid.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index b78a8a4d061ca..416998523d455 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3690,8 +3690,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 			md_reap_sync_thread(mddev);
 		}
-	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
 		return -EBUSY;
 	else if (!strcasecmp(argv[0], "resync"))
 		; /* MD_RECOVERY_NEEDED set below */
-- 
2.20.1





[dm-devel] [PATCH AUTOSEL 4.14 046/127] dm flakey: Properly corrupt multi-page bios.

2019-11-22 Thread Sasha Levin
From: Sweet Tea 

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in
a bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte
no matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin 
---
 drivers/md/dm-flakey.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 0c1ef63c3461b..b1b68e01b889c 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -282,20 +282,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
 
-- 
2.20.1






[dm-devel] [PATCH AUTOSEL 4.19 098/219] dm flakey: Properly corrupt multi-page bios.

2019-11-22 Thread Sasha Levin
From: Sweet Tea 

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in
a bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte
no matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin 
---
 drivers/md/dm-flakey.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b86d2439ffc76..2fcf62fb2844f 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -287,20 +287,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
 
-- 
2.20.1





[dm-devel] [PATCH AUTOSEL 4.9 32/91] dm flakey: Properly corrupt multi-page bios.

2019-11-22 Thread Sasha Levin
From: Sweet Tea 

[ Upstream commit a00f5276e26636cbf72f24f79831026d2e2868e7 ]

The flakey target is documented to be able to corrupt the Nth byte in
a bio, but does not corrupt byte indices after the first biovec in the
bio. Change the corrupting function to actually corrupt the Nth byte
no matter in which biovec that index falls.

A test device generating two-page bios, atop a flakey device configured
to corrupt a byte index on the second page, verified both the failure
to corrupt before this patch and the expected corruption after this
change.

Signed-off-by: John Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Sasha Levin 
---
 drivers/md/dm-flakey.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 3643cba713518..742c1fa870dae 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -258,20 +258,31 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)
 
 static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
 {
-	unsigned bio_bytes = bio_cur_bytes(bio);
-	char *data = bio_data(bio);
+	unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1;
+
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+
+	if (!bio_has_data(bio))
+		return;
 
 	/*
-	 * Overwrite the Nth byte of the data returned.
+	 * Overwrite the Nth byte of the bio's data, on whichever page
+	 * it falls.
 	 */
-	if (data && bio_bytes >= fc->corrupt_bio_byte) {
-		data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value;
-
-		DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
-			"(rw=%c bi_opf=%u bi_sector=%llu cur_bytes=%u)\n",
-			bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
-			(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
-			(unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
+	bio_for_each_segment(bvec, bio, iter) {
+		if (bio_iter_len(bio, iter) > corrupt_bio_byte) {
+			char *segment = (page_address(bio_iter_page(bio, iter))
+					 + bio_iter_offset(bio, iter));
+			segment[corrupt_bio_byte] = fc->corrupt_bio_value;
+			DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
+				"(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n",
+				bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
+				(bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_opf,
+				(unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size);
+			break;
+		}
+		corrupt_bio_byte -= bio_iter_len(bio, iter);
 	}
 }
 
-- 
2.20.1





[dm-devel] dm-thin: Several Questions on dm-thin performance.

2019-11-22 Thread JeffleXu

Hi guys,

I have several questions about dm-thin that came up while I was testing and
evaluating its IO performance. I would be grateful if someone could spend a
little time on them.



The first question is: what is the purpose of the data cell? In
thin_bio_map(), a normal bio is locked into both a virtual cell and a data
cell. I can understand that the virtual cell is used to prevent a discard bio
and a non-discard bio targeting the same block from being processed at the
same time - I found it was added in commit
e8088073c9610af017fd47fddd104a2c3afb32e8 ("dm thin: fix race between
simultaneous io and discards to same block") - but I'm still confused
about the use of the data cell.



The second question is the impact of the virtual cell and data cell on IO
performance. If $data_block_size is large, for example 1G, then in a
multithreaded fio test most bios are buffered on the cell->bios list and
processed asynchronously by the worker thread, even when there are no discard
bios. IO that was originally parallel is therefore processed serially by the
worker thread. As the number of fio threads increases, the single worker
thread easily hits 100% CPU and becomes the performance bottleneck, since the
dm-thin workqueue is an ordered, unbound workqueue.
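
To illustrate the serialisation I mean, here is a toy single-threaded model
(this is not the dm-thin code, just a simplified stand-in for the cell and
the worker):

/* Toy model of the serialisation: the first bio to a block is handled
 * inline, every later bio to the same (already locked) block is parked
 * on the cell's list and only the single worker drains it.
 */
#include <stdio.h>

#define MAX_BIOS 16

struct cell {
	int held;		/* is the block's cell currently locked? */
	int nr_deferred;
	int deferred[MAX_BIOS];	/* bios waiting for the worker */
};

static struct cell cell;	/* one cell, ie one huge data block */

static void submit_bio(int bio_id)
{
	if (!cell.held) {
		cell.held = 1;
		printf("bio %d handled inline\n", bio_id);
		return;
	}
	/* Cell already held: park the bio for the worker thread. */
	cell.deferred[cell.nr_deferred++] = bio_id;
}

static void worker_drain(void)
{
	for (int i = 0; i < cell.nr_deferred; i++)
		printf("worker processes deferred bio %d\n", cell.deferred[i]);
	cell.nr_deferred = 0;
}

int main(void)
{
	/* With a 1G block size, 8 "parallel" 4k bios all hit the same
	 * block, so everything after the first one is serialised. */
	for (int i = 0; i < 8; i++)
		submit_bio(i);
	worker_drain();
	return 0;
}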


Using an NVMe SSD and fio (direct=1, ioengine=libaio, iodepth=128,
numjobs=4, rw=read, bs=4k), the bandwidth on the bare NVMe device is
1589MiB/s. The bandwidth on the thin device is only 1274MiB/s, while the
four fio threads run at 200% CPU and the single worker thread is constantly
running at 100% CPU. A perf profile of the worker thread shows that
process_bio() consumes 86% of the time.



Besides, it seems that I can't send email to the dm-devel@redhat.com mailing
list.



Regards

Jeffle Xu


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel