Re: [Ocfs2-devel] [PATCH 26/27] block: uncouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD

2022-04-06 Thread Martin K. Petersen


Christoph,

> Secure erase is a very different operation from discard in that it is
> a data integrity operation rather than a hint.  Fully split the limits
> and helper infrastructure to make the separation more clear.

Great!

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Ocfs2-devel] [PATCH 23/27] block: add a bdev_max_discard_sectors helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to query the number of sectors supported per discard
> bio based on the block device and use this helper to stop various
> places from poking into the request_queue to see if discard is
> supported and if so how much.  This mirrors what is done e.g. for
> write zeroes as well.

Nicer!

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 25/27] block: remove QUEUE_FLAG_DISCARD

2022-04-06 Thread Martin K. Petersen


Christoph,

> Just use a non-zero max_discard_sectors as an indicator for discard
> support, similar to what is done for write zeroes.

Very happy to finally see this flag removed!

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 24/27] block: add a bdev_discard_granularity helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Abstract away implementation details from file systems by providing a
> block_device based helper to retrieve the discard granularity.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 22/27] block: refactor discard bio size limiting

2022-04-06 Thread Martin K. Petersen


Christoph,

> Move all the logic to limit the discard bio size into a common helper
> so that it is better documented.

Looks OK.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 21/27] block: move {bdev, queue_limit}_discard_alignment out of line

2022-04-06 Thread Martin K. Petersen


Christoph,

> No need to inline these fairly large helpers.  Also fix the return
> value to be unsigned, just like the field in struct queue_limits.

I believe the original reason for the signed int here was to be able to
express -1 for sysfs. I am not sure why I didn't just use the misaligned
flag.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 20/27] block: use bdev_discard_alignment in part_discard_alignment_show

2022-04-06 Thread Martin K. Petersen


Christoph,

> Use the bdev based alignment helper instead of open coding it.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 19/27] block: remove queue_discard_alignment

2022-04-06 Thread Martin K. Petersen


Christoph,

> Just use bdev_alignment_offset in disk_discard_alignment_show instead.
> That helper is the same except for an always-false branch that
> doesn't matter in this slow path.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 18/27] block: move bdev_alignment_offset and queue_limit_alignment_offset out of line

2022-04-06 Thread Martin K. Petersen


Christoph,

> No need to inline these fairly large helpers.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 17/27] block: use bdev_alignment_offset in disk_alignment_offset_show

2022-04-06 Thread Martin K. Petersen


Christoph,

> This does the same as the open coded variant except for an extra
> branch, and allows removing queue_alignment_offset entirely.

Also fine.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 15/27] block: use bdev_alignment_offset in part_alignment_offset_show

2022-04-06 Thread Martin K. Petersen


Christoph,

> Replace the open coded offset calculation with the proper helper.
> This is an ABI change in that the -1 for a misaligned partition is
> properly propagated, which can be considered a bug fix and matches what
> is done on the whole device.

Looks good.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 10/27] block: add a bdev_nonrot helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to check the nonrot flag based on the block_device
> instead of having to poke into the block layer internal request_queue.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH 03/27] target: fix discard alignment on partitions

2022-04-06 Thread Martin K. Petersen


Christoph,

> Use the proper bdev_discard_alignment helper that accounts for partition
> offsets.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 13/27] block: add a bdev_stable_writes helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to check the stable writes flag based on the block_device
> instead of having to poke into the block layer internal request_queue.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 12/27] block: add a bdev_fua helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to check the FUA flag based on the block_device instead
> of having to poke into the block layer internal request_queue.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [Ocfs2-devel] [PATCH 11/27] block: add a bdev_write_cache helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to check the write cache flag based on the block_device
> instead of having to poke into the block layer internal request_queue.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH 14/27] block: add a bdev_max_zone_append_sectors helper

2022-04-06 Thread Martin K. Petersen


Christoph,

> Add a helper to check the max supported sectors for zone append based
> on the block_device instead of having to poke into the block layer
> internal request_queue.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 02/27] target: pass a block_device to target_configure_unmap_from_queue

2022-04-06 Thread Martin K. Petersen


Christoph,

> The target code is a consumer of the block layer and should generally
> work on struct block_device.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH 01/27] target: remove an incorrect unmap zeroes data deduction

2022-04-06 Thread Martin K. Petersen


Christoph,

> For block devices the target code implements UNMAP as calls to
> blkdev_issue_discard, which does not guarantee zeroing just because
> Write Zeroes is supported.
>
> Note that this does not affect the file backed path which uses
> fallocate to punch holes.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [dm-devel] [PATCH 01/27] target: remove an incorrect unmap zeroes data deduction

2022-04-06 Thread Martin K. Petersen


Christoph,

> For block devices the target code implements UNMAP as calls to
> blkdev_issue_discard, which does not guarantee zeroing just because
> Write Zeroes is supported.
>
> Note that this does not affect the file backed path which uses
> fallocate to punch holes.
>
> Fixes: 2237498f0b5c ("target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout")
> Signed-off-by: Christoph Hellwig 



-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH AUTOSEL 5.10 15/25] vhost_vdpa: don't setup irq offloading when irq_num < 0

2022-04-06 Thread Sasha Levin
From: Zhu Lingshan 

[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ]

When the irq number is negative (e.g., -EINVAL), the virtqueue
may be disabled or the virtqueues may be sharing a device irq.
In such a case, we should not set up irq offloading for the virtqueue.

Signed-off-by: Zhu Lingshan 
Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan@intel.com
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/vdpa.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index e4d60009d908..04578aa87e4d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,8 +97,11 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
 
irq = ops->get_vq_irq(vdpa, qid);
+   if (irq < 0)
+   return;
+
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-   if (!vq->call_ctx.ctx || irq < 0)
+   if (!vq->call_ctx.ctx)
return;
 
vq->call_ctx.producer.token = vq->call_ctx.ctx;
-- 
2.35.1



[PATCH AUTOSEL 5.10 16/25] tools/virtio: compile with -pthread

2022-04-06 Thread Sasha Levin
From: "Michael S. Tsirkin" 

[ Upstream commit f03560a57c1f60db6ac23ffd9714e1c69e2f95c7 ]

When using pthreads, one has to compile and link with -pthread,
otherwise e.g. glibc is not guaranteed to be reentrant.

This replaces -lpthread.

Reported-by: Matthew Wilcox 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 tools/virtio/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 0d7bbe49359d..1b25cc7c64bb 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -5,7 +5,8 @@ virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
 CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing 
-fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
-LDFLAGS += -lpthread
+CFLAGS += -pthread
+LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
 mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test V=${V}
-- 
2.35.1
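
For readers unfamiliar with the distinction being fixed above: `-lpthread` only
names the library at link time, whereas `-pthread` is a mode switch understood
at both compile and link time (with gcc/glibc it defines `_REENTRANT` during
compilation in addition to linking the thread library). A minimal Makefile
sketch of the pattern the patch adopts (the `demo` target and file name are
illustrative, not part of the kernel tree):

```make
# Hypothetical standalone example, not the tools/virtio Makefile itself.
CFLAGS  += -g -O2 -Wall
CFLAGS  += -pthread   # affects compilation, e.g. -D_REENTRANT on glibc
LDFLAGS += -pthread   # affects linking: pulls in the thread library

demo: demo.o
	$(CC) $(LDFLAGS) -o $@ $^
```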



[PATCH AUTOSEL 5.15 16/27] vhost_vdpa: don't setup irq offloading when irq_num < 0

2022-04-06 Thread Sasha Levin
From: Zhu Lingshan 

[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ]

When the irq number is negative (e.g., -EINVAL), the virtqueue
may be disabled or the virtqueues may be sharing a device irq.
In such a case, we should not set up irq offloading for the virtqueue.

Signed-off-by: Zhu Lingshan 
Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan@intel.com
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/vdpa.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index d62f05d056b7..299a99532618 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,8 +97,11 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
 
irq = ops->get_vq_irq(vdpa, qid);
+   if (irq < 0)
+   return;
+
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-   if (!vq->call_ctx.ctx || irq < 0)
+   if (!vq->call_ctx.ctx)
return;
 
vq->call_ctx.producer.token = vq->call_ctx.ctx;
-- 
2.35.1



[PATCH AUTOSEL 5.15 17/27] tools/virtio: compile with -pthread

2022-04-06 Thread Sasha Levin
From: "Michael S. Tsirkin" 

[ Upstream commit f03560a57c1f60db6ac23ffd9714e1c69e2f95c7 ]

When using pthreads, one has to compile and link with -pthread,
otherwise e.g. glibc is not guaranteed to be reentrant.

This replaces -lpthread.

Reported-by: Matthew Wilcox 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 tools/virtio/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 0d7bbe49359d..1b25cc7c64bb 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -5,7 +5,8 @@ virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
 CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing 
-fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
-LDFLAGS += -lpthread
+CFLAGS += -pthread
+LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
 mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test V=${V}
-- 
2.35.1



[PATCH AUTOSEL 5.16 20/30] tools/virtio: compile with -pthread

2022-04-06 Thread Sasha Levin
From: "Michael S. Tsirkin" 

[ Upstream commit f03560a57c1f60db6ac23ffd9714e1c69e2f95c7 ]

When using pthreads, one has to compile and link with -pthread,
otherwise e.g. glibc is not guaranteed to be reentrant.

This replaces -lpthread.

Reported-by: Matthew Wilcox 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 tools/virtio/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 0d7bbe49359d..1b25cc7c64bb 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -5,7 +5,8 @@ virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
 CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing 
-fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
-LDFLAGS += -lpthread
+CFLAGS += -pthread
+LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
 mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test V=${V}
-- 
2.35.1



[PATCH AUTOSEL 5.16 19/30] vhost_vdpa: don't setup irq offloading when irq_num < 0

2022-04-06 Thread Sasha Levin
From: Zhu Lingshan 

[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ]

When the irq number is negative (e.g., -EINVAL), the virtqueue
may be disabled or the virtqueues may be sharing a device irq.
In such a case, we should not set up irq offloading for the virtqueue.

Signed-off-by: Zhu Lingshan 
Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan@intel.com
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/vdpa.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index e3c4f059b21a..2c226329c132 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,8 +97,11 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
 
irq = ops->get_vq_irq(vdpa, qid);
+   if (irq < 0)
+   return;
+
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-   if (!vq->call_ctx.ctx || irq < 0)
+   if (!vq->call_ctx.ctx)
return;
 
vq->call_ctx.producer.token = vq->call_ctx.ctx;
-- 
2.35.1



[PATCH AUTOSEL 5.17 19/31] vhost_vdpa: don't setup irq offloading when irq_num < 0

2022-04-06 Thread Sasha Levin
From: Zhu Lingshan 

[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ]

When the irq number is negative (e.g., -EINVAL), the virtqueue
may be disabled or the virtqueues may be sharing a device irq.
In such a case, we should not set up irq offloading for the virtqueue.

Signed-off-by: Zhu Lingshan 
Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan@intel.com
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/vdpa.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ec5249e8c32d..05f5fd2af58f 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,8 +97,11 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
 
irq = ops->get_vq_irq(vdpa, qid);
+   if (irq < 0)
+   return;
+
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-   if (!vq->call_ctx.ctx || irq < 0)
+   if (!vq->call_ctx.ctx)
return;
 
vq->call_ctx.producer.token = vq->call_ctx.ctx;
-- 
2.35.1



[PATCH AUTOSEL 5.17 20/31] tools/virtio: compile with -pthread

2022-04-06 Thread Sasha Levin
From: "Michael S. Tsirkin" 

[ Upstream commit f03560a57c1f60db6ac23ffd9714e1c69e2f95c7 ]

When using pthreads, one has to compile and link with -pthread,
otherwise e.g. glibc is not guaranteed to be reentrant.

This replaces -lpthread.

Reported-by: Matthew Wilcox 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Sasha Levin 
---
 tools/virtio/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 0d7bbe49359d..1b25cc7c64bb 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -5,7 +5,8 @@ virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
 CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing 
-fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
-LDFLAGS += -lpthread
+CFLAGS += -pthread
+LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
 mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test V=${V}
-- 
2.35.1



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Christoph Hellwig
On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > Oh, I didn't know about device_get_dma_attr()..
> > 
> > Which is completely broken for any non-OF, non-ACPI platform.
> 
> I saw that, but I spent some time searching and could not find an
> iommu driver that would load independently of OF or ACPI. ie no IOMMU
> platform drivers are created by board files. Things like Intel/AMD
> discover only from ACPI, etc.

s390?


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Christoph Hellwig
On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > Oh, I didn't know about device_get_dma_attr()..

Which is completely broken for any non-OF, non-ACPI platform.


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Robin Murphy

On 2022-04-06 15:14, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
>>> Didn't see it
>>>
>>> I'll move dev_is_dma_coherent to device.h along with
>>> device_iommu_mapped() and others then
>>
>> No.  It is internal for a reason.  It also doesn't actually work
>> outside of the dma core.  E.g. for non-swiotlb ARM configs it will
>> not actually work.
>
> Really? It is the only condition that dma_info_to_prot() tests to
> decide if IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

The common iommu-dma layer happens to be part of the subset of the DMA
core which *does* play the dev->dma_coherent game. 32-bit Arm has its
own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with
CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do
VFIO, but that would be another example if so.

> In any case I still need to do something about the places checking
> IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE
> works. Any idea?

Can we improve the IOMMU drivers such that that *can* be the case
(within a reasonable margin of error)? That's kind of where I was hoping
to head with device_iommu_capable(), e.g. [1].

Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Christoph Hellwig
On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote:
> Really? It is the only condition that dma_info_to_prot() tests to
> decide if IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

arm does not use dma-iommu.c


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 03:04:32PM +0200, Cornelia Huck wrote:
> On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> >> This patch implements PCI version of synchronize_vqs().
> >> 
> >> Cc: Thomas Gleixner 
> >> Cc: Peter Zijlstra 
> >> Cc: "Paul E. McKenney" 
> >> Cc: Marc Zyngier 
> >> Signed-off-by: Jason Wang 
> >
> > Please add implementations at least for ccw and mmio.
> 
> I'm not sure what (if anything) can/should be done for ccw...
> 
> >
> >> ---
> >>  drivers/virtio/virtio_pci_common.c | 14 ++
> >>  drivers/virtio/virtio_pci_common.h |  2 ++
> >>  drivers/virtio/virtio_pci_legacy.c |  1 +
> >>  drivers/virtio/virtio_pci_modern.c |  2 ++
> >>  4 files changed, 19 insertions(+)
> >> 
> >> diff --git a/drivers/virtio/virtio_pci_common.c 
> >> b/drivers/virtio/virtio_pci_common.c
> >> index d724f676608b..b78c8bc93a97 100644
> >> --- a/drivers/virtio/virtio_pci_common.c
> >> +++ b/drivers/virtio/virtio_pci_common.c
> >> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
> >>synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >>  }
> >>  
> >> +void vp_synchronize_vqs(struct virtio_device *vdev)
> >> +{
> >> +  struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> >> +  int i;
> >> +
> >> +  if (vp_dev->intx_enabled) {
> >> +  synchronize_irq(vp_dev->pci_dev->irq);
> >> +  return;
> >> +  }
> >> +
> >> +  for (i = 0; i < vp_dev->msix_vectors; ++i)
> >> +  synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >> +}
> >> +
> 
> ...given that this seems to synchronize threaded interrupt handlers?

No, any handlers at all. The point is to make sure any memory changes
made prior to this op are visible to callbacks.

Jason, maybe add that to the documentation?

> Halil, do you think ccw needs to do anything? (AFAICS, we only have one
> 'irq' for channel devices anyway, and the handler just calls the
> relevant callbacks directly.)

Then you need to synchronize with that.

> >>  /* the notify function used when creating a virt queue */
> >>  bool vp_notify(struct virtqueue *vq)
> >>  {



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Robin Murphy

On 2022-04-05 17:16, Jason Gunthorpe wrote:
> vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
> way to do this is via dev_is_dma_coherent()

Not necessarily...

Disregarding the complete disaster of PCIe No Snoop on Arm-Based 
systems, there's the more interesting effectively-opposite scenario 
where an SMMU bridges non-coherent devices to a coherent interconnect. 
It's not something we take advantage of yet in Linux, and it can only be 
properly described in ACPI, but there do exist situations where 
IOMMU_CACHE is capable of making the device's traffic snoop, but 
dev_is_dma_coherent() - and device_get_dma_attr() for external users - 
would still say non-coherent because they can't assume that the SMMU is 
enabled and programmed in just the right way.


I've also not thought too much about how things might look with S2FWB 
thrown into the mix in future...


Robin.


like the DMA API does. If
IOMMU_CACHE is not supported then these drivers won't work as they don't
call any coherency-restoring routines around their DMAs.

Signed-off-by: Jason Gunthorpe 
---
  drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++-
  drivers/vhost/vdpa.c |  3 ++-
  2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 760b254ba42d6b..24d118198ac756 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -42,6 +42,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "usnic_log.h"

  #include "usnic_uiom.h"
@@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
struct usnic_uiom_dev *uiom_dev;
int err;
  
+	if (!dev_is_dma_coherent(dev)) {

+   usnic_err("IOMMU of %s does not support cache coherency\n",
+   dev_name(dev));
+   return -EINVAL;
+   }
+
uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC);
if (!uiom_dev)
return -ENOMEM;
@@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
if (err)
goto out_free_dev;
  
-	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {

-   usnic_err("IOMMU of %s does not support cache coherency\n",
-   dev_name(dev));
-   err = -EINVAL;
-   goto out_detach_device;
-   }
-
	spin_lock(&pd->lock);
	list_add_tail(&uiom_dev->link, &pd->devs);
pd->dev_cnt++;
@@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
  
  	return 0;
  
-out_detach_device:

-   iommu_detach_device(pd->domain, dev);
  out_free_dev:
kfree(uiom_dev);
return err;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 4c2f0bd062856a..05ea5800febc37 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "vhost.h"
  
@@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)

if (!bus)
return -EFAULT;
  
-	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))

+   if (!dev_is_dma_coherent(dma_dev))
return -ENOTSUPP;
  
  	v->domain = iommu_domain_alloc(bus);



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Christoph Hellwig
On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
> Didn't see it
> 
> I'll move dev_is_dma_coherent to device.h along with
> device_iommu_mapped() and others then

No.  It is internal for a reason.  It also doesn't actually work
outside of the dma core.  E.g. for non-swiotlb ARM configs it will
not actually work.


Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Cornelia Huck
On Wed, Apr 06 2022, "Michael S. Tsirkin"  wrote:

> On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
>> This patch implements PCI version of synchronize_vqs().
>> 
>> Cc: Thomas Gleixner 
>> Cc: Peter Zijlstra 
>> Cc: "Paul E. McKenney" 
>> Cc: Marc Zyngier 
>> Signed-off-by: Jason Wang 
>
> Please add implementations at least for ccw and mmio.

I'm not sure what (if anything) can/should be done for ccw...

>
>> ---
>>  drivers/virtio/virtio_pci_common.c | 14 ++
>>  drivers/virtio/virtio_pci_common.h |  2 ++
>>  drivers/virtio/virtio_pci_legacy.c |  1 +
>>  drivers/virtio/virtio_pci_modern.c |  2 ++
>>  4 files changed, 19 insertions(+)
>> 
>> diff --git a/drivers/virtio/virtio_pci_common.c 
>> b/drivers/virtio/virtio_pci_common.c
>> index d724f676608b..b78c8bc93a97 100644
>> --- a/drivers/virtio/virtio_pci_common.c
>> +++ b/drivers/virtio/virtio_pci_common.c
>> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>>  synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>>  }
>>  
>> +void vp_synchronize_vqs(struct virtio_device *vdev)
>> +{
>> +struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> +int i;
>> +
>> +if (vp_dev->intx_enabled) {
>> +synchronize_irq(vp_dev->pci_dev->irq);
>> +return;
>> +}
>> +
>> +for (i = 0; i < vp_dev->msix_vectors; ++i)
>> +synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>> +}
>> +

...given that this seems to synchronize threaded interrupt handlers?
Halil, do you think ccw needs to do anything? (AFAICS, we only have one
'irq' for channel devices anyway, and the handler just calls the
relevant callbacks directly.)

>>  /* the notify function used when creating a virt queue */
>>  bool vp_notify(struct virtqueue *vq)
>>  {



Re: [PATCH V2 5/5] virtio: harden vring IRQ

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:38PM +0800, Jason Wang wrote:
> This is a rework on the previous IRQ hardening that is done for
> virtio-pci where several drawbacks were found and were reverted:
> 
> 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
>that is used by some device such as virtio-blk
> 2) done only for PCI transport
> 
> In this patch, we tries to borrow the idea from the INTX IRQ hardening
> in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> by introducing a global device_ready variable for each
> virtio_device. Then we can to toggle it during
> virtio_reset_device()/virtio_device_ready(). A
> virtio_synchornize_vqs() is used in both virtio_device_ready() and
> virtio_reset_device() to synchronize with the vring callbacks. With
> this, vring_interrupt() can return check and early if driver_ready is
> false.
> 
> Note that the hardening is only done for vring interrupt since the
> config interrupt hardening is already done in commit 22b7050a024d7
> ("virtio: defer config changed notifications"). But the method that is
> used by config interrupt can't be reused by the vring interrupt
> handler because it uses spinlock to do the synchronization which is
> expensive.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c   | 11 +++
>  drivers/virtio/virtio_ring.c  |  9 -
>  include/linux/virtio.h|  2 ++
>  include/linux/virtio_config.h |  8 
>  4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 8dde44ea044a..2f3a6f8e3d9c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,17 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> + if (READ_ONCE(dev->driver_ready)) {
> + /*
> +  * The below virtio_synchronize_vqs() guarantees that any
> +  * interrupt for this line arriving after
> +  * virtio_synchronize_vqs() has completed is guaranteed to see
> +  * driver_ready == false.
> +  */
> + WRITE_ONCE(dev->driver_ready, false);
> + virtio_synchronize_vqs(dev);
> + }
> +
>   dev->config->reset(dev);
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cfb028ca238e..a4592e55c9f8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2127,10 +2127,17 @@ static inline bool more_used(const struct 
> vring_virtqueue *vq)
>   return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
>  }
>  
> -irqreturn_t vring_interrupt(int irq, void *_vq)
> +irqreturn_t vring_interrupt(int irq, void *v)
>  {
> + struct virtqueue *_vq = v;
> + struct virtio_device *vdev = _vq->vdev;
>   struct vring_virtqueue *vq = to_vvq(_vq);
>  
> + if (!READ_ONCE(vdev->driver_ready)) {


I am not sure why we need READ_ONCE here, it's done under lock.


Accordingly, same thing above for READ_ONCE and WRITE_ONCE.


> +		dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> + return IRQ_NONE;
> + }
> +
>   if (!more_used(vq)) {
>   pr_debug("virtqueue interrupt with no work for %p\n", vq);
>   return IRQ_NONE;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 5464f398912a..dfa2638a293e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -95,6 +95,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
>   * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
>   * @config_enabled: configuration change reporting enabled
>   * @config_change_pending: configuration change reported while disabled
> + * @driver_ready: whehter the driver is ready (e.g for vring callbacks)
>   * @config_lock: protects configuration change reporting
>   * @dev: underlying device.
>   * @id: the device type identification (used to match it with a driver).
> @@ -109,6 +110,7 @@ struct virtio_device {
>   bool failed;
>   bool config_enabled;
>   bool config_change_pending;
> + bool driver_ready;
>   spinlock_t config_lock;
>   spinlock_t vqs_list_lock; /* Protects VQs list access */
>   struct device dev;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 08b73d9bbff2..c9e207bf2c9c 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -246,6 +246,14 @@ void virtio_device_ready(struct virtio_device *dev)
>  {
>   unsigned status = dev->config->get_status(dev);
>  
> + virtio_synchronize_vqs(dev);
> +/*
> + * The above virtio_synchronize_vqs() make sure


makes sure

> + * vring_interrupt() will 

Re: [PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:37PM +0800, Jason Wang wrote:
> This patch implements PCI version of synchronize_vqs().
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 

Please add implementations at least for ccw and mmio.

> ---
>  drivers/virtio/virtio_pci_common.c | 14 ++
>  drivers/virtio/virtio_pci_common.h |  2 ++
>  drivers/virtio/virtio_pci_legacy.c |  1 +
>  drivers/virtio/virtio_pci_modern.c |  2 ++
>  4 files changed, 19 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index d724f676608b..b78c8bc93a97 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
>   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
>  }
>  
> +void vp_synchronize_vqs(struct virtio_device *vdev)
> +{
> + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> + int i;
> +
> + if (vp_dev->intx_enabled) {
> + synchronize_irq(vp_dev->pci_dev->irq);
> + return;
> + }
> +
> + for (i = 0; i < vp_dev->msix_vectors; ++i)
> + synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> +}
> +
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq)
>  {
> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index eb17a29fc7ef..2b84d5c1b5bc 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -105,6 +105,8 @@ static struct virtio_pci_device *to_vp_device(struct 
> virtio_device *vdev)
>  void vp_synchronize_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq);
> +/* synchronize with callbacks */
> +void vp_synchronize_vqs(struct virtio_device *vdev);
>  /* the config->del_vqs() implementation */
>  void vp_del_vqs(struct virtio_device *vdev);
>  /* the config->find_vqs() implementation */
> diff --git a/drivers/virtio/virtio_pci_legacy.c 
> b/drivers/virtio/virtio_pci_legacy.c
> index 6f4e34ce96b8..5a9e62320edc 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -192,6 +192,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> diff --git a/drivers/virtio/virtio_pci_modern.c 
> b/drivers/virtio/virtio_pci_modern.c
> index a2671a20ef77..584850389855 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -394,6 +394,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_nodev_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_modern_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> @@ -411,6 +412,7 @@ static const struct virtio_config_ops 
> virtio_pci_config_ops = {
>   .reset  = vp_reset,
>   .find_vqs   = vp_modern_find_vqs,
>   .del_vqs= vp_del_vqs,
> + .synchronize_vqs = vp_synchronize_vqs,
>   .get_features   = vp_get_features,
>   .finalize_features = vp_finalize_features,
>   .bus_name   = vp_bus_name,
> -- 
> 2.25.1



Re: [PATCH V2 3/5] virtio: introduce config op to synchronize vring callbacks

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:36PM +0800, Jason Wang wrote:
> This patch introduce

introduces

> a new

new

> virtio config ops to vring
> callbacks. Transport specific method is required to call
> synchornize_irq() on the IRQs. For the transport that doesn't provide
> synchronize_vqs(), use synchornize_rcu() as a fallback.
> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  include/linux/virtio_config.h | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index b341dd62aa4d..08b73d9bbff2 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -57,6 +57,8 @@ struct virtio_shm_region {
>   *   include a NULL entry for vqs unused by driver
>   *   Returns 0 on success or error status
>   * @del_vqs: free virtqueues found by find_vqs().
> + * @synchronize_vqs: synchronize with the virtqueue callbacks.
> + *   vdev: the virtio_device

I think I prefer synchronize_callbacks

>   * @get_features: get the array of feature bits for this device.
>   *   vdev: the virtio_device
>   *   Returns the first 64 feature bits (all we currently need).
> @@ -89,6 +91,7 @@ struct virtio_config_ops {
>   const char * const names[], const bool *ctx,
>   struct irq_affinity *desc);
>   void (*del_vqs)(struct virtio_device *);
> + void (*synchronize_vqs)(struct virtio_device *);
>   u64 (*get_features)(struct virtio_device *vdev);
>   int (*finalize_features)(struct virtio_device *vdev);
>   const char *(*bus_name)(struct virtio_device *vdev);
> @@ -217,6 +220,19 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, 
> unsigned nvqs,
> desc);
>  }
>  
> +/**
> + * virtio_synchronize_vqs - synchronize with virtqueue callbacks
> + * @vdev: the device
> + */
> +static inline
> +void virtio_synchronize_vqs(struct virtio_device *dev)
> +{
> + if (dev->config->synchronize_vqs)
> + dev->config->synchronize_vqs(dev);
> + else
> + synchronize_rcu();

I am not sure about this fallback and the latency impact.
Maybe synchronize_rcu_expedited is better here.

> +}
> +
>  /**
>   * virtio_device_ready - enable vq use in probe function
>   * @vdev: the device
> -- 
> 2.25.1



Re: [PATCH V2 2/5] virtio: use virtio_reset_device() when possible

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:35PM +0800, Jason Wang wrote:
> This allows us to do common extension without duplicating codes.

codes -> code

> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 75c8d560bbd3..8dde44ea044a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   /* We always start by resetting the device, in case a previous
>* driver messed it up.  This also tests that code path a little. */
> - dev->config->reset(dev);
> + virtio_reset_device(dev);
>  
>   /* Acknowledge that we've seen the device. */
>   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> @@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
>  
>   /* We always start by resetting the device, in case a previous
>* driver messed it up. */
> - dev->config->reset(dev);
> + virtio_reset_device(dev);
>  
>   /* Acknowledge that we've seen the device. */
>   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> -- 
> 2.25.1



Re: [PATCH V2 1/5] virtio: use virtio_device_ready() in virtio_device_restore()

2022-04-06 Thread Michael S. Tsirkin
Patch had wrong mime type. I managed to extract it but pls fix.

> 
> 
> From: Stefano Garzarella 
> 
> It will allows us

will allow us

> to do extension on virtio_device_ready() without
> duplicating codes.

code

> 
> Cc: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: "Paul E. McKenney" 
> Cc: Marc Zyngier 
> Signed-off-by: Stefano Garzarella 
> Signed-off-by: Jason Wang 
> ---
>  drivers/virtio/virtio.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 22f15f444f75..75c8d560bbd3 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
>   goto err;
>   }
>  
> - /* Finally, tell the device we're all set */
> - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> + /* If restore didn't do it, mark device DRIVER_OK ourselves. */
> + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> + virtio_device_ready(dev);
>  
>   virtio_config_enable(dev);

it's unfortunate that this adds an extra vmexit since virtio_device_ready
calls get_status too.

We now have:

static inline
void virtio_device_ready(struct virtio_device *dev)
{
unsigned status = dev->config->get_status(dev);

BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}


I propose adding a helper and putting common code there.

>  
> -- 
> 2.25.1



Re: [PATCH V2 0/5] rework on the IRQ hardening of virtio

2022-04-06 Thread Michael S. Tsirkin
On Wed, Apr 06, 2022 at 04:35:33PM +0800, Jason Wang wrote:
> Hi All:
> 
> This is a rework on the IRQ hardening for virtio which is done
> previously by the following commits are reverted:
> 
> 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> 
> The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> with the assumption of the affinity managed IRQ that is used by some
> virtio drivers. And what's more, it is only done for virtio-pci but
> not other transports.
> 
> In this rework, I try to implement a general virtio solution which
> borrows the idea of the INTX hardening by introducing a boolean for
> virtqueue callback enabling and toggle it in virtio_device_ready()
> and virtio_reset_device(). Then vring_interrupt() can simply check and
> return early if the driver is not ready.


All of a sudden all patches are having a wrong mime type.

It is application/octet-stream; should be text/plain

Pls fix and repost, thanks!

> Please review.
> 
> Changes since v1:
> 
> - Use transport specific irq synchronization method when possible
> - Drop the module parameter and enable the hardening unconditonally
> - Tweak the barrier/ordering facilities used in the code
> - Reanme irq_soft_enabled to driver_ready
> - Avoid unnecssary IRQ synchornization (e.g during boot)
> 
> Jason Wang (4):
>   virtio: use virtio_reset_device() when possible
>   virtio: introduce config op to synchronize vring callbacks
>   virtio-pci: implement synchronize_vqs()
>   virtio: harden vring IRQ
> 
> Stefano Garzarella (1):
>   virtio: use virtio_device_ready() in virtio_device_restore()
> 
>  drivers/virtio/virtio.c| 20 
>  drivers/virtio/virtio_pci_common.c | 14 ++
>  drivers/virtio/virtio_pci_common.h |  2 ++
>  drivers/virtio/virtio_pci_legacy.c |  1 +
>  drivers/virtio/virtio_pci_modern.c |  2 ++
>  drivers/virtio/virtio_ring.c   |  9 -
>  include/linux/virtio.h |  2 ++
>  include/linux/virtio_config.h  | 24 
>  8 files changed, 69 insertions(+), 5 deletions(-)
> 
> -- 
> 2.25.1



[PATCH V2 5/5] virtio: harden vring IRQ

2022-04-06 Thread Jason Wang
This is a rework on the previous IRQ hardening that is done for
virtio-pci where several drawbacks were found and were reverted:

1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
   that is used by some device such as virtio-blk
2) done only for PCI transport

In this patch, we try to borrow the idea from the INTX IRQ hardening
in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
by introducing a global driver_ready variable for each
virtio_device. Then we can toggle it during
virtio_reset_device()/virtio_device_ready(). A
virtio_synchronize_vqs() is used in both virtio_device_ready() and
virtio_reset_device() to synchronize with the vring callbacks. With
this, vring_interrupt() can check and return early if driver_ready is
false.

Note that the hardening is only done for vring interrupt since the
config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). But the method that is
used by config interrupt can't be reused by the vring interrupt
handler because it uses spinlock to do the synchronization which is
expensive.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio.c   | 11 +++
 drivers/virtio/virtio_ring.c  |  9 -
 include/linux/virtio.h|  2 ++
 include/linux/virtio_config.h |  8 
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 8dde44ea044a..2f3a6f8e3d9c 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -220,6 +220,17 @@ static int virtio_features_ok(struct virtio_device *dev)
  * */
 void virtio_reset_device(struct virtio_device *dev)
 {
+   if (READ_ONCE(dev->driver_ready)) {
+   /*
+* The below virtio_synchronize_vqs() guarantees that any
+* interrupt for this line arriving after
+* virtio_synchronize_vqs() has completed is guaranteed to see
+* driver_ready == false.
+*/
+   WRITE_ONCE(dev->driver_ready, false);
+   virtio_synchronize_vqs(dev);
+   }
+
dev->config->reset(dev);
 }
 EXPORT_SYMBOL_GPL(virtio_reset_device);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238e..a4592e55c9f8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2127,10 +2127,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
 }
 
-irqreturn_t vring_interrupt(int irq, void *_vq)
+irqreturn_t vring_interrupt(int irq, void *v)
 {
+   struct virtqueue *_vq = v;
+   struct virtio_device *vdev = _vq->vdev;
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (!READ_ONCE(vdev->driver_ready)) {
+		dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
+   return IRQ_NONE;
+   }
+
if (!more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5464f398912a..dfa2638a293e 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -95,6 +95,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
  * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
  * @config_enabled: configuration change reporting enabled
  * @config_change_pending: configuration change reported while disabled
+ * @driver_ready: whether the driver is ready (e.g. for vring callbacks)
  * @config_lock: protects configuration change reporting
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
@@ -109,6 +110,7 @@ struct virtio_device {
bool failed;
bool config_enabled;
bool config_change_pending;
+   bool driver_ready;
spinlock_t config_lock;
spinlock_t vqs_list_lock; /* Protects VQs list access */
struct device dev;
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 08b73d9bbff2..c9e207bf2c9c 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -246,6 +246,14 @@ void virtio_device_ready(struct virtio_device *dev)
 {
unsigned status = dev->config->get_status(dev);
 
+   virtio_synchronize_vqs(dev);
+/*
+ * The above virtio_synchronize_vqs() make sure
+ * vring_interrupt() will see the driver specific setup if it
+ * see driver_ready as true.
+ */
+   WRITE_ONCE(dev->driver_ready, true);
+
BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
2.25.1


[PATCH V2 4/5] virtio-pci: implement synchronize_vqs()

2022-04-06 Thread Jason Wang
This patch implements PCI version of synchronize_vqs().

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_pci_common.c | 14 ++
 drivers/virtio/virtio_pci_common.h |  2 ++
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 4 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index d724f676608b..b78c8bc93a97 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -37,6 +37,20 @@ void vp_synchronize_vectors(struct virtio_device *vdev)
synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
 }
 
+void vp_synchronize_vqs(struct virtio_device *vdev)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   if (vp_dev->intx_enabled) {
+   synchronize_irq(vp_dev->pci_dev->irq);
+   return;
+   }
+
+   for (i = 0; i < vp_dev->msix_vectors; ++i)
+   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
+}
+
 /* the notify function used when creating a virt queue */
 bool vp_notify(struct virtqueue *vq)
 {
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index eb17a29fc7ef..2b84d5c1b5bc 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -105,6 +105,8 @@ static struct virtio_pci_device *to_vp_device(struct virtio_device *vdev)
 void vp_synchronize_vectors(struct virtio_device *vdev);
 /* the notify function used when creating a virt queue */
 bool vp_notify(struct virtqueue *vq);
+/* synchronize with callbacks */
+void vp_synchronize_vqs(struct virtio_device *vdev);
 /* the config->del_vqs() implementation */
 void vp_del_vqs(struct virtio_device *vdev);
 /* the config->find_vqs() implementation */
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 6f4e34ce96b8..5a9e62320edc 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -192,6 +192,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
.reset  = vp_reset,
.find_vqs   = vp_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_vqs = vp_synchronize_vqs,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index a2671a20ef77..584850389855 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
.reset  = vp_reset,
.find_vqs   = vp_modern_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_vqs = vp_synchronize_vqs,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
@@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = {
.reset  = vp_reset,
.find_vqs   = vp_modern_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_vqs = vp_synchronize_vqs,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
-- 
2.25.1



[PATCH V2 3/5] virtio: introduce config op to synchronize vring callbacks

2022-04-06 Thread Jason Wang
This patch introduce a new virtio config ops to vring
callbacks. A transport-specific method is required to call
synchronize_irq() on the IRQs. For transports that don't provide
synchronize_vqs(), use synchronize_rcu() as a fallback.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
 include/linux/virtio_config.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b341dd62aa4d..08b73d9bbff2 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -57,6 +57,8 @@ struct virtio_shm_region {
  * include a NULL entry for vqs unused by driver
  * Returns 0 on success or error status
  * @del_vqs: free virtqueues found by find_vqs().
+ * @synchronize_vqs: synchronize with the virtqueue callbacks.
+ * vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  * vdev: the virtio_device
  * Returns the first 64 feature bits (all we currently need).
@@ -89,6 +91,7 @@ struct virtio_config_ops {
const char * const names[], const bool *ctx,
struct irq_affinity *desc);
void (*del_vqs)(struct virtio_device *);
+   void (*synchronize_vqs)(struct virtio_device *);
u64 (*get_features)(struct virtio_device *vdev);
int (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
@@ -217,6 +220,19 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs,
  desc);
 }
 
+/**
+ * virtio_synchronize_vqs - synchronize with virtqueue callbacks
+ * @vdev: the device
+ */
+static inline
+void virtio_synchronize_vqs(struct virtio_device *dev)
+{
+   if (dev->config->synchronize_vqs)
+   dev->config->synchronize_vqs(dev);
+   else
+   synchronize_rcu();
+}
+
 /**
  * virtio_device_ready - enable vq use in probe function
  * @vdev: the device
-- 
2.25.1



[PATCH V2 2/5] virtio: use virtio_reset_device() when possible

2022-04-06 Thread Jason Wang
This allows us to do common extension without duplicating codes.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 75c8d560bbd3..8dde44ea044a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
-   dev->config->reset(dev);
+   virtio_reset_device(dev);
 
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
 
/* We always start by resetting the device, in case a previous
 * driver messed it up. */
-   dev->config->reset(dev);
+   virtio_reset_device(dev);
 
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
-- 
2.25.1



[PATCH V2 1/5] virtio: use virtio_device_ready() in virtio_device_restore()

2022-04-06 Thread Jason Wang
From: Stefano Garzarella 

It will allows us to do extension on virtio_device_ready() without
duplicating codes.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Signed-off-by: Stefano Garzarella 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 22f15f444f75..75c8d560bbd3 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
goto err;
}
 
-   /* Finally, tell the device we're all set */
-   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+   /* If restore didn't do it, mark device DRIVER_OK ourselves. */
+   if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
+   virtio_device_ready(dev);
 
virtio_config_enable(dev);
 
-- 
2.25.1



[PATCH V2 0/5] rework on the IRQ hardening of virtio

2022-04-06 Thread Jason Wang
Hi All:

This is a rework of the IRQ hardening for virtio that was done
previously by the following commits, which have been reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.

In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by introducing a boolean for
virtqueue callback enabling and toggle it in virtio_device_ready()
and virtio_reset_device(). Then vring_interrupt() can simply check and
return early if the driver is not ready.

Please review.

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditionally
- Tweak the barrier/ordering facilities used in the code
- Rename irq_soft_enabled to driver_ready
- Avoid unnecessary IRQ synchronization (e.g. during boot)

Jason Wang (4):
  virtio: use virtio_reset_device() when possible
  virtio: introduce config op to synchronize vring callbacks
  virtio-pci: implement synchronize_vqs()
  virtio: harden vring IRQ

Stefano Garzarella (1):
  virtio: use virtio_device_ready() in virtio_device_restore()

 drivers/virtio/virtio.c| 20 
 drivers/virtio/virtio_pci_common.c | 14 ++
 drivers/virtio/virtio_pci_common.h |  2 ++
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 drivers/virtio/virtio_ring.c   |  9 -
 include/linux/virtio.h |  2 ++
 include/linux/virtio_config.h  | 24 
 8 files changed, 69 insertions(+), 5 deletions(-)

-- 
2.25.1



Re: [PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size

2022-04-06 Thread Jason Wang


On 2022/4/6 10:35 AM, Jason Wang wrote:


On 2022/4/4 7:24 PM, Michael S. Tsirkin wrote:

On Mon, Apr 04, 2022 at 11:07:36AM +, Eli Cohen wrote:

From: Michael S. Tsirkin 
Sent: Monday, April 4, 2022 1:35 PM
To: Jason Wang 
Cc: Eli Cohen ; hdan...@sina.com; 
virtualization@lists.linux-foundation.org; 
linux-ker...@vger.kernel.org

Subject: Re: [PATCH RESEND V2 3/3] vdpa/mlx5: Use consistent RQT size

On Tue, Mar 29, 2022 at 12:21:09PM +0800, Jason Wang wrote:

From: Eli Cohen 

The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:

Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
   than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
   to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
   hold. This will emit errors and the driver state is compromised.

To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.

In addtion

addition?

Do you need me to send another version?

It's a bit easier that way but I can handle it manually too.



Let me send a new version with this fixed.



Ok, it looks like when I use git-send-email with a From: tag that is not 
me, the patch gets sent as an attachment, as spotted by Maxime.


Eli, would you please send a v3 with my acked-by? (Since I don't want to 
change the author)


Thanks








If so, let's wait for Jason's reply.

Right.


to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to consider
it when creating the RQT.

Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vqs / 2 and make the code clearer.
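The sizing rule described above can be modeled in plain C. This is a sketch under the assumption that the RQT is sized once from the negotiated maximum and clamped to the device's log_max_rqt_size cap; all helper names here are hypothetical:

```c
#include <stdint.h>

static uint32_t roundup_pow_of_two_u32(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model: evaluated once when driver features are set, so
 * later changes to the current number of VQs can never overflow the RQT.
 * If MQ was negotiated, size for the maximum number of data VQ pairs
 * (max_vqs / 2); otherwise a single entry suffices.
 */
static uint32_t compute_rqt_size(int mq_negotiated, uint32_t max_vqs,
				 uint32_t log_max_rqt_size)
{
	uint32_t num = mq_negotiated ? max_vqs / 2 : 1;
	uint32_t cap = 1u << log_max_rqt_size;
	uint32_t size = roundup_pow_of_two_u32(num);

	return size < cap ? size : cap;
}
```

Because the size is fixed at feature-negotiation time, a later handle_ctrl_mq() with any valid VQ count stays within the table.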

Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Eli Cohen 

Jason I don't have your ack or S.O.B on this one.



My bad, for some reason, I miss that.

Will fix.

Thanks






---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 61 
+++

  1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c

index 53b8c1a68f90..61bec1ed0bc9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -161,6 +161,7 @@ struct mlx5_vdpa_net {
  struct mlx5_flow_handle *rx_rule_mcast;
  bool setup;
  u32 cur_num_vqs;
+    u32 rqt_size;
  struct notifier_block nb;
  struct vdpa_callback config_cb;
  struct mlx5_vdpa_wq_ent cvq_ent;
@@ -204,17 +205,12 @@ static __virtio16 cpu_to_mlx5vdpa16(struct 
mlx5_vdpa_dev *mvdev, u16 val)
  return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), 
val);

  }

-static inline u32 mlx5_vdpa_max_qps(int max_vqs)
-{
-    return max_vqs / 2;
-}
-
  static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev)
  {
  if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
  return 2;

-    return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs);
+    return mvdev->max_vqs;
  }

  static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx)
@@ -1236,25 +1232,13 @@ static void teardown_vq(struct 
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *

  static int create_rqt(struct mlx5_vdpa_net *ndev)
  {
  __be32 *list;
-    int max_rqt;
  void *rqtc;
  int inlen;
  void *in;
  int i, j;
  int err;
-    int num;
-
-    if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ)))
-    num = 1;
-    else
-    num = ndev->cur_num_vqs / 2;

-    max_rqt = min_t(int, roundup_pow_of_two(num),
-    1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
-    if (max_rqt < 1)
-    return -EOPNOTSUPP;
-
-    inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * 
MLX5_ST_SZ_BYTES(rq_num);
+    inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + ndev->rqt_size * 
MLX5_ST_SZ_BYTES(rq_num);

  in = kzalloc(inlen, GFP_KERNEL);
  if (!in)
  return -ENOMEM;
@@ -1263,12 +1247,12 @@ static int create_rqt(struct mlx5_vdpa_net 
*ndev)

  rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);

  MLX5_SET(rqtc, rqtc, list_q_type, 
MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);

-    MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt);
+    MLX5_SET(rqtc, rqtc, rqt_max_size, ndev->rqt_size);
  list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
-    for (i = 0, j = 0; i < max_rqt; i++, j += 2)
-    list[i] = cpu_to_be32(ndev->vqs[j % (2 * num)].virtq_id);
+    for (i = 0, j = 0; i < ndev->rqt_size; i++, j += 2)
+    list[i] = cpu_to_be32(ndev->vqs[j % 
ndev->cur_num_vqs].virtq_id);


-    MLX5_SET(rqtc, rqtc, rqt_actual_size, max_rqt);
+    MLX5_SET(rqtc, rqtc, rqt_actual_size, ndev->rqt_size);
   err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, 

Re: [PATCH v3 0/4] Introduce akcipher service for virtio-crypto

2022-04-06 Thread Michael S. Tsirkin
On Tue, Apr 05, 2022 at 10:33:42AM +0200, Cornelia Huck wrote:
> On Tue, Apr 05 2022, "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Apr 04, 2022 at 05:39:24PM +0200, Cornelia Huck wrote:
> >> On Mon, Mar 07 2022, "Michael S. Tsirkin"  wrote:
> >> 
> >> > On Mon, Mar 07, 2022 at 10:42:30AM +0800, zhenwei pi wrote:
> >> >> Hi, Michael & Lei
> >> >> 
> >> >> The full patchset has been reviewed by Gonglei, thanks to Gonglei.
> >> >> Should I modify the virtio crypto specification (use "__le32 
> >> >> akcipher_algo;"
> >> >> instead of "__le32 reserve;" only, see v1->v2 change), and start a new 
> >> >> issue
> >> >> for a revoting procedure?
> >> >
> >> > You can, but note it probably will be deferred to 1.3. OK with you?
> >> >
> >> >> Also cc Cornelia Huck.
> >> 
> >> [Apologies, I'm horribly behind on my email backlog, and on virtio
> >> things in general :(]
> >> 
> >> The akcipher update had been deferred for 1.2, so I think it will be 1.3
> >> material. However, I just noticed while browsing the fine lwn.net merge
> >> window summary that this seems to have been merged already. That
> >> situation is less than ideal, although I don't expect any really bad
> >> problems, given that there had not been any negative feedback for the
> >> spec proposal that I remember.
> >
> > Let's open a 1.3 branch? What do you think?
> 
> Yes, that's probably best, before things start piling up.

OK, want to do it? And we can then start voting on 1.3 things
straight away.

-- 
MST



Re: [PATCH 15/27] block: use bdev_alignment_offset in part_alignment_offset_show

2022-04-06 Thread Alan Robinson
Hi Christoph,

On Wed, Apr 06, 2022 at 06:05:04AM +, Christoph Hellwig wrote:
> From: Christoph Hellwig 
> Subject: [PATCH 15/27] block: use bdev_alignment_offset in
>  part_alignment_offset_show
> 
> Replace the open coded offset calculation with the proper helper.
> This is an ABI change in that the -1 for a misaligned partition is
> properly propagated, which can be considered a bug fix and maches

s/maches/matches/

> what is done on the whole device.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/partitions/core.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 2ef8dfa1e5c85..240b3fff521e4 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -200,11 +200,7 @@ static ssize_t part_ro_show(struct device *dev,
>  static ssize_t part_alignment_offset_show(struct device *dev,
> struct device_attribute *attr, char 
> *buf)
>  {
> - struct block_device *bdev = dev_to_bdev(dev);
> -
> - return sprintf(buf, "%u\n",
> - queue_limit_alignment_offset(_get_queue(bdev)->limits,
> - bdev->bd_start_sect));
> + return sprintf(buf, "%u\n", bdev_alignment_offset(dev_to_bdev(dev)));

Should this now be %d instead of %u? There are one or two examples of
both in the rest of the patch series.

Alan



RE: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()

2022-04-06 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 6, 2022 12:16 AM
> 
> This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY
> and
> IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
> 
> Currently only Intel and AMD IOMMUs are known to support this
> feature. They both implement it as an IOPTE bit that, when set, will cause
> PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
> the no-snoop bit was clear.
> 
> The new API is triggered by calling enforce_cache_coherency() before
> mapping any IOVA to the domain which globally switches on no-snoop
> blocking. This allows other implementations that might block no-snoop
> globally and outside the IOPTE - AMD also documents such an HW capability.
> 
> Leave AMD out of sync with Intel and have it block no-snoop even for
> in-kernel users. This can be trivially resolved in a follow up patch.
> 
> Only VFIO will call this new API.

Is it too restrictive? In theory vdpa may also implement a contract with
KVM and then want to call this new API too?



RE: [PATCH 2/5] vfio: Require that devices support DMA cache coherence

2022-04-06 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 6, 2022 3:29 AM
> 
> On Tue, Apr 05, 2022 at 01:10:44PM -0600, Alex Williamson wrote:
> > On Tue,  5 Apr 2022 13:16:01 -0300
> > Jason Gunthorpe  wrote:
> >
> > > dev_is_dma_coherent() is the control to determine if IOMMU_CACHE
> > > can be supported.
> > >
> > > IOMMU_CACHE means that normal DMAs do not require any additional
> > > coherency mechanism and is the basic uAPI that VFIO exposes to
> > > userspace. For instance VFIO applications like DPDK will not work if
> > > additional coherency operations are required.
> > >
> > > Therefore check dev_is_dma_coherent() before allowing a device to
> > > join a domain. This will block device/platform/iommu combinations
> > > from using VFIO that do not support cache coherent DMA.
> > >
> > > Signed-off-by: Jason Gunthorpe 
> > >  drivers/vfio/vfio.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index a4555014bd1e72..2a3aa3e742d943 100644
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -32,6 +32,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include "vfio.h"
> > >
> > >  #define DRIVER_VERSION   "0.3"
> > > @@ -1348,6 +1349,11 @@ static int vfio_group_get_device_fd(struct
> > > vfio_group *group, char *buf)
> > >   if (IS_ERR(device))
> > >   return PTR_ERR(device);
> > >
> > > + if (group->type == VFIO_IOMMU && !dev_is_dma_coherent(device->dev)) {
> > > + ret = -ENODEV;
> > > + goto err_device_put;
> > > + }
> > > +
> >
> > Failing at the point where the user is trying to gain access to the
> > device seems a little late in the process and opaque, wouldn't we
> > rather have vfio bus drivers fail to probe such devices?  I'd expect
> > this to occur in the vfio_register_group_dev() path.  Thanks,
> 
> Yes, that is a good point.
> 
> So like this:
> 
>  int vfio_register_group_dev(struct vfio_device *device)
>  {
> +   if (!dev_is_dma_coherent(device->dev))
> +   return -EINVAL;
> +
> return __vfio_register_dev(device,
> vfio_group_find_or_alloc(device->dev));
>  }
> 
> I fixed it up.
> 

if that is the case should it also apply to usnic and vdpa in the first
patch (i.e. fail the probe)?


RE: [PATCH 0/5] Make the iommu driver no-snoop block feature consistent

2022-04-06 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, April 6, 2022 12:16 AM
> 
> PCIe defines a 'no-snoop' bit in each TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of its
> transactions as a micro-optimization.
> 
> However, the driver is now responsible to synchronize the CPU cache with
> the DMA that bypassed it. On x86 this is done through the wbinvd
> instruction, and the i915 GPU driver is the only Linux DMA driver that
> calls it.

More accurately, x86 supports both an unprivileged clflush instruction
to invalidate one cacheline and a privileged wbinvd instruction to
invalidate the entire cache. Replacing 'this is done' with 'this may
be done' is clearer.

> 
> The problem comes that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running in the
> guest is not aware the wbinvd doesn't work it may still cause the device
> to set the no-snoop bit and the platform will bypass the CPU cache.
> Without a working wbinvd there is no way to re-synchronize the CPU cache
> and the driver in the VM has data corruption.
> 
> Thus, we see a general direction on x86 that the IOMMU HW is able to block
> the no-snoop bit in the TLP. This NOP's the optimization and allows KVM to
> NOP the wbinvd without causing any data corruption.
> 
> This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have
> multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused.

Also point out your finding about AMD IOMMU?

> 
> Change it so that:
>  - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
>    device. It is used by the DMA API and set when the DMA API will not be
>    doing manual cache coherency operations.
> 
>  - dev_is_dma_coherent() indicates if IOMMU_CACHE can be used with the
>    device
> 
>  - The new optional domain op enforce_cache_coherency() will cause the
>    entire domain to block no-snoop requests - ie there is no way for any
>    device attached to the domain to opt out of the IOMMU_CACHE behavior.
> 
> An iommu driver should implement enforce_cache_coherency() so that by
> default domains allow the no-snoop optimization. This leaves it available
> to kernel drivers like i915. VFIO will call enforce_cache_coherency()
> before establishing any mappings and the domain should then permanently
> block no-snoop.
> 
> If enforce_cache_coherency() fails VFIO will communicate back through to
> KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
> 
> While other arches are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in doing
> so.
> 
> After this series there are only two calls left to iommu_capable() with a
> bus argument which should help Robin's work here.
> 
> This is on github:
> https://github.com/jgunthorpe/linux/commits/intel_no_snoop
> 
> Cc: "Tian, Kevin" 
> Cc: Robin Murphy 
> Cc: Alex Williamson 
> Cc: Christoph Hellwig 
> Signed-off-by: Jason Gunthorpe 
> 
> Jason Gunthorpe (5):
>   iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with
> dev_is_dma_coherent()
>   vfio: Require that devices support DMA cache coherence
>   iommu: Introduce the domain op enforce_cache_coherency()
>   vfio: Move the Intel no-snoop control off of IOMMU_CACHE
>   iommu: Delete IOMMU_CAP_CACHE_COHERENCY
> 
>  drivers/infiniband/hw/usnic/usnic_uiom.c| 16 +--
>  drivers/iommu/amd/iommu.c   |  9 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 --
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  6 -
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c |  6 -
>  drivers/iommu/fsl_pamu_domain.c |  6 -
>  drivers/iommu/intel/iommu.c | 15 ---
>  drivers/iommu/s390-iommu.c  |  2 --
>  drivers/vfio/vfio.c |  6 +
>  drivers/vfio/vfio_iommu_type1.c | 30 +
>  drivers/vhost/vdpa.c|  3 ++-
>  include/linux/intel-iommu.h |  1 +
>  include/linux/iommu.h   |  6 +++--
>  13 files changed, 58 insertions(+), 50 deletions(-)
> 
> 
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17
> --
> 2.35.1
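The enforce_cache_coherency() flow described in the cover letter above can be sketched as a standalone userspace model. All names here are hypothetical; the real op is an optional IOMMU domain op that VFIO invokes before establishing any mappings:

```c
#include <stdbool.h>

/* Hypothetical model of a domain with an optional coherency-enforce op. */
struct domain_model {
	bool enforced;	/* no-snoop TLPs are blocked domain-wide */
	int (*enforce_cache_coherency)(struct domain_model *d);
};

/* An IOMMU driver (e.g. one with an IOPTE snoop-control bit) provides this. */
static int model_enforce(struct domain_model *d)
{
	d->enforced = true;	/* globally block no-snoop for the domain */
	return 0;
}

/*
 * VFIO-side logic: try the op before mapping anything.  If it is absent
 * or fails, report non-coherent DMA so the arch code (KVM on x86) can
 * expose a working wbinvd to the guest instead.
 */
static void model_attach(struct domain_model *d, bool *need_wbinvd)
{
	if (d->enforce_cache_coherency &&
	    d->enforce_cache_coherency(d) == 0) {
		*need_wbinvd = false;	/* no-snoop blocked in HW */
		return;
	}
	*need_wbinvd = true;	/* kvm_arch_register_noncoherent_dma() path */
}
```

The point of the model is the either/or: a domain either blocks no-snoop permanently, or the VM must be given a real wbinvd.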



[PATCH 27/27] direct-io: remove random prefetches

2022-04-06 Thread Christoph Hellwig
Randomly poking into block device internals for manual prefetches isn't
exactly a very maintainable thing to do.  And none of the performance
critical direct I/O implementations still use this library function
anyway, so just drop it.

Signed-off-by: Christoph Hellwig 
---
 fs/direct-io.c | 32 
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index aef06e607b405..840752006f601 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1115,11 +1115,10 @@ static inline int drop_refcount(struct dio *dio)
  * individual fields and will generate much worse code. This is important
  * for the whole file.
  */
-static inline ssize_t
-do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
- struct block_device *bdev, struct iov_iter *iter,
- get_block_t get_block, dio_iodone_t end_io,
- dio_submit_t submit_io, int flags)
+ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
+   struct block_device *bdev, struct iov_iter *iter,
+   get_block_t get_block, dio_iodone_t end_io,
+   dio_submit_t submit_io, int flags)
 {
unsigned i_blkbits = READ_ONCE(inode->i_blkbits);
unsigned blkbits = i_blkbits;
@@ -1334,29 +1333,6 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode 
*inode,
kmem_cache_free(dio_cache, dio);
return retval;
 }
-
-ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
-struct block_device *bdev, struct iov_iter *iter,
-get_block_t get_block,
-dio_iodone_t end_io, dio_submit_t submit_io,
-int flags)
-{
-   /*
-* The block device state is needed in the end to finally
-* submit everything.  Since it's likely to be cache cold
-* prefetch it here as first thing to hide some of the
-* latency.
-*
-* Attempt to prefetch the pieces we likely need later.
-*/
-   prefetch(&bdev->bd_disk->part_tbl);
-   prefetch(bdev->bd_disk->queue);
-   prefetch((char *)bdev->bd_disk->queue + SMP_CACHE_BYTES);
-
-   return do_blockdev_direct_IO(iocb, inode, bdev, iter, get_block,
-end_io, submit_io, flags);
-}
-
 EXPORT_SYMBOL(__blockdev_direct_IO);
 
 static __init int dio_init(void)
-- 
2.30.2



[PATCH 26/27] block: uncouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD

2022-04-06 Thread Christoph Hellwig
Secure erase is a very different operation from discard in that it is
a data integrity operation vs hint.  Fully split the limits and helper
infrastructure to make the separation more clear.
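A sketch of what the split means for callers, modeled in standalone C with hypothetical names (the real code checks two now-independent queue limits rather than one limit plus a flag):

```c
/* Hypothetical model of the split limits described above. */
struct bdev_limits_model {
	unsigned int max_discard_sectors;	/* hint operation */
	unsigned int max_secure_erase_sectors;	/* data integrity operation */
};

enum req_op_model { OP_DISCARD_M, OP_SECURE_ERASE_M };

/*
 * After the split, each operation has its own support indicator, so a
 * device can support discard without secure erase and vice versa.
 * Returns 0 if supported, -95 (EOPNOTSUPP) otherwise.
 */
static int model_check_support(const struct bdev_limits_model *lim,
			       enum req_op_model op)
{
	unsigned int max = (op == OP_SECURE_ERASE_M) ?
		lim->max_secure_erase_sectors : lim->max_discard_sectors;

	return max ? 0 : -95;
}
```

Before the split, secure erase was routed through the discard path with a flag, which conflated a correctness guarantee with a hint.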

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c|  2 +-
 block/blk-lib.c | 64 -
 block/blk-mq-debugfs.c  |  1 -
 block/blk-settings.c| 16 +++-
 block/fops.c|  2 +-
 block/ioctl.c   | 43 +++
 drivers/block/drbd/drbd_receiver.c  |  5 ++-
 drivers/block/rnbd/rnbd-clt.c   |  4 +-
 drivers/block/rnbd/rnbd-srv-dev.h   |  2 +-
 drivers/block/xen-blkback/blkback.c | 15 +++
 drivers/block/xen-blkback/xenbus.c  |  5 +--
 drivers/block/xen-blkfront.c|  5 ++-
 drivers/md/bcache/alloc.c   |  2 +-
 drivers/md/dm-table.c   |  8 ++--
 drivers/md/dm-thin.c|  4 +-
 drivers/md/md.c |  2 +-
 drivers/md/raid5-cache.c|  6 +--
 drivers/mmc/core/queue.c|  2 +-
 drivers/nvme/target/io-cmd-bdev.c   |  2 +-
 drivers/target/target_core_file.c   |  2 +-
 drivers/target/target_core_iblock.c |  2 +-
 fs/btrfs/extent-tree.c  |  4 +-
 fs/ext4/mballoc.c   |  2 +-
 fs/f2fs/file.c  | 16 
 fs/f2fs/segment.c   |  2 +-
 fs/jbd2/journal.c   |  2 +-
 fs/nilfs2/sufile.c  |  4 +-
 fs/nilfs2/the_nilfs.c   |  4 +-
 fs/ntfs3/super.c|  2 +-
 fs/xfs/xfs_discard.c|  2 +-
 fs/xfs/xfs_log_cil.c|  2 +-
 include/linux/blkdev.h  | 27 +++-
 mm/swapfile.c   |  6 +--
 33 files changed, 168 insertions(+), 99 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b5c3a8049134c..ee18b6a699bdf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -824,7 +824,7 @@ void submit_bio_noacct(struct bio *bio)
goto not_supported;
break;
case REQ_OP_SECURE_ERASE:
-   if (!blk_queue_secure_erase(q))
+   if (!bdev_max_secure_erase_sectors(bdev))
goto not_supported;
break;
case REQ_OP_ZONE_APPEND:
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 43aa4d7fe859f..09b7e1200c0f4 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,26 +36,15 @@ static sector_t bio_discard_limit(struct block_device 
*bdev, sector_t sector)
 }
 
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-   sector_t nr_sects, gfp_t gfp_mask, int flags,
-   struct bio **biop)
+   sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
 {
-   struct request_queue *q = bdev_get_queue(bdev);
struct bio *bio = *biop;
-   unsigned int op;
sector_t bs_mask;
 
if (bdev_read_only(bdev))
return -EPERM;
-
-   if (flags & BLKDEV_DISCARD_SECURE) {
-   if (!blk_queue_secure_erase(q))
-   return -EOPNOTSUPP;
-   op = REQ_OP_SECURE_ERASE;
-   } else {
-   if (!bdev_max_discard_sectors(bdev))
-   return -EOPNOTSUPP;
-   op = REQ_OP_DISCARD;
-   }
+   if (!bdev_max_discard_sectors(bdev))
+   return -EOPNOTSUPP;
 
/* In case the discard granularity isn't set by buggy device driver */
if (WARN_ON_ONCE(!bdev_discard_granularity(bdev))) {
@@ -77,7 +66,7 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
sector_t req_sects =
min(nr_sects, bio_discard_limit(bdev, sector));
 
-   bio = blk_next_bio(bio, bdev, 0, op, gfp_mask);
+   bio = blk_next_bio(bio, bdev, 0, REQ_OP_DISCARD, gfp_mask);
bio->bi_iter.bi_sector = sector;
bio->bi_iter.bi_size = req_sects << 9;
sector += req_sects;
@@ -103,21 +92,19 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
 * @sector: start sector
  * @nr_sects:  number of sectors to discard
  * @gfp_mask:  memory allocation flags (for bio_alloc)
- * @flags: BLKDEV_DISCARD_* flags to control behaviour
  *
  * Description:
  *Issue a discard request for the sectors in question.
  */
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-   sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
+   sector_t nr_sects, gfp_t gfp_mask)
 {
struct bio *bio = NULL;
struct blk_plug plug;
int ret;
 
	blk_start_plug(&plug);
-   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, flags,
-   &bio);
+   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio);
if (!ret && bio) {
ret = submit_bio_wait(bio);
if (ret 

[PATCH 24/27] block: add a bdev_discard_granularity helper

2022-04-06 Thread Christoph Hellwig
Abstract away implementation details from file systems by providing a
block_device based helper to retrieve the discard granularity.

Signed-off-by: Christoph Hellwig 
---
 block/blk-lib.c |  5 ++---
 drivers/block/drbd/drbd_nl.c|  9 +
 drivers/block/drbd/drbd_receiver.c  |  3 +--
 drivers/block/loop.c|  2 +-
 drivers/target/target_core_device.c |  3 +--
 fs/btrfs/ioctl.c| 12 
 fs/exfat/file.c |  3 +--
 fs/ext4/mballoc.c   |  6 +++---
 fs/f2fs/file.c  |  3 +--
 fs/fat/file.c   |  3 +--
 fs/gfs2/rgrp.c  |  7 +++
 fs/jfs/ioctl.c  |  3 +--
 fs/nilfs2/ioctl.c   |  4 ++--
 fs/ntfs3/file.c |  4 ++--
 fs/ntfs3/super.c|  6 ++
 fs/ocfs2/ioctl.c|  3 +--
 fs/xfs/xfs_discard.c|  4 ++--
 include/linux/blkdev.h  |  5 +
 18 files changed, 38 insertions(+), 47 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8b4b66d3a9bfc..43aa4d7fe859f 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -12,8 +12,7 @@
 
 static sector_t bio_discard_limit(struct block_device *bdev, sector_t sector)
 {
-   unsigned int discard_granularity =
-   bdev_get_queue(bdev)->limits.discard_granularity;
+   unsigned int discard_granularity = bdev_discard_granularity(bdev);
sector_t granularity_aligned_sector;
 
if (bdev_is_partition(bdev))
@@ -59,7 +58,7 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
}
 
/* In case the discard granularity isn't set by buggy device driver */
-   if (WARN_ON_ONCE(!q->limits.discard_granularity)) {
+   if (WARN_ON_ONCE(!bdev_discard_granularity(bdev))) {
char dev_name[BDEVNAME_SIZE];
 
bdevname(bdev, dev_name);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 8e28e0a8e5e41..94ac3737723a8 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1440,7 +1440,6 @@ static void sanitize_disk_conf(struct drbd_device 
*device, struct disk_conf *dis
   struct drbd_backing_dev *nbc)
 {
struct block_device *bdev = nbc->backing_bdev;
-   struct request_queue *q = bdev->bd_disk->queue;
 
if (disk_conf->al_extents < DRBD_AL_EXTENTS_MIN)
disk_conf->al_extents = DRBD_AL_EXTENTS_MIN;
@@ -1457,12 +1456,14 @@ static void sanitize_disk_conf(struct drbd_device 
*device, struct disk_conf *dis
if (disk_conf->rs_discard_granularity) {
int orig_value = disk_conf->rs_discard_granularity;
sector_t discard_size = bdev_max_discard_sectors(bdev) << 9;
+   unsigned int discard_granularity = 
bdev_discard_granularity(bdev);
int remainder;
 
-   if (q->limits.discard_granularity > 
disk_conf->rs_discard_granularity)
-   disk_conf->rs_discard_granularity = 
q->limits.discard_granularity;
+   if (discard_granularity > disk_conf->rs_discard_granularity)
+   disk_conf->rs_discard_granularity = discard_granularity;
 
-   remainder = disk_conf->rs_discard_granularity % 
q->limits.discard_granularity;
+   remainder = disk_conf->rs_discard_granularity %
+   discard_granularity;
disk_conf->rs_discard_granularity += remainder;
 
if (disk_conf->rs_discard_granularity > discard_size)
diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index 8a4a47da56fe9..275c53c7b629e 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1511,7 +1511,6 @@ void drbd_bump_write_ordering(struct drbd_resource 
*resource, struct drbd_backin
 int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, 
unsigned int nr_sectors, int flags)
 {
struct block_device *bdev = device->ldev->backing_bdev;
-   struct request_queue *q = bdev_get_queue(bdev);
sector_t tmp, nr;
unsigned int max_discard_sectors, granularity;
int alignment;
@@ -1521,7 +1520,7 @@ int drbd_issue_discard_or_zero_out(struct drbd_device 
*device, sector_t start, u
goto zero_out;
 
/* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
+   granularity = max(bdev_discard_granularity(bdev) >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
max_discard_sectors = min(bdev_max_discard_sectors(bdev), (1U << 22));
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4b919b75205a7..d5499795a1fec 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -759,7 

[PATCH 25/27] block: remove QUEUE_FLAG_DISCARD

2022-04-06 Thread Christoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only place that needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.
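The flag-to-limit migration can be modeled as follows (a sketch of the pattern with hypothetical names, not the kernel code itself):

```c
#include <stdbool.h>

/*
 * Before this patch, support was a separate queue flag that could drift
 * out of sync with the limit.  After it, the limit itself is the
 * indicator: non-zero means supported.
 */
struct queue_model {
	unsigned int max_discard_sectors;
};

static bool model_supports_discard(const struct queue_model *q)
{
	return q->max_discard_sectors != 0;
}

/*
 * RAID5-style override: clear the limit to disable discard even when
 * stacking the member devices' limits would have allowed it.
 */
static void model_disable_discard(struct queue_model *q)
{
	q->max_discard_sectors = 0;
}
```

With a single source of truth there is no "flag set but limit zero" state left to fix up, which is what fixup_discard_if_not_supported() existed for.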

Signed-off-by: Christoph Hellwig 
---
 arch/um/drivers/ubd_kern.c|  2 --
 block/blk-mq-debugfs.c|  1 -
 drivers/block/drbd/drbd_nl.c  | 15 ---
 drivers/block/loop.c  |  2 --
 drivers/block/nbd.c   |  3 ---
 drivers/block/null_blk/main.c |  1 -
 drivers/block/rbd.c   |  1 -
 drivers/block/rnbd/rnbd-clt.c |  2 --
 drivers/block/virtio_blk.c|  2 --
 drivers/block/xen-blkfront.c  |  2 --
 drivers/block/zram/zram_drv.c |  1 -
 drivers/md/bcache/super.c |  1 -
 drivers/md/dm-table.c |  5 +
 drivers/md/dm-thin.c  |  2 --
 drivers/md/dm.c   |  1 -
 drivers/md/md-linear.c|  9 -
 drivers/md/raid0.c|  7 ---
 drivers/md/raid1.c| 14 --
 drivers/md/raid10.c   | 14 --
 drivers/md/raid5.c| 12 
 drivers/mmc/core/queue.c  |  1 -
 drivers/mtd/mtd_blkdevs.c |  1 -
 drivers/nvme/host/core.c  |  6 ++
 drivers/s390/block/dasd_fba.c |  1 -
 drivers/scsi/sd.c |  2 --
 include/linux/blkdev.h|  2 --
 26 files changed, 7 insertions(+), 103 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index b03269faef714..085ffdf98e57e 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -483,7 +483,6 @@ static void ubd_handler(void)
if ((io_req->error == BLK_STS_NOTSUPP) && 
(req_op(io_req->req) == REQ_OP_DISCARD)) {
blk_queue_max_discard_sectors(io_req->req->q, 
0);

blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
-   blk_queue_flag_clear(QUEUE_FLAG_DISCARD, 
io_req->req->q);
}
blk_mq_end_request(io_req->req, io_req->error);
kfree(io_req);
@@ -803,7 +802,6 @@ static int ubd_open_dev(struct ubd *ubd_dev)
ubd_dev->queue->limits.discard_alignment = SECTOR_SIZE;
blk_queue_max_discard_sectors(ubd_dev->queue, UBD_MAX_REQUEST);
blk_queue_max_write_zeroes_sectors(ubd_dev->queue, 
UBD_MAX_REQUEST);
-   blk_queue_flag_set(QUEUE_FLAG_DISCARD, ubd_dev->queue);
}
blk_queue_flag_set(QUEUE_FLAG_NONROT, ubd_dev->queue);
return 0;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index aa0349e9f083b..fd111c5001256 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -113,7 +113,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(FAIL_IO),
QUEUE_FLAG_NAME(NONROT),
QUEUE_FLAG_NAME(IO_STAT),
-   QUEUE_FLAG_NAME(DISCARD),
QUEUE_FLAG_NAME(NOXMERGES),
QUEUE_FLAG_NAME(ADD_RANDOM),
QUEUE_FLAG_NAME(SECERASE),
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 94ac3737723a8..0b3e43be6414d 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1230,30 +1230,16 @@ static void decide_on_discard_support(struct 
drbd_device *device,
 */
blk_queue_discard_granularity(q, 512);
q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
-   blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
q->limits.max_write_zeroes_sectors =
drbd_max_discard_sectors(connection);
return;
 
 not_supported:
-   blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
blk_queue_discard_granularity(q, 0);
q->limits.max_discard_sectors = 0;
q->limits.max_write_zeroes_sectors = 0;
 }
 
-static void fixup_discard_if_not_supported(struct request_queue *q)
-{
-   /* To avoid confusion, if this queue does not support discard, clear
-* max_discard_sectors, which is what lsblk -D reports to the user.
-* Older kernels got this wrong in "stack limits".
-* */
-   if (!blk_queue_discard(q)) {
-   blk_queue_max_discard_sectors(q, 0);
-   blk_queue_discard_granularity(q, 0);
-   }
-}
-
 static void fixup_write_zeroes(struct drbd_device *device, struct 
request_queue *q)
 {
/* Fixup max_write_zeroes_sectors after blk_stack_limits():
@@ -1300,7 +1286,6 @@ static void drbd_setup_queue_param(struct drbd_device 
*device, struct drbd_backi
blk_stack_limits(>limits, >limits, 0);
disk_update_readahead(device->vdisk);
}
-   fixup_discard_if_not_supported(q);
fixup_write_zeroes(device, q);
 }
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d5499795a1fec..976cf987b3920 100644
--- a/drivers/block/loop.c
+++ 

[PATCH 23/27] block: add a bdev_max_discard_sectors helper

2022-04-06 Thread Christoph Hellwig
Add a helper to query the number of sectors supported per discard bio
based on the block device and use this helper to stop various places from
poking into the request_queue to see if discard is supported and if so how
much.  This mirrors what is done e.g. for write zeroes as well.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c|  2 +-
 block/blk-lib.c |  2 +-
 block/ioctl.c   |  3 +--
 drivers/block/drbd/drbd_main.c  |  2 +-
 drivers/block/drbd/drbd_nl.c| 12 +++-
 drivers/block/drbd/drbd_receiver.c  |  5 ++---
 drivers/block/loop.c|  9 +++--
 drivers/block/rnbd/rnbd-srv-dev.h   |  6 +-
 drivers/block/xen-blkback/xenbus.c  |  2 +-
 drivers/md/bcache/request.c |  4 ++--
 drivers/md/bcache/super.c   |  2 +-
 drivers/md/bcache/sysfs.c   |  2 +-
 drivers/md/dm-cache-target.c|  9 +
 drivers/md/dm-clone-target.c|  9 +
 drivers/md/dm-io.c  |  2 +-
 drivers/md/dm-log-writes.c  |  3 +--
 drivers/md/dm-raid.c|  9 ++---
 drivers/md/dm-table.c   |  4 +---
 drivers/md/dm-thin.c|  9 +
 drivers/md/dm.c |  2 +-
 drivers/md/md-linear.c  |  4 ++--
 drivers/md/raid0.c  |  2 +-
 drivers/md/raid1.c  |  6 +++---
 drivers/md/raid10.c |  8 
 drivers/md/raid5-cache.c|  2 +-
 drivers/target/target_core_device.c |  8 +++-
 fs/btrfs/extent-tree.c  |  4 ++--
 fs/btrfs/ioctl.c|  2 +-
 fs/exfat/file.c |  2 +-
 fs/exfat/super.c| 10 +++---
 fs/ext4/ioctl.c | 10 +++---
 fs/ext4/super.c | 10 +++---
 fs/f2fs/f2fs.h  |  3 +--
 fs/f2fs/segment.c   |  6 ++
 fs/fat/file.c   |  2 +-
 fs/fat/inode.c  | 10 +++---
 fs/gfs2/rgrp.c  |  2 +-
 fs/jbd2/journal.c   |  7 ++-
 fs/jfs/ioctl.c  |  2 +-
 fs/jfs/super.c  |  8 ++--
 fs/nilfs2/ioctl.c   |  2 +-
 fs/ntfs3/file.c |  2 +-
 fs/ntfs3/super.c|  2 +-
 fs/ocfs2/ioctl.c|  2 +-
 fs/xfs/xfs_discard.c|  2 +-
 fs/xfs/xfs_super.c  | 12 
 include/linux/blkdev.h  |  5 +
 mm/swapfile.c   | 17 ++---
 48 files changed, 87 insertions(+), 163 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 937bb6b863317..b5c3a8049134c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -820,7 +820,7 @@ void submit_bio_noacct(struct bio *bio)
 
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
-   if (!blk_queue_discard(q))
+   if (!bdev_max_discard_sectors(bdev))
goto not_supported;
break;
case REQ_OP_SECURE_ERASE:
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2ae32a722851c..8b4b66d3a9bfc 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -53,7 +53,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
return -EOPNOTSUPP;
op = REQ_OP_SECURE_ERASE;
} else {
-   if (!blk_queue_discard(q))
+   if (!bdev_max_discard_sectors(bdev))
return -EOPNOTSUPP;
op = REQ_OP_DISCARD;
}
diff --git a/block/ioctl.c b/block/ioctl.c
index ad3771b268b81..c2cd3ba5290ce 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -87,14 +87,13 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 {
uint64_t range[2];
uint64_t start, len;
-   struct request_queue *q = bdev_get_queue(bdev);
struct inode *inode = bdev->bd_inode;
int err;
 
if (!(mode & FMODE_WRITE))
return -EBADF;
 
-   if (!blk_queue_discard(q))
+   if (!bdev_max_discard_sectors(bdev))
return -EOPNOTSUPP;
 
if (copy_from_user(range, (void __user *)arg, sizeof(range)))
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 9d43aadde19ad..8fd89a1b0b7b3 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -942,7 +942,7 @@ int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enu
cpu_to_be32(bdev_alignment_offset(bdev));
p->qlim->io_min = cpu_to_be32(bdev_io_min(bdev));
p->qlim->io_opt = cpu_to_be32(bdev_io_opt(bdev));
-   p->qlim->discard_enabled = blk_queue_discard(q);
+   p->qlim->discard_enabled = !!bdev_max_discard_sectors(bdev);
p->qlim->write_same_capable = 0;

[PATCH 22/27] block: refactor discard bio size limiting

2022-04-06 Thread Christoph Hellwig
Move all the logic to limit the discard bio size into a common helper
so that it is better documented.

Signed-off-by: Christoph Hellwig 
---
 block/blk-lib.c | 59 -
 block/blk.h | 14 
 2 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 237d60d8b5857..2ae32a722851c 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -10,6 +10,32 @@
 
 #include "blk.h"
 
+static sector_t bio_discard_limit(struct block_device *bdev, sector_t sector)
+{
+   unsigned int discard_granularity =
+   bdev_get_queue(bdev)->limits.discard_granularity;
+   sector_t granularity_aligned_sector;
+
+   if (bdev_is_partition(bdev))
+   sector += bdev->bd_start_sect;
+
+   granularity_aligned_sector =
+   round_up(sector, discard_granularity >> SECTOR_SHIFT);
+
+   /*
+* Make sure subsequent bios start aligned to the discard granularity if
+* it needs to be split.
+*/
+   if (granularity_aligned_sector != sector)
+   return granularity_aligned_sector - sector;
+
+   /*
+* Align the bio size to the discard granularity to make splitting the bio
+* at discard granularity boundaries easier in the driver if needed.
+*/
+   return round_down(UINT_MAX, discard_granularity) >> SECTOR_SHIFT;
+}
+
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, int flags,
struct bio **biop)
@@ -17,7 +17,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct request_queue *q = bdev_get_queue(bdev);
struct bio *bio = *biop;
unsigned int op;
-   sector_t bs_mask, part_offset = 0;
+   sector_t bs_mask;
 
if (bdev_read_only(bdev))
return -EPERM;
@@ -48,36 +74,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!nr_sects)
return -EINVAL;
 
-   /* In case the discard request is in a partition */
-   if (bdev_is_partition(bdev))
-   part_offset = bdev->bd_start_sect;
-
while (nr_sects) {
-   sector_t granularity_aligned_lba, req_sects;
-   sector_t sector_mapped = sector + part_offset;
-
-   granularity_aligned_lba = round_up(sector_mapped,
-   q->limits.discard_granularity >> SECTOR_SHIFT);
-
-   /*
-* Check whether the discard bio starts at a discard_granularity
-* aligned LBA,
-* - If no: set (granularity_aligned_lba - sector_mapped) to
-*   bi_size of the first split bio, then the second bio will
-*   start at a discard_granularity aligned LBA on the device.
-* - If yes: use bio_aligned_discard_max_sectors() as the max
-*   possible bi_size of the first split bio. Then when this bio
-*   is split in device drive, the split ones are very probably
-*   to be aligned to discard_granularity of the device's queue.
-*/
-   if (granularity_aligned_lba == sector_mapped)
-   req_sects = min_t(sector_t, nr_sects,
- bio_aligned_discard_max_sectors(q));
-   else
-   req_sects = min_t(sector_t, nr_sects,
- granularity_aligned_lba - sector_mapped);
-
-   WARN_ON_ONCE((req_sects << 9) > UINT_MAX);
+   sector_t req_sects =
+   min(nr_sects, bio_discard_limit(bdev, sector));
 
bio = blk_next_bio(bio, bdev, 0, op, gfp_mask);
bio->bi_iter.bi_sector = sector;
diff --git a/block/blk.h b/block/blk.h
index 8ccbc6e076369..1fdc1d28e6d60 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -346,20 +346,6 @@ static inline unsigned int bio_allowed_max_sectors(struct request_queue *q)
return round_down(UINT_MAX, queue_logical_block_size(q)) >> 9;
 }
 
-/*
- * The max bio size which is aligned to q->limits.discard_granularity. This
- * is a hint to split large discard bio in generic block layer, then if device
- * driver needs to split the discard bio into smaller ones, their bi_size can
- * be very probably and easily aligned to discard_granularity of the device's
- * queue.
- */
-static inline unsigned int bio_aligned_discard_max_sectors(
-   struct request_queue *q)
-{
-   return round_down(UINT_MAX, q->limits.discard_granularity) >>
-   SECTOR_SHIFT;
-}
-
 /*
  * Internal io_context interface
  */
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 21/27] block: move {bdev, queue_limit}_discard_alignment out of line

2022-04-06 Thread Christoph Hellwig
No need to inline these fairly large helpers.  Also fix the return value
to be unsigned, just like the field in struct queue_limits.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c   | 35 +++
 include/linux/blkdev.h | 34 +-
 2 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 94410a13c0dee..fd83d674afd0a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -478,6 +478,30 @@ static int queue_limit_alignment_offset(struct queue_limits *lim,
return (granularity + lim->alignment_offset - alignment) % granularity;
 }
 
+static unsigned int queue_limit_discard_alignment(struct queue_limits *lim,
+   sector_t sector)
+{
+   unsigned int alignment, granularity, offset;
+
+   if (!lim->max_discard_sectors)
+   return 0;
+
+   /* Why are these in bytes, not sectors? */
+   alignment = lim->discard_alignment >> SECTOR_SHIFT;
+   granularity = lim->discard_granularity >> SECTOR_SHIFT;
+   if (!granularity)
+   return 0;
+
+   /* Offset of the partition start in 'granularity' sectors */
+   offset = sector_div(sector, granularity);
+
+   /* And why do we do this modulus *again* in blkdev_issue_discard()? */
+   offset = (granularity + alignment - offset) % granularity;
+
+   /* Turn it back into bytes, gaah */
+   return offset << SECTOR_SHIFT;
+}
+
 static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lbs)
 {
sectors = round_down(sectors, lbs >> SECTOR_SHIFT);
@@ -924,3 +948,14 @@ int bdev_alignment_offset(struct block_device *bdev)
return q->limits.alignment_offset;
 }
 EXPORT_SYMBOL_GPL(bdev_alignment_offset);
+
+unsigned int bdev_discard_alignment(struct block_device *bdev)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+
+   if (bdev_is_partition(bdev))
+   return queue_limit_discard_alignment(&q->limits,
+   bdev->bd_start_sect);
+   return q->limits.discard_alignment;
+}
+EXPORT_SYMBOL_GPL(bdev_discard_alignment);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a9b7aeda010b..34b1cfd067421 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1252,39 +1252,7 @@ bdev_zone_write_granularity(struct block_device *bdev)
 }
 
 int bdev_alignment_offset(struct block_device *bdev);
-
-static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
-{
-   unsigned int alignment, granularity, offset;
-
-   if (!lim->max_discard_sectors)
-   return 0;
-
-   /* Why are these in bytes, not sectors? */
-   alignment = lim->discard_alignment >> SECTOR_SHIFT;
-   granularity = lim->discard_granularity >> SECTOR_SHIFT;
-   if (!granularity)
-   return 0;
-
-   /* Offset of the partition start in 'granularity' sectors */
-   offset = sector_div(sector, granularity);
-
-   /* And why do we do this modulus *again* in blkdev_issue_discard()? */
-   offset = (granularity + alignment - offset) % granularity;
-
-   /* Turn it back into bytes, gaah */
-   return offset << SECTOR_SHIFT;
-}
-
-static inline int bdev_discard_alignment(struct block_device *bdev)
-{
-   struct request_queue *q = bdev_get_queue(bdev);
-
-   if (bdev_is_partition(bdev))
-   return queue_limit_discard_alignment(&q->limits,
-   bdev->bd_start_sect);
-   return q->limits.discard_alignment;
-}
+unsigned int bdev_discard_alignment(struct block_device *bdev);
 
 static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
 {
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 20/27] block: use bdev_discard_alignment in part_discard_alignment_show

2022-04-06 Thread Christoph Hellwig
Use the bdev based alignment helper instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 block/partitions/core.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 240b3fff521e4..70dec1c78521d 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -206,11 +206,7 @@ static ssize_t part_alignment_offset_show(struct device *dev,
 static ssize_t part_discard_alignment_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
-   struct block_device *bdev = dev_to_bdev(dev);
-
-   return sprintf(buf, "%u\n",
-   queue_limit_discard_alignment(&bdev_get_queue(bdev)->limits,
-   bdev->bd_start_sect));
+   return sprintf(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
 }
 
 static DEVICE_ATTR(partition, 0444, part_partition_show, NULL);
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 19/27] block: remove queue_discard_alignment

2022-04-06 Thread Christoph Hellwig
Just use bdev_discard_alignment in disk_discard_alignment_show instead.
That helper is the same except for an always false branch that doesn't
matter in this slow path.

Signed-off-by: Christoph Hellwig 
---
 block/genhd.c  | 2 +-
 include/linux/blkdev.h | 8 
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 712031ce19070..36532b9318419 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1019,7 +1019,7 @@ static ssize_t disk_discard_alignment_show(struct device *dev,
 {
struct gendisk *disk = dev_to_disk(dev);
 
-   return sprintf(buf, "%d\n", queue_discard_alignment(disk->queue));
+   return sprintf(buf, "%d\n", bdev_discard_alignment(disk->part0));
 }
 
 static ssize_t diskseq_show(struct device *dev,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0a1795ac26275..5a9b7aeda010b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1253,14 +1253,6 @@ bdev_zone_write_granularity(struct block_device *bdev)
 
 int bdev_alignment_offset(struct block_device *bdev);
 
-static inline int queue_discard_alignment(const struct request_queue *q)
-{
-   if (q->limits.discard_misaligned)
-   return -1;
-
-   return q->limits.discard_alignment;
-}
-
 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
unsigned int alignment, granularity, offset;
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 18/27] block: move bdev_alignment_offset and queue_limit_alignment_offset out of line

2022-04-06 Thread Christoph Hellwig
No need to inline these fairly large helpers.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c   | 23 +++
 include/linux/blkdev.h | 21 +
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b83df3d2eebca..94410a13c0dee 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -468,6 +468,16 @@ void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
+static int queue_limit_alignment_offset(struct queue_limits *lim,
+   sector_t sector)
+{
+   unsigned int granularity = max(lim->physical_block_size, lim->io_min);
+   unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT)
+   << SECTOR_SHIFT;
+
+   return (granularity + lim->alignment_offset - alignment) % granularity;
+}
+
 static unsigned int blk_round_down_sectors(unsigned int sectors, unsigned int lbs)
 {
sectors = round_down(sectors, lbs >> SECTOR_SHIFT);
@@ -901,3 +911,16 @@ void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
}
 }
 EXPORT_SYMBOL_GPL(blk_queue_set_zoned);
+
+int bdev_alignment_offset(struct block_device *bdev)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+
+   if (q->limits.misaligned)
+   return -1;
+   if (bdev_is_partition(bdev))
+   return queue_limit_alignment_offset(&q->limits,
+   bdev->bd_start_sect);
+   return q->limits.alignment_offset;
+}
+EXPORT_SYMBOL_GPL(bdev_alignment_offset);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d5346e72e3645..0a1795ac26275 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1251,26 +1251,7 @@ bdev_zone_write_granularity(struct block_device *bdev)
return queue_zone_write_granularity(bdev_get_queue(bdev));
 }
 
-static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
-{
-   unsigned int granularity = max(lim->physical_block_size, lim->io_min);
-   unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT)
-   << SECTOR_SHIFT;
-
-   return (granularity + lim->alignment_offset - alignment) % granularity;
-}
-
-static inline int bdev_alignment_offset(struct block_device *bdev)
-{
-   struct request_queue *q = bdev_get_queue(bdev);
-
-   if (q->limits.misaligned)
-   return -1;
-   if (bdev_is_partition(bdev))
-   return queue_limit_alignment_offset(&q->limits,
-   bdev->bd_start_sect);
-   return q->limits.alignment_offset;
-}
+int bdev_alignment_offset(struct block_device *bdev);
 
 static inline int queue_discard_alignment(const struct request_queue *q)
 {
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 17/27] block: use bdev_alignment_offset in disk_alignment_offset_show

2022-04-06 Thread Christoph Hellwig
This does the same as the open coded variant except for an extra branch,
and allows to remove queue_alignment_offset entirely.

Signed-off-by: Christoph Hellwig 
---
 block/genhd.c  | 2 +-
 include/linux/blkdev.h | 8 
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index b8b6759d670f0..712031ce19070 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1010,7 +1010,7 @@ static ssize_t disk_alignment_offset_show(struct device *dev,
 {
struct gendisk *disk = dev_to_disk(dev);
 
-   return sprintf(buf, "%d\n", queue_alignment_offset(disk->queue));
+   return sprintf(buf, "%d\n", bdev_alignment_offset(disk->part0));
 }
 
 static ssize_t disk_discard_alignment_show(struct device *dev,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f8c50b77543eb..d5346e72e3645 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1251,14 +1251,6 @@ bdev_zone_write_granularity(struct block_device *bdev)
return queue_zone_write_granularity(bdev_get_queue(bdev));
 }
 
-static inline int queue_alignment_offset(const struct request_queue *q)
-{
-   if (q->limits.misaligned)
-   return -1;
-
-   return q->limits.alignment_offset;
-}
-
 static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
 {
unsigned int granularity = max(lim->physical_block_size, lim->io_min);
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 15/27] block: use bdev_alignment_offset in part_alignment_offset_show

2022-04-06 Thread Christoph Hellwig
Replace the open coded offset calculation with the proper helper.
This is an ABI change in that the -1 for a misaligned partition is
properly propagated, which can be considered a bug fix and matches
what is done on the whole device.

Signed-off-by: Christoph Hellwig 
---
 block/partitions/core.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 2ef8dfa1e5c85..240b3fff521e4 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -200,11 +200,7 @@ static ssize_t part_ro_show(struct device *dev,
 static ssize_t part_alignment_offset_show(struct device *dev,
  struct device_attribute *attr, char *buf)
 {
-   struct block_device *bdev = dev_to_bdev(dev);
-
-   return sprintf(buf, "%u\n",
-   queue_limit_alignment_offset(&bdev_get_queue(bdev)->limits,
-   bdev->bd_start_sect));
+   return sprintf(buf, "%u\n", bdev_alignment_offset(dev_to_bdev(dev)));
 }
 
 static ssize_t part_discard_alignment_show(struct device *dev,
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 16/27] drbd: use bdev_alignment_offset instead of queue_alignment_offset

2022-04-06 Thread Christoph Hellwig
The bdev version does the right thing for partitions, so use that.

Fixes: 9104d31a759f ("drbd: introduce WRITE_SAME support")
Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index d20d84ee7a88e..9d43aadde19ad 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -939,7 +939,7 @@ int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enu
p->qlim->logical_block_size =
cpu_to_be32(bdev_logical_block_size(bdev));
p->qlim->alignment_offset =
-   cpu_to_be32(queue_alignment_offset(q));
+   cpu_to_be32(bdev_alignment_offset(bdev));
p->qlim->io_min = cpu_to_be32(bdev_io_min(bdev));
p->qlim->io_opt = cpu_to_be32(bdev_io_opt(bdev));
p->qlim->discard_enabled = blk_queue_discard(q);
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 14/27] block: add a bdev_max_zone_append_sectors helper

2022-04-06 Thread Christoph Hellwig
Add a helper to check the max supported sectors for zone append based on
the block_device instead of having to poke into the block layer internal
request_queue.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/target/zns.c | 3 +--
 fs/zonefs/super.c | 3 +--
 include/linux/blkdev.h| 6 ++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index e34718b095504..82b61acf7a72b 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -34,8 +34,7 @@ static int validate_conv_zones_cb(struct blk_zone *z,
 
 bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
 {
-   struct request_queue *q = ns->bdev->bd_disk->queue;
-   u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+   u8 zasl = nvmet_zasl(bdev_max_zone_append_sectors(ns->bdev));
struct gendisk *bd_disk = ns->bdev->bd_disk;
int ret;
 
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3614c7834007d..7a63807b736c4 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -678,13 +678,12 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
struct block_device *bdev = inode->i_sb->s_bdev;
-   unsigned int max;
+   unsigned int max = bdev_max_zone_append_sectors(bdev);
struct bio *bio;
ssize_t size;
int nr_pages;
ssize_t ret;
 
-   max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
iov_iter_truncate(from, max);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a433798c3343e..f8c50b77543eb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,6 +1188,12 @@ static inline unsigned int queue_max_zone_append_sectors(const struct request_qu
return min(l->max_zone_append_sectors, l->max_sectors);
 }
 
+static inline unsigned int
+bdev_max_zone_append_sectors(struct block_device *bdev)
+{
+   return queue_max_zone_append_sectors(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
int retval = 512;
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 13/27] block: add a bdev_stable_writes helper

2022-04-06 Thread Christoph Hellwig
Add a helper to check the stable writes flag based on the block_device
instead of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-table.c  | 4 +---
 fs/super.c | 2 +-
 include/linux/blkdev.h | 6 ++
 mm/swapfile.c  | 2 +-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5e38d0dd009d5..d46839faa0ca5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1950,9 +1950,7 @@ static int device_requires_stable_pages(struct dm_target *ti,
struct dm_dev *dev, sector_t start,
sector_t len, void *data)
 {
-   struct request_queue *q = bdev_get_queue(dev->bdev);
-
-   return blk_queue_stable_writes(q);
+   return bdev_stable_writes(dev->bdev);
 }
 
 int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
diff --git a/fs/super.c b/fs/super.c
index f1d4a193602d6..60f57c7bc0a69 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1204,7 +1204,7 @@ static int set_bdev_super(struct super_block *s, void *data)
s->s_dev = s->s_bdev->bd_dev;
s->s_bdi = bdi_get(s->s_bdev->bd_disk->bdi);
 
-   if (blk_queue_stable_writes(s->s_bdev->bd_disk->queue))
+   if (bdev_stable_writes(s->s_bdev))
s->s_iflags |= SB_I_STABLE_WRITES;
return 0;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 075b16d4560e7..a433798c3343e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1330,6 +1330,12 @@ static inline bool bdev_nonrot(struct block_device *bdev)
return blk_queue_nonrot(bdev_get_queue(bdev));
 }
 
+static inline bool bdev_stable_writes(struct block_device *bdev)
+{
+   return test_bit(QUEUE_FLAG_STABLE_WRITES,
+   &bdev_get_queue(bdev)->queue_flags);
+}
+
 static inline bool bdev_write_cache(struct block_device *bdev)
 {
return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d5ab7ec4d92ca..4069f17a82c8e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3065,7 +3065,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
goto bad_swap_unlock_inode;
}
 
-   if (p->bdev && blk_queue_stable_writes(p->bdev->bd_disk->queue))
+   if (p->bdev && bdev_stable_writes(p->bdev))
p->flags |= SWP_STABLE_WRITES;
 
if (p->bdev && p->bdev->bd_disk->fops->rw_page)
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 12/27] block: add a bdev_fua helper

2022-04-06 Thread Christoph Hellwig
Add a helper to check the FUA flag based on the block_device instead of
having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/rnbd/rnbd-srv.c   | 3 +--
 drivers/target/target_core_iblock.c | 3 +--
 fs/iomap/direct-io.c| 3 +--
 include/linux/blkdev.h  | 6 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index f8cc3c5fecb4b..beaef43a67b9d 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -533,7 +533,6 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
struct rnbd_srv_sess_dev *sess_dev)
 {
struct rnbd_dev *rnbd_dev = sess_dev->rnbd_dev;
-   struct request_queue *q = bdev_get_queue(rnbd_dev->bdev);
 
rsp->hdr.type = cpu_to_le16(RNBD_MSG_OPEN_RSP);
rsp->device_id =
@@ -560,7 +559,7 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
rsp->cache_policy = 0;
if (bdev_write_cache(rnbd_dev->bdev))
rsp->cache_policy |= RNBD_WRITEBACK;
-   if (blk_queue_fua(q))
+   if (bdev_fua(rnbd_dev->bdev))
rsp->cache_policy |= RNBD_FUA;
 }
 
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 03013e85ffc03..c4a903b8a47fc 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -727,14 +727,13 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 
if (data_direction == DMA_TO_DEVICE) {
struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
-   struct request_queue *q = bdev_get_queue(ib_dev->ibd_bd);
/*
 * Force writethrough using REQ_FUA if a volatile write cache
 * is not enabled, or if initiator set the Force Unit Access bit.
 */
opf = REQ_OP_WRITE;
miter_dir = SG_MITER_TO_SG;
-   if (test_bit(QUEUE_FLAG_FUA, &q->queue_flags)) {
+   if (bdev_fua(ib_dev->ibd_bd)) {
if (cmd->se_cmd_flags & SCF_FUA)
opf |= REQ_FUA;
else if (!bdev_write_cache(ib_dev->ibd_bd))
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b08f5dc31780d..62da020d02a11 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -265,8 +265,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 * cache flushes on IO completion.
 */
if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-   (dio->flags & IOMAP_DIO_WRITE_FUA) &&
-   blk_queue_fua(bdev_get_queue(iomap->bdev)))
+   (dio->flags & IOMAP_DIO_WRITE_FUA) && bdev_fua(iomap->bdev))
use_fua = true;
}
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 807a49aa5a27a..075b16d4560e7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -602,7 +602,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)  test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
 #define blk_queue_pm_only(q)   atomic_read(&(q)->pm_only)
-#define blk_queue_fua(q)   test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
#define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
 
@@ -1336,6 +1335,11 @@ static inline bool bdev_write_cache(struct block_device *bdev)
return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
 }
 
+static inline bool bdev_fua(struct block_device *bdev)
+{
+   return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags);
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
struct request_queue *q = bdev_get_queue(bdev);
-- 
2.30.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 11/27] block: add a bdev_write_cache helper

2022-04-06 Thread Christoph Hellwig
Add a helper to check the write cache flag based on the block_device
instead of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/rnbd/rnbd-srv.c   | 2 +-
 drivers/block/xen-blkback/xenbus.c  | 2 +-
 drivers/target/target_core_iblock.c | 8 ++--
 fs/btrfs/disk-io.c  | 3 +--
 include/linux/blkdev.h  | 5 +
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index f04df6294650b..f8cc3c5fecb4b 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -558,7 +558,7 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
rsp->secure_discard =
cpu_to_le16(rnbd_dev_get_secure_discard(rnbd_dev));
rsp->cache_policy = 0;
-   if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+   if (bdev_write_cache(rnbd_dev->bdev))
rsp->cache_policy |= RNBD_WRITEBACK;
if (blk_queue_fua(q))
rsp->cache_policy |= RNBD_FUA;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index f09040435e2e5..8b691fe50475f 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -517,7 +517,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
vbd->type |= VDISK_REMOVABLE;
 
q = bdev_get_queue(bdev);
-   if (q && test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+   if (bdev_write_cache(bdev))
vbd->flush_support = true;
 
if (q && blk_queue_secure_erase(q))
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index b41ee5c3b5b82..03013e85ffc03 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -737,7 +737,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
if (test_bit(QUEUE_FLAG_FUA, &q->queue_flags)) {
if (cmd->se_cmd_flags & SCF_FUA)
opf |= REQ_FUA;
-   else if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
+   else if (!bdev_write_cache(ib_dev->ibd_bd))
opf |= REQ_FUA;
}
} else {
@@ -886,11 +886,7 @@ iblock_parse_cdb(struct se_cmd *cmd)
 
 static bool iblock_get_write_cache(struct se_device *dev)
 {
-   struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
-   struct block_device *bd = ib_dev->ibd_bd;
-   struct request_queue *q = bdev_get_queue(bd);
-
-   return test_bit(QUEUE_FLAG_WC, &q->queue_flags);
+   return bdev_write_cache(IBLOCK_DEV(dev)->ibd_bd);
 }
 
 static const struct target_backend_ops iblock_ops = {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b30309f187cf0..d80adee32128d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4247,8 +4247,7 @@ static void write_dev_flush(struct btrfs_device *device)
 * of simplicity, since this is a debug tool and not meant for use in
 * non-debug builds.
 */
-   struct request_queue *q = bdev_get_queue(device->bdev);
-   if (!test_bit(QUEUE_FLAG_WC, >queue_flags))
+   if (bdev_write_cache(device->bdev))
return;
 #endif
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3a9578e14a6b0..807a49aa5a27a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1331,6 +1331,11 @@ static inline bool bdev_nonrot(struct block_device *bdev)
return blk_queue_nonrot(bdev_get_queue(bdev));
 }
 
+static inline bool bdev_write_cache(struct block_device *bdev)
+{
+   return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
+}
+
 static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
 {
struct request_queue *q = bdev_get_queue(bdev);
-- 
2.30.2



[PATCH 10/27] block: add a bdev_nonrot helper

2022-04-06 Thread Christoph Hellwig
Add a helper to check the nonrot flag based on the block_device instead
of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig 
---
 block/ioctl.c   | 2 +-
 drivers/block/loop.c| 2 +-
 drivers/md/dm-table.c   | 4 +---
 drivers/md/md.c | 3 +--
 drivers/md/raid1.c  | 2 +-
 drivers/md/raid10.c | 2 +-
 drivers/md/raid5.c  | 2 +-
 drivers/target/target_core_file.c   | 3 +--
 drivers/target/target_core_iblock.c | 2 +-
 fs/btrfs/volumes.c  | 4 ++--
 fs/ext4/mballoc.c   | 2 +-
 include/linux/blkdev.h  | 5 +
 mm/swapfile.c   | 4 ++--
 13 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 4a86340133e46..ad3771b268b81 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -489,7 +489,7 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
queue_max_sectors(bdev_get_queue(bdev)));
return put_ushort(argp, max_sectors);
case BLKROTATIONAL:
-   return put_ushort(argp, !blk_queue_nonrot(bdev_get_queue(bdev)));
+   return put_ushort(argp, !bdev_nonrot(bdev));
case BLKRASET:
case BLKFRASET:
if(!capable(CAP_SYS_ADMIN))
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a58595f5ee2c8..8d800d46e4985 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -903,7 +903,7 @@ static void loop_update_rotational(struct loop_device *lo)
 
/* not all filesystems (e.g. tmpfs) have a sb->s_bdev */
if (file_bdev)
-   nonrot = blk_queue_nonrot(bdev_get_queue(file_bdev));
+   nonrot = bdev_nonrot(file_bdev);
 
if (nonrot)
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 03541cfc2317c..5e38d0dd009d5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1820,9 +1820,7 @@ static int device_dax_write_cache_enabled(struct dm_target *ti,
 static int device_is_rotational(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   struct request_queue *q = bdev_get_queue(dev->bdev);
-
-   return !blk_queue_nonrot(q);
+   return !bdev_nonrot(dev->bdev);
 }
 
 static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 309b3af906ad3..19636c2f2cda4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5991,8 +5991,7 @@ int md_run(struct mddev *mddev)
bool nonrot = true;
 
rdev_for_each(rdev, mddev) {
-   if (rdev->raid_disk >= 0 &&
-   !blk_queue_nonrot(bdev_get_queue(rdev->bdev))) {
+   if (rdev->raid_disk >= 0 && !bdev_nonrot(rdev->bdev)) {
nonrot = false;
break;
}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 99d5464a51f81..d81b896855f9f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -704,7 +704,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 /* At least two disks to choose from so failfast is OK */
 set_bit(R1BIO_FailFast, &r1_bio->state);
 
-   nonrot = blk_queue_nonrot(bdev_get_queue(rdev->bdev));
+   nonrot = bdev_nonrot(rdev->bdev);
has_nonrot_disk |= nonrot;
pending = atomic_read(&rdev->nr_pending);
dist = abs(this_sector - conf->mirrors[disk].head_position);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dfe7d62d3fbdd..7816c8b2e8087 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -796,7 +796,7 @@ static struct md_rdev *read_balance(struct r10conf *conf,
if (!do_balance)
break;
 
-   nonrot = blk_queue_nonrot(bdev_get_queue(rdev->bdev));
+   nonrot = bdev_nonrot(rdev->bdev);
has_nonrot_disk |= nonrot;
pending = atomic_read(&rdev->nr_pending);
if (min_pending > pending && nonrot) {
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 351d341a1ffa4..0bbae0e638666 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7242,7 +7242,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
rdev_for_each(rdev, mddev) {
if (test_bit(Journal, >flags))
continue;
-   if (blk_queue_nonrot(bdev_get_queue(rdev->bdev))) {
+   if (bdev_nonrot(rdev->bdev)) {
conf->batch_bio_dispatch = false;
break;
}

[PATCH 09/27] mm: use bdev_is_zoned in claim_swapfile

2022-04-06 Thread Christoph Hellwig
Use the bdev based helper instead of poking into the queue.

Signed-off-by: Christoph Hellwig 
---
 mm/swapfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63c61f8b26118..4c7537162af5e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2761,7 +2761,7 @@ static int claim_swapfile(struct swap_info_struct *p, 
struct inode *inode)
 * write only restriction.  Hence zoned block devices are not
 * suitable for swapping.  Disallow them here.
 */
-   if (blk_queue_is_zoned(p->bdev->bd_disk->queue))
+   if (bdev_is_zoned(p->bdev))
return -EINVAL;
p->flags |= SWP_BLKDEV;
} else if (S_ISREG(inode->i_mode)) {
-- 
2.30.2



[PATCH 08/27] ntfs3: use bdev_logical_block_size instead of open coding it

2022-04-06 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/ntfs3/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 278dcf5024102..cd30e81abbce0 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -920,7 +920,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
}
 
/* Parse boot. */
-   err = ntfs_init_from_boot(sb, rq ? queue_logical_block_size(rq) : 512,
+   err = ntfs_init_from_boot(sb, bdev_logical_block_size(bdev),
  bdev_nr_bytes(bdev));
if (err)
goto out;
-- 
2.30.2



[PATCH 07/27] btrfs: use bdev_max_active_zones instead of open coding it

2022-04-06 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/zoned.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index b7b5fac1c7790..5b85004d85d6c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -350,7 +350,6 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
struct btrfs_fs_info *fs_info = device->fs_info;
struct btrfs_zoned_device_info *zone_info = NULL;
struct block_device *bdev = device->bdev;
-   struct request_queue *queue = bdev_get_queue(bdev);
unsigned int max_active_zones;
unsigned int nactive;
sector_t nr_sectors;
@@ -410,7 +409,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
if (!IS_ALIGNED(nr_sectors, zone_sectors))
zone_info->nr_zones++;
 
-   max_active_zones = queue_max_active_zones(queue);
+   max_active_zones = bdev_max_active_zones(bdev);
if (max_active_zones && max_active_zones < BTRFS_MIN_ACTIVE_ZONES) {
btrfs_err_in_rcu(fs_info,
 "zoned: %s: max active zones %u is too small, need at least %u active zones",
-- 
2.30.2



[PATCH 06/27] drbd: cleanup decide_on_discard_support

2022-04-06 Thread Christoph Hellwig
Sanitize the calling conventions and use a goto label to clean up the
code flow.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_nl.c | 68 +++-
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 02030c9c4d3b1..40bb0b356a6d6 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1204,38 +1204,42 @@ static unsigned int drbd_max_discard_sectors(struct drbd_connection *connection)
 }
 
 static void decide_on_discard_support(struct drbd_device *device,
-   struct request_queue *q,
-   struct request_queue *b,
-   bool discard_zeroes_if_aligned)
+   struct drbd_backing_dev *bdev)
 {
-   /* q = drbd device queue (device->rq_queue)
-* b = backing device queue (device->ldev->backing_bdev->bd_disk->queue),
-* or NULL if diskless
-*/
-   struct drbd_connection *connection = first_peer_device(device)->connection;
-   bool can_do = b ? blk_queue_discard(b) : true;
-
-   if (can_do && connection->cstate >= C_CONNECTED && !(connection->agreed_features & DRBD_FF_TRIM)) {
-   can_do = false;
-   drbd_info(connection, "peer DRBD too old, does not support TRIM: disabling discards\n");
-   }
-   if (can_do) {
-   /* We don't care for the granularity, really.
-* Stacking limits below should fix it for the local
-* device.  Whether or not it is a suitable granularity
-* on the remote device is not our problem, really. If
-* you care, you need to use devices with similar
-* topology on all peers. */
-   blk_queue_discard_granularity(q, 512);
-   q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
-   blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
-   q->limits.max_write_zeroes_sectors = drbd_max_discard_sectors(connection);
-   } else {
-   blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
-   blk_queue_discard_granularity(q, 0);
-   q->limits.max_discard_sectors = 0;
-   q->limits.max_write_zeroes_sectors = 0;
+   struct drbd_connection *connection =
+   first_peer_device(device)->connection;
+   struct request_queue *q = device->rq_queue;
+
+   if (bdev && !blk_queue_discard(bdev->backing_bdev->bd_disk->queue))
+   goto not_supported;
+
+   if (connection->cstate >= C_CONNECTED &&
+   !(connection->agreed_features & DRBD_FF_TRIM)) {
+   drbd_info(connection,
+   "peer DRBD too old, does not support TRIM: disabling discards\n");
+   goto not_supported;
}
+
+   /*
+* We don't care for the granularity, really.
+*
+* Stacking limits below should fix it for the local device.  Whether or
+* not it is a suitable granularity on the remote device is not our
+* problem, really. If you care, you need to use devices with similar
+* topology on all peers.
+*/
+   blk_queue_discard_granularity(q, 512);
+   q->limits.max_discard_sectors = drbd_max_discard_sectors(connection);
+   blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
+   q->limits.max_write_zeroes_sectors =
+   drbd_max_discard_sectors(connection);
+   return;
+
+not_supported:
+   blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
+   blk_queue_discard_granularity(q, 0);
+   q->limits.max_discard_sectors = 0;
+   q->limits.max_write_zeroes_sectors = 0;
 }
 
 static void fixup_discard_if_not_supported(struct request_queue *q)
@@ -1273,7 +1277,6 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
unsigned int max_segments = 0;
struct request_queue *b = NULL;
struct disk_conf *dc;
-   bool discard_zeroes_if_aligned = true;
 
if (bdev) {
b = bdev->backing_bdev->bd_disk->queue;
@@ -1282,7 +1285,6 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
rcu_read_lock();
dc = rcu_dereference(device->ldev->disk_conf);
max_segments = dc->max_bio_bvecs;
-   discard_zeroes_if_aligned = dc->discard_zeroes_if_aligned;
rcu_read_unlock();
 
blk_set_stacking_limits(&q->limits);
@@ -1292,7 +1294,7 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
/* This is the workaround for "bio would need to, but cannot, be split" */
blk_queue_max_segments(q, max_segments ? max_segments : BLK_MAX_SEGMENTS);
blk_queue_segment_boundary(q, PAGE_SIZE-1);
-   decide_on_discard_support(device, q, b, discard_zeroes_if_aligned);
+   decide_on_discard_support(device, bdev);

[PATCH 05/27] drbd: use bdev based limit helpers in drbd_send_sizes

2022-04-06 Thread Christoph Hellwig
Use the bdev based limits helpers where they exist.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_main.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 74b1b2424efff..d20d84ee7a88e 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -924,7 +924,9 @@ int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enu
 
memset(p, 0, packet_size);
if (get_ldev_if_state(device, D_NEGOTIATING)) {
-   struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
+   struct block_device *bdev = device->ldev->backing_bdev;
+   struct request_queue *q = bdev_get_queue(bdev);
+
d_size = drbd_get_max_capacity(device->ldev);
rcu_read_lock();
u_size = rcu_dereference(device->ldev->disk_conf)->disk_size;
@@ -933,16 +935,15 @@ int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enu
max_bio_size = queue_max_hw_sectors(q) << 9;
max_bio_size = min(max_bio_size, DRBD_MAX_BIO_SIZE);
p->qlim->physical_block_size =
-   cpu_to_be32(queue_physical_block_size(q));
+   cpu_to_be32(bdev_physical_block_size(bdev));
p->qlim->logical_block_size =
-   cpu_to_be32(queue_logical_block_size(q));
+   cpu_to_be32(bdev_logical_block_size(bdev));
p->qlim->alignment_offset =
cpu_to_be32(queue_alignment_offset(q));
-   p->qlim->io_min = cpu_to_be32(queue_io_min(q));
-   p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
+   p->qlim->io_min = cpu_to_be32(bdev_io_min(bdev));
+   p->qlim->io_opt = cpu_to_be32(bdev_io_opt(bdev));
p->qlim->discard_enabled = blk_queue_discard(q);
-   p->qlim->write_same_capable =
-   !!q->limits.max_write_same_sectors;
+   p->qlim->write_same_capable = 0;
put_ldev(device);
} else {
struct request_queue *q = device->rq_queue;
-- 
2.30.2



[PATCH 03/27] target: fix discard alignment on partitions

2022-04-06 Thread Christoph Hellwig
Use the proper bdev_discard_alignment helper that accounts for partition
offsets.

Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Signed-off-by: Christoph Hellwig 
---
 drivers/target/target_core_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 3a1ec705cd80b..16e775bcf4a7c 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -849,8 +849,8 @@ bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib,
 */
attrib->max_unmap_block_desc_count = 1;
attrib->unmap_granularity = q->limits.discard_granularity / block_size;
-   attrib->unmap_granularity_alignment = q->limits.discard_alignment /
-   block_size;
+   attrib->unmap_granularity_alignment =
+   bdev_discard_alignment(bdev) / block_size;
return true;
 }
 EXPORT_SYMBOL(target_configure_unmap_from_queue);
-- 
2.30.2


[PATCH 04/27] drbd: remove assign_p_sizes_qlim

2022-04-06 Thread Christoph Hellwig
Fold each branch into its only caller.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_main.c | 50 --
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 9676a1d214bc5..74b1b2424efff 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -903,31 +903,6 @@ void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *peer_device)
}
 }
 
-/* communicated if (agreed_features & DRBD_FF_WSAME) */
-static void
-assign_p_sizes_qlim(struct drbd_device *device, struct p_sizes *p,
-   struct request_queue *q)
-{
-   if (q) {
-   p->qlim->physical_block_size = cpu_to_be32(queue_physical_block_size(q));
-   p->qlim->logical_block_size = cpu_to_be32(queue_logical_block_size(q));
-   p->qlim->alignment_offset = cpu_to_be32(queue_alignment_offset(q));
-   p->qlim->io_min = cpu_to_be32(queue_io_min(q));
-   p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
-   p->qlim->discard_enabled = blk_queue_discard(q);
-   p->qlim->write_same_capable = 0;
-   } else {
-   q = device->rq_queue;
-   p->qlim->physical_block_size = cpu_to_be32(queue_physical_block_size(q));
-   p->qlim->logical_block_size = cpu_to_be32(queue_logical_block_size(q));
-   p->qlim->alignment_offset = 0;
-   p->qlim->io_min = cpu_to_be32(queue_io_min(q));
-   p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
-   p->qlim->discard_enabled = 0;
-   p->qlim->write_same_capable = 0;
-   }
-}
-
int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enum dds_flags flags)
 {
struct drbd_device *device = peer_device->device;
@@ -957,14 +932,35 @@ int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enu
q_order_type = drbd_queue_order_type(device);
max_bio_size = queue_max_hw_sectors(q) << 9;
max_bio_size = min(max_bio_size, DRBD_MAX_BIO_SIZE);
-   assign_p_sizes_qlim(device, p, q);
+   p->qlim->physical_block_size =
+   cpu_to_be32(queue_physical_block_size(q));
+   p->qlim->logical_block_size =
+   cpu_to_be32(queue_logical_block_size(q));
+   p->qlim->alignment_offset =
+   cpu_to_be32(queue_alignment_offset(q));
+   p->qlim->io_min = cpu_to_be32(queue_io_min(q));
+   p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
+   p->qlim->discard_enabled = blk_queue_discard(q);
+   p->qlim->write_same_capable =
+   !!q->limits.max_write_same_sectors;
put_ldev(device);
} else {
+   struct request_queue *q = device->rq_queue;
+
+   p->qlim->physical_block_size =
+   cpu_to_be32(queue_physical_block_size(q));
+   p->qlim->logical_block_size =
+   cpu_to_be32(queue_logical_block_size(q));
+   p->qlim->alignment_offset = 0;
+   p->qlim->io_min = cpu_to_be32(queue_io_min(q));
+   p->qlim->io_opt = cpu_to_be32(queue_io_opt(q));
+   p->qlim->discard_enabled = 0;
+   p->qlim->write_same_capable = 0;
+
d_size = 0;
u_size = 0;
q_order_type = QUEUE_ORDERED_NONE;
max_bio_size = DRBD_MAX_BIO_SIZE; /* ... multiple BIOs per peer_request */
-   assign_p_sizes_qlim(device, p, NULL);
}
 
if (peer_device->connection->agreed_pro_version <= 94)
-- 
2.30.2



[PATCH 02/27] target: pass a block_device to target_configure_unmap_from_queue

2022-04-06 Thread Christoph Hellwig
The target code is a consumer of the block layer and should generally
work on struct block_device.

Signed-off-by: Christoph Hellwig 
---
 drivers/target/target_core_device.c  | 5 +++--
 drivers/target/target_core_file.c| 7 ---
 drivers/target/target_core_iblock.c  | 2 +-
 include/target/target_core_backend.h | 4 ++--
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index fa866acef5bb2..3a1ec705cd80b 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -834,9 +834,10 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
  * in ATA and we need to set TPE=1
  */
 bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib,
-  struct request_queue *q)
+  struct block_device *bdev)
 {
-   int block_size = queue_logical_block_size(q);
+   struct request_queue *q = bdev_get_queue(bdev);
+   int block_size = bdev_logical_block_size(bdev);
 
if (!blk_queue_discard(q))
return false;
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 8190b840065f3..8d191fdc33217 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -134,10 +134,11 @@ static int fd_configure_device(struct se_device *dev)
 */
inode = file->f_mapping->host;
if (S_ISBLK(inode->i_mode)) {
-   struct request_queue *q = bdev_get_queue(I_BDEV(inode));
+   struct block_device *bdev = I_BDEV(inode);
+   struct request_queue *q = bdev_get_queue(bdev);
unsigned long long dev_size;
 
-   fd_dev->fd_block_size = bdev_logical_block_size(I_BDEV(inode));
+   fd_dev->fd_block_size = bdev_logical_block_size(bdev);
/*
 * Determine the number of bytes from i_size_read() minus
 * one (1) logical sector from underlying struct block_device
@@ -150,7 +151,7 @@ static int fd_configure_device(struct se_device *dev)
dev_size, div_u64(dev_size, fd_dev->fd_block_size),
fd_dev->fd_block_size);
 
-   if (target_configure_unmap_from_queue(&dev->dev_attrib, q))
+   if (target_configure_unmap_from_queue(&dev->dev_attrib, bdev))
pr_debug("IFILE: BLOCK Discard support available,"
 " disabled by default\n");
/*
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 87ede165ddba4..b886ce1770bfd 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -119,7 +119,7 @@ static int iblock_configure_device(struct se_device *dev)
dev->dev_attrib.hw_max_sectors = queue_max_hw_sectors(q);
dev->dev_attrib.hw_queue_depth = q->nr_requests;
 
-   if (target_configure_unmap_from_queue(&dev->dev_attrib, q))
+   if (target_configure_unmap_from_queue(&dev->dev_attrib, bd))
pr_debug("IBLOCK: BLOCK Discard support available,"
 " disabled by default\n");
 
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 675f3a1fe6139..773963a1e0b53 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -14,7 +14,7 @@
 #define TRANSPORT_FLAG_PASSTHROUGH_ALUA0x2
 #define TRANSPORT_FLAG_PASSTHROUGH_PGR  0x4
 
-struct request_queue;
+struct block_device;
 struct scatterlist;
 
 struct target_backend_ops {
@@ -117,7 +117,7 @@ sense_reason_t passthrough_parse_cdb(struct se_cmd *cmd,
 bool target_sense_desc_format(struct se_device *dev);
 sector_t target_to_linux_sector(struct se_device *dev, sector_t lb);
 bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib,
-  struct request_queue *q);
+  struct block_device *bdev);
 
 static inline bool target_dev_configured(struct se_device *se_dev)
 {
-- 
2.30.2



[PATCH 01/27] target: remove an incorrect unmap zeroes data deduction

2022-04-06 Thread Christoph Hellwig
For block devices the target code implements UNMAP as calls to
blkdev_issue_discard, which does not guarantee zeroing just because
Write Zeroes is supported.

Note that this does not affect the file backed path which uses
fallocate to punch holes.

Fixes: 2237498f0b5c ("target/iblock: Convert WRITE_SAME to 
blkdev_issue_zeroout")
Signed-off-by: Christoph Hellwig 
---
 drivers/target/target_core_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 44bb380e7390c..fa866acef5bb2 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -850,7 +850,6 @@ bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib,
attrib->unmap_granularity = q->limits.discard_granularity / block_size;
attrib->unmap_granularity_alignment = q->limits.discard_alignment /
block_size;
-   attrib->unmap_zeroes_data = !!(q->limits.max_write_zeroes_sectors);
return true;
 }
 EXPORT_SYMBOL(target_configure_unmap_from_queue);
-- 
2.30.2



use block_device based APIs in block layer consumers

2022-04-06 Thread Christoph Hellwig
Hi Jens,

this series cleans up the block layer API so that the APIs consumed
by file systems are (almost) only struct block_device based, so that
file systems don't have to poke into block layer internals like the
request_queue.

I also found a bunch of existing bugs related to partition offsets
and discard so these are fixed while going along.

Diffstat:
 arch/um/drivers/ubd_kern.c   |2 
 block/blk-core.c |4 -
 block/blk-lib.c  |  124 ---
 block/blk-mq-debugfs.c   |2 
 block/blk-settings.c |   74 
 block/blk.h  |   14 ---
 block/fops.c |2 
 block/genhd.c|4 -
 block/ioctl.c|   48 ++---
 block/partitions/core.c  |   12 ---
 drivers/block/drbd/drbd_main.c   |   53 +++---
 drivers/block/drbd/drbd_nl.c |   94 +++---
 drivers/block/drbd/drbd_receiver.c   |   13 +--
 drivers/block/loop.c |   15 +---
 drivers/block/nbd.c  |3 
 drivers/block/null_blk/main.c|1 
 drivers/block/rbd.c  |1 
 drivers/block/rnbd/rnbd-clt.c|6 -
 drivers/block/rnbd/rnbd-srv-dev.h|8 --
 drivers/block/rnbd/rnbd-srv.c|5 -
 drivers/block/virtio_blk.c   |2 
 drivers/block/xen-blkback/blkback.c  |   15 ++--
 drivers/block/xen-blkback/xenbus.c   |9 --
 drivers/block/xen-blkfront.c |7 -
 drivers/block/zram/zram_drv.c|1 
 drivers/md/bcache/alloc.c|2 
 drivers/md/bcache/request.c  |4 -
 drivers/md/bcache/super.c|3 
 drivers/md/bcache/sysfs.c|2 
 drivers/md/dm-cache-target.c |9 --
 drivers/md/dm-clone-target.c |9 --
 drivers/md/dm-io.c   |2 
 drivers/md/dm-log-writes.c   |3 
 drivers/md/dm-raid.c |9 --
 drivers/md/dm-table.c|   25 +--
 drivers/md/dm-thin.c |   15 
 drivers/md/dm.c  |3 
 drivers/md/md-linear.c   |   11 ---
 drivers/md/md.c  |5 -
 drivers/md/raid0.c   |7 -
 drivers/md/raid1.c   |   18 -
 drivers/md/raid10.c  |   20 -
 drivers/md/raid5-cache.c |8 +-
 drivers/md/raid5.c   |   14 +--
 drivers/mmc/core/queue.c |3 
 drivers/mtd/mtd_blkdevs.c|1 
 drivers/nvme/host/core.c |6 -
 drivers/nvme/target/io-cmd-bdev.c|2 
 drivers/nvme/target/zns.c|3 
 drivers/s390/block/dasd_fba.c|1 
 drivers/scsi/sd.c|2 
 drivers/target/target_core_device.c  |   19 ++---
 drivers/target/target_core_file.c|   10 +-
 drivers/target/target_core_iblock.c  |   17 +---
 fs/btrfs/disk-io.c   |3 
 fs/btrfs/extent-tree.c   |8 +-
 fs/btrfs/ioctl.c |   12 +--
 fs/btrfs/volumes.c   |4 -
 fs/btrfs/zoned.c |3 
 fs/direct-io.c   |   32 +
 fs/exfat/file.c  |5 -
 fs/exfat/super.c |   10 --
 fs/ext4/ioctl.c  |   10 --
 fs/ext4/mballoc.c|   10 +-
 fs/ext4/super.c  |   10 --
 fs/f2fs/f2fs.h   |3 
 fs/f2fs/file.c   |   19 ++---
 fs/f2fs/segment.c|8 --
 fs/fat/file.c|5 -
 fs/fat/inode.c   |   10 --
 fs/gfs2/rgrp.c   |7 -
 fs/iomap/direct-io.c |3 
 fs/jbd2/journal.c|9 --
 fs/jfs/ioctl.c   |5 -
 fs/jfs/super.c   |8 --
 fs/nilfs2/ioctl.c|6 -
 fs/nilfs2/sufile.c   |4 -
 fs/nilfs2/the_nilfs.c|4 -
 fs/ntfs3/file.c  |6 -
 fs/ntfs3/super.c |   10 +-
 fs/ocfs2/ioctl.c |5 -
 fs/super.c   |2 
 fs/xfs/xfs_discard.c |8 +-
 fs/xfs/xfs_log_cil.c |2 
 fs/xfs/xfs_super.c   |   12 +--
 fs/zonefs/super.c|3 
 include/linux/blkdev.h   |  112 +++
 include/target/target_core_backend.h |4 -
 mm/swapfile.c|   31 ++--
 89 files changed, 493 insertions(+), 652 deletions(-)