Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-03 Thread Jianxiong Gao
>
> Please try with this extra patch:
>
I have tried with the extra patch and it still fails to boot.
I have attached dmesg output for the error:

-dmesg starts here-
[6.357755] XFS (nvme0n1p2): Mounting V5 Filesystem
[6.430092] XFS (nvme0n1p2): Torn write (CRC failure) detected at
log block 0x20d0. Truncating head block from 0x20e0.
[6.442828] XFS (nvme0n1p2): Starting recovery (logdev: internal)
[7.272456] XFS (nvme0n1p2): Ending recovery (logdev: internal)
[7.610250] systemd-journald[434]: Received SIGTERM from PID 1 (systemd).
...
[   10.054121] systemd[755]:
/usr/lib/systemd/system-generators/systemd-rc-local-generator
terminated by signal ABRT.
[   10.726122] audit: type=1400 audit(1612370261.090:4): avc:  denied
{ search } for  pid=783 comm="systemd-sysctl" name="crash"
dev="nvme0n1p2" ino=50789805
scontext=system_u:system_r:systemd_sysctl_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[   10.751764] audit: type=1400 audit(1612370261.090:5): avc:  denied
{ search } for  pid=783 comm="systemd-sysctl" name="crash"
dev="nvme0n1p2" ino=50789805
scontext=system_u:system_r:systemd_sysctl_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[   12.366607] xfs filesystem being remounted at / supports timestamps
until 2038 (0x7fff)
[   12.376379] audit: type=1400 audit(1612370262.740:6): avc:  denied
{ write } for  pid=788 comm="systemd-remount" name="crash"
dev="nvme0n1p2" ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[   12.413155] systemd-journald[781]: Received request to flush
runtime journal from PID 1
[   12.428917] audit: type=1400 audit(1612370262.793:7): avc:  denied
{ write } for  pid=815 comm="journalctl" name="crash" dev="nvme0n1p2"
ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[   12.453006] audit: type=1400 audit(1612370262.799:8): avc:  denied
{ write } for  pid=817 comm="systemd-random-" name="crash"
dev="nvme0n1p2" ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[   13.306030] plymouth[853]: segfault at 0 ip 7f2eabca8090 sp
7fffe94d8c08 error 6 in libc-2.28.so[7f2eabbcd000+1b9000]
[   13.318772] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[   78.782418] sed[911]: segfault at 0 ip 7fc3540da090 sp
7ffdb4373ff8 error 6 in libc-2.28.so[7fc353fff000+1b9000]
[   78.794007] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
--dmesg ends here---

-- 
Jianxiong Gao


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao
> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?

I agree that setting it once is the right way of doing it.

So I have changed the patch to enable the mask once in nvme_probe.

 drivers/nvme/host/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..4ce78373f98d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2630,6 +2630,9 @@ static void nvme_reset_work(struct work_struct *work)
 */
dma_set_max_seg_size(dev->dev, 0x);

+   if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to
set offset\n");
+
mutex_unlock(>shutdown_lock);

/*

However on boot of the system, the following error happens occasionally.
The error seems related to Journal service. Whenever "Stopping Journal
Service..."
appears, the boot succeeds. Otherwise boot fails with the following message.

log start here--
[  OK  ] Started Journal Service.
[   10.774545] xfs filesystem being remounted at / supports timestamps
until 2038 (0x7fff)
[  OK  ] Started Remount Root and Kernel File Systems.
 Starting Flush Journal to Persistent Storage...
 Starting Load/Save Random Seed...
 Starting Create Static [   10.804340] systemd-journald[780]:
Received request to flush runtime journal from PID 1
Device Nodes in /dev...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Create Static Device Nodes in /dev.
 Starting udev Kernel Device Manager...
[  OK  ] Reached target Local File Systems (Pre).
 Starting File System Check on /dev/disk/by-uuid/7281-17FC...
[  OK  ] Started File System Check on /dev/disk/by-uuid/7281-17FC.
 Mounting /boot/efi...
[  OK  ] Mounted /boo[   11.203461] systemd[1]: segfault at 2e0 ip
55b08607cc24 sp 7ffe13809090 error 4 in
systemd[55b08600+14]
t/efi.
[   11.216088] Code: 02 c7 44 24 10 fe ff ff ff 49 89 e4 89 06 48 8d
6c 24 08 48 8d 5c 24 10 48 c7 44 24 18 00 00 00 00 eb 10 0f 1f 00 48
8b 3c 24 <44> 39 b7 e0 02 00 00 74 3b 49 8b 7d 00 4c 89 e1 48 89 ea 48
89 de
---log ends here---

> Based on the Keith's comment it needs to be completely avoided in hot path.
>
> Did you get a chance to bisect the problem in the rc6 ? We need to know the
> root cause otherwise it might happen again after we add this patch.

I am now trying to bisect the problem.
I am not sure how the mapping offset could have caused the error.
Any suggestions are appreciated.

--
Jianxiong Gao


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao
> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?
>
> And more importantly, this isn't thread safe: one CPU may be setting the
> device's dma alignment mask to 0 while another CPU is expecting it to be
> NVME_CTRL_PAGE_SIZE - 1.

 I was having trouble getting the OS booted when setting it once during
 initialization. However when I rebased to the latest rc6 this morning it
 seems to be working with setting the mask on probe. I am still testing out
 and will appreciate any idea what may cause the nvme driver to fail
 with a single mask.
-- 
Jianxiong Gao


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao
On Mon, Feb 1, 2021 at 10:56 AM Andy Shevchenko
 wrote:
>
> On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> > NVMe driver relies on the address offset to function properly.
> > This patch adds the offset preserve mask to NVMe driver when mapping
> > via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> > depends on the page size defined by CC.MPS register of NVMe
> > controller.
>
> ...
>
> >   if (is_pci_p2pdma_page(sg_page(iod->sg)))
> >   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> >   iod->nents, rq_dma_dir(req), 
> > DMA_ATTR_NO_WARN);
> >   else
> >   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> >rq_dma_dir(req), 
> > DMA_ATTR_NO_WARN);
> > +
> > + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
> > offset\n");
> > + goto out_free_sg;
> > + }
>
> Seems like rebasing effect which makes empty line goes in the middle of some
> other group of code lines.
>
> >   if (!nr_mapped)
> >   goto out_free_sg;
>
> Perhaps it should be here?
Yes you are correct, it should be

 else
  nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
   rq_dma_dir(req), DMA_ATTR_NO_WARN);
  if (!nr_mapped)
  goto out_free_sg;
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+     dev_warn(dev->dev, "dma_set_min_align_mask failed to
reset offset\n");
+ goto out_free_sg;
+ }

Thanks for pointing it out.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


-- 
Jianxiong Gao


[PATCH V2 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-02-01 Thread Jianxiong Gao
For devices that need to preserve address offset on mapping through
swiotlb, this patch adds offset preserving based on page_offset_mask
and keeps the offset if the mask is non zero. This is needed for
device drivers like NVMe.

Signed-off-by: Jianxiong Gao 
---
 kernel/dma/swiotlb.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..eeb640df35f3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
+   unsigned int nslots, stride, index, wrap, min_align_mask, page_offset;
int i;
unsigned long mask;
unsigned long offset_slots;
@@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
+   min_align_mask = dma_get_min_align_mask(hwdev);
+   page_offset = orig_addr & min_align_mask;
+   alloc_size += page_offset;
+
/*
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-   if (alloc_size >= PAGE_SIZE)
+   if ((alloc_size >= PAGE_SIZE) || (min_align_mask > (1 << IO_TLB_SHIFT)))
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
stride = 1;
@@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+   /*
+* When keeping the offset of the original data, we need to advance
+* the tlb_addr by the offset of orig_addr.
+*/
+   tlb_addr += page_offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
@@ -598,7 +607,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  enum dma_data_direction dir, unsigned long attrs)
 {
unsigned long flags;
-   int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
IO_TLB_SHIFT;
+   unsigned int num_page_offset_slabs;
+   unsigned int min_align_mask = dma_get_min_align_mask(hwdev);
+   int i, count;
+   int nslots = ALIGN(alloc_size + (tlb_addr & min_align_mask),
+   1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -610,6 +623,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_FROM_DEVICE);
 
+   /*
+* When dma_get_min_align_mask is used, we may have padded more slabs
+* when padding exceeds one slab. We need to move index back to the
+* beginning of the padding.
+*/
+   num_page_offset_slabs = (tlb_addr & min_align_mask) / (1 << 
IO_TLB_SHIFT);
+   index -= num_page_offset_slabs;
+
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
-- 
2.27.0



[PATCH V2 1/3] Adding page_offset_mask to device_dma_parameters

2021-02-01 Thread Jianxiong Gao
Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao 
---
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 2 files changed, 18 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..7960bf516dd7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 * sg limitations.
 */
unsigned int max_segment_size;
+   unsigned int min_align_mask;
unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..27ec3cab8cbd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+   if (dev->dma_parms)
+   return dev->dma_parms->min_align_mask;
+   return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+   unsigned int min_align_mask)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->min_align_mask = min_align_mask;
+   return 0;
+   }
+   return -EIO;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.27.0



[PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao
NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao 
---
 drivers/nvme/host/pci.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..30e45f7e0f75 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
 static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+   if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to set 
offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+   if (dma_set_min_align_mask(dev->dev, 0))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
offset\n");
 }
 
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
-   int nr_mapped;
+   int nr_mapped, offset_ret;
 
if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
 
+   offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to set 
offset\n");
+   goto out_free_sg;
+   }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+   offset_ret = dma_set_min_align_mask(dev->dev, 0);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
offset\n");
+   goto out_free_sg;
+   }
if (!nr_mapped)
goto out_free_sg;
 
-- 
2.27.0



[PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed.

2021-02-01 Thread Jianxiong Gao
NVMe driver and other applications may depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
patch adds an option to make sure the mapped data preserves its
offset of the orginal addrss.

Without the patch when creating xfs formatted disk on NVMe backends,
with swiotlb=force in kernel boot option, creates the following error:
meta-data=/dev/nvme2n1   isize=512agcount=4, agsize=131072 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=0, rmapbt=0, refl
ink=0
data =   bsize=4096   blocks=524288, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096   blocks=2560, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

Jianxiong Gao (3):
  Adding page_offset_mask to device_dma_parameters
  Add swiotlb offset preserving mapping when
dma_dma_parameters->page_offset_mask is non zero.
  Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

 drivers/nvme/host/pci.c |  4 
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 kernel/dma/swiotlb.c| 16 +++-
 4 files changed, 37 insertions(+), 1 deletion(-)

-- 
2.27.0



[PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-27 Thread Jianxiong Gao
NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao 
---
 drivers/nvme/host/pci.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 856aa31931c1..0b23f04068be 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
 static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+   if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to set 
offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+   if (dma_set_page_offset_mask(dev->dev, 0))
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset 
offset\n");
 }
 
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
-   int nr_mapped;
+   int nr_mapped, offset_ret;
 
if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
 
+   offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 
1);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to set 
offset\n");
+   goto out_free_sg;
+   }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+   offset_ret = dma_set_page_offset_mask(dev->dev, 0);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset 
offset\n");
+   goto out_free_sg;
+   }
if (!nr_mapped)
goto out_free_sg;
 
-- 
2.27.0



[PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.*

2021-01-27 Thread Jianxiong Gao
NVMe driver and other applications may depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
patch adds an option to make sure the mapped data preserves its
offset of the orginal addrss.

Without the patch when creating xfs formatted disk on NVMe backends,
with swiotlb=force in kernel boot option, creates the following error:
meta-data=/dev/nvme2n1   isize=512agcount=4, agsize=131072 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=0, rmapbt=0, refl
ink=0
data =   bsize=4096   blocks=524288, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096   blocks=2560, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

Jianxiong Gao (3):
  Adding page_offset_mask to device_dma_parameters
  Add swiotlb offset preserving mapping when
dma_dma_parameters->page_offset_mask is non zero.
  Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

 drivers/nvme/host/pci.c |  4 
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 kernel/dma/swiotlb.c| 16 +++-
 4 files changed, 37 insertions(+), 1 deletion(-)

-- 
2.27.0



[PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-27 Thread Jianxiong Gao
For devices that need to preserve address offset on mapping through
swiotlb, this patch adds offset preserving based on page_offset_mask
and keeps the offset if the mask is non zero. This is needed for
device drivers like NVMe.

Signed-off-by: Jianxiong Gao 
---
 kernel/dma/swiotlb.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..4cab35f2c9bc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
+   unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
int i;
unsigned long mask;
unsigned long offset_slots;
@@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
+   page_offset_mask = dma_get_page_offset_mask(hwdev);
+   page_offset = orig_addr & page_offset_mask;
+   alloc_size += page_offset;
+
/*
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-   if (alloc_size >= PAGE_SIZE)
+   if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << 
IO_TLB_SHIFT)))
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
stride = 1;
@@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+   /*
+* When keeping the offset of the original data, we need to advance
+* the tlb_addr by the offset of orig_addr.
+*/
+   tlb_addr += page_offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
@@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  enum dma_data_direction dir, unsigned long attrs)
 {
unsigned long flags;
-   int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
IO_TLB_SHIFT;
+   unsigned int num_page_offset_slabs, page_offset_mask = 
dma_get_page_offset_mask(hwdev);
+   int i, count;
+   int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << 
IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_FROM_DEVICE);
 
+   /*
+* When dma_get_page_offset_mask is used, we may have padded more slabs
+* when padding exceeds one slab. We need to move index back to the
+* beginning of the padding.
+*/
+   num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << 
IO_TLB_SHIFT);
+   index -= num_page_offset_slabs;
+
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
-- 
2.27.0



[PATCH 1/3] Adding page_offset_mask to device_dma_parameters

2021-01-27 Thread Jianxiong Gao
Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao 
---
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 2 files changed, 18 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..f44e0659fc66 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -292,6 +292,7 @@ struct device_dma_parameters {
 */
unsigned int max_segment_size;
unsigned long segment_boundary_mask;
+   unsigned int page_offset_mask;
 };
 
 /**
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..5529a31fefba 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
+static inline unsigned int dma_get_page_offset_mask(struct device *dev)
+{
+   if (dev->dma_parms)
+   return dev->dma_parms->page_offset_mask;
+   return 0;
+}
+
+static inline int dma_set_page_offset_mask(struct device *dev,
+   unsigned int page_offset_mask)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->page_offset_mask = page_offset_mask;
+   return 0;
+   }
+   return -EIO;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.27.0



[PATCH] [PATCH] Keep offset when mapping data via SWIOTLB.

2020-12-07 Thread Jianxiong Gao
NVMe driver and other applications depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
booting with --swiotlb=force option and using NVMe as interface,
running mkfs.xfs on Rhel fails because of the unalignment issue.
This patch makes sure the mapped data preserves
its offset of the orginal address. Tested on latest kernel that
this patch fixes the issue.

Signed-off-by: Jianxiong Gao 
Acked-by: David Rientjes 
---
 kernel/dma/swiotlb.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..56a35e71b3fd 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
max_slots = mask + 1
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+ 
+   /*
+* We need to keep the offset when mapping, so adding the offset
+* to the total set we need to allocate in SWIOTLB
+*/
+   alloc_size += offset_in_page(orig_addr);
 
/*
 * For mappings greater than or equal to a page, we limit the stride
@@ -567,6 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+   /*
+* When keeping the offset of the original data, we need to advance
+* the tlb_addr by the offset of orig_addr.
+*/
+   tlb_addr += orig_addr & (PAGE_SIZE - 1);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
-- 
2.27.0




[PATCH] [PATCH] Adding offset keeping option when mapping data via SWIOTLB.

2020-11-23 Thread Jianxiong Gao
NVMe driver and other applications may depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
booting with --swiotlb=force option and using NVMe as interface,
running mkfs.xfs on Rhel fails because of the unalignment issue.
This patch adds an option to make sure the mapped data preserves
its offset of the orginal addrss. Tested on latest kernel that
this patch fixes the issue.

Signed-off-by: Jianxiong Gao 
Acked-by: David Rientjes 
---
 drivers/nvme/host/pci.c |  3 ++-
 include/linux/dma-mapping.h |  8 
 kernel/dma/swiotlb.c| 13 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0578ff253c47..a366fb8a1ff0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -833,7 +833,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-rq_dma_dir(req), DMA_ATTR_NO_WARN);
+   rq_dma_dir(req),
+   DMA_ATTR_NO_WARN|DMA_ATTR_SWIOTLB_KEEP_OFFSET);
if (!nr_mapped)
goto out;
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 956151052d45..e46d23d9fa20 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
  */
 #define DMA_ATTR_PRIVILEGED(1UL << 9)
 
+/*
+ * DMA_ATTR_SWIOTLB_KEEP_OFFSET: used to indicate that the buffer has to keep
+ * its offset when mapped via SWIOTLB. Some application functionality depends
+ * on the address offset, thus when buffers are mapped via SWIOTLB, the offset
+ * needs to be preserved.
+ */
+#define DMA_ATTR_SWIOTLB_KEEP_OFFSET   (1UL << 10)
+
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
  * be given to a device to use as a DMA source or target.  It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..f43d7be1342d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
max_slots = mask + 1
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+ 
+   /*
+* If we need to keep the offset when mapping, we need to add the offset
+* to the total set we need to allocate in SWIOTLB
+*/
+   if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
+   alloc_size += offset_in_page(orig_addr);
 
/*
 * For mappings greater than or equal to a page, we limit the stride
@@ -567,6 +574,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+   /*
+* When keeping the offset of the original data, we need to advance
+* the tlb_addr by the offset of orig_addr.
+*/
+   if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
+   tlb_addr += orig_addr & (PAGE_SIZE - 1);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
-- 
2.27.0