Re: [PATCH v6 00/21] Userspace P2PDMA with O_DIRECT NVMe devices
On 4/7/22 08:46, Logan Gunthorpe wrote: > Hi, > > This patchset continues my work to add userspace P2PDMA access using > O_DIRECT NVMe devices. This posting contains some minor fixes and a > rebase onto v5.18-rc1 which contains cleanup from Christoph around > free_zone_device_page() that helps to enable this patchset. The > previous posting was here[1]. > > The patchset enables userspace P2PDMA by allowing userspace to mmap() > allocated chunks of the CMB. The resulting VMA can be passed only > to O_DIRECT IO on NVMe backed files or block devices. A flag is added > to GUP() in Patch <>, then Patches <> through <> wire this flag up based > on whether the block queue indicates P2PDMA support. Patches <> > through <> enable the CMB to be mapped into userspace by mmaping > the nvme char device. > > This is relatively straightforward, however the one significant > problem is that, presently, pci_p2pdma_map_sg() requires a homogeneous > SGL with all P2PDMA pages or all regular pages. Enhancing GUP to > support enforcing this rule would require a huge hack that I don't > expect would be all that pallatable. So the first 13 patches add > support for P2PDMA pages to dma_map_sg[table]() to the dma-direct > and dma-iommu implementations. Thus systems without an IOMMU plus > Intel and AMD IOMMUs are supported. (Other IOMMU implementations would > then be unsupported, notably ARM and PowerPC but support would be added > when they convert to dma-iommu). > > dma_map_sgtable() is preferred when dealing with P2PDMA memory as it > will return -EREMOTEIO when the DMA device cannot map specific P2PDMA > pages based on the existing rules in calc_map_type_and_dist(). > > The other issue is dma_unmap_sg() needs a flag to determine whether a > given dma_addr_t was mapped regularly or as a PCI bus address. To allow > this, a third flag is added to the page_link field in struct > scatterlist. This effectively means support for P2PDMA will now depend > on CONFIG_64BIT. > > Feedback welcome. > > This series is based on v5.18-rc1. A git branch is available here: > >https://github.com/sbates130272/linux-p2pmem/ p2pdma_user_cmb_v6 > > Thanks, > > Logan > > [1] lkml.kernel.org/r/20220128002614.6136-1-log...@deltatee.com > > -- Do you have any plans to re-spin this ? -ck ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
On 2/8/22 6:50 PM, Martin Oliveira wrote: > Hello, > > We have been hitting an error when running IO over our nvme-of setup, using > the mlx5 driver and we are wondering if anyone has seen anything similar/has > any suggestions. > > Both initiator and target are AMD EPYC 7502 machines connected over RDMA > using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a > single NVMe fabrics device, one physical SSD per namespace. > Thanks for reporting this, if you can bisect the problem on your setup it will help others to help you better. -ck ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 01/24] ext4/xfs: add page refcount helper
On 1/27/22 4:25 PM, Logan Gunthorpe wrote: > External email: Use caution opening links or attachments > > > From: Ralph Campbell > > There are several places where ZONE_DEVICE struct pages assume a reference > count == 1 means the page is idle and free. Instead of open coding this, > add a helper function to hide this detail. > > Signed-off-by: Ralph Campbell > Signed-off-by: Alex Sierra > Reviewed-by: Christoph Hellwig > Acked-by: Theodore Ts'o > Acked-by: Darrick J. Wong > --- Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 12/24] nvme-pci: convert to using dma_map_sgtable()
On 1/27/22 4:26 PM, Logan Gunthorpe wrote: > External email: Use caution opening links or attachments > > > The dma_map operations now support P2PDMA pages directly. So remove > the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls > to dma_map_sgtable(). > > dma_map_sgtable() returns more complete error codes than dma_map_sg() > and allows differentiating EREMOTEIO errors in case an unsupported > P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET > so the request isn't retried. > > Signed-off-by: Logan Gunthorpe > Reviewed-by: Max Gurtovoy > --- Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 11/23] nvme-pci: convert to using dma_map_sgtable()
> static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev, > - struct request *req, struct nvme_rw_command *cmd, int entries) > + struct request *req, struct nvme_rw_command *cmd) > { > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); > struct dma_pool *pool; > struct nvme_sgl_desc *sg_list; > - struct scatterlist *sg = iod->sg; > + struct scatterlist *sg = iod->sgt.sgl; > + int entries = iod->sgt.nents; I don't see use of newly added entries variable anywhere in nvme_pci_setup_sgls(), what am I missing ? Also, type of entries variable should be unsigned int to match the iod->sgt.nents. > dma_addr_t sgl_dma; > int i = 0; > > @@ -848,7 +838,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, > struct request *req, > { > struct nvme_iod *iod = blk_mq_rq_to_pdu(req); > blk_status_t ret = BLK_STS_RESOURCE; > - int nr_mapped; > + int rc; > > if (blk_rq_nr_phys_segments(req) == 1) { > struct bio_vec bv = req_bvec(req); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 10/23] nvme-pci: check DMA ops when indicating support for PCI P2PDMA
On 11/17/21 1:53 PM, Logan Gunthorpe wrote: > Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to > replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops > flags can be checked for PCI P2PDMA support. > > Signed-off-by: Logan Gunthorpe > --- Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 04/23] PCI/P2PDMA: Expose pci_p2pdma_map_type()
On 11/17/21 1:53 PM, Logan Gunthorpe wrote: > pci_p2pdma_map_type() will be needed by the dma-iommu map_sg > implementation because it will need to determine the mapping type > ahead of actually doing the mapping to create the actual IOMMU mapping. > > Prototypes for this helper are added to dma-map-ops.h as they are only > useful to dma map implementations and don't need to pollute the public > pci-p2pdma header > > Signed-off-by: Logan Gunthorpe > Acked-by: Bjorn Helgaas > Reviewed-by: Jason Gunthorpe > --- The documentation looks much better from reviewer point of view. Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 03/23] PCI/P2PDMA: Attempt to set map_type if it has not been set
On 11/17/21 1:53 PM, Logan Gunthorpe wrote: > Attempt to find the mapping type for P2PDMA pages on the first > DMA map attempt if it has not been done ahead of time. > > Previously, the mapping type was expected to be calculated ahead of > time, but if pages are to come from userspace then there's no > way to ensure the path was checked ahead of time. > > This change will calculate the mapping type if it hasn't pre-calculated > so it is no longer invalid to call pci_p2pdma_map_sg() before the mapping > type is calculated, so drop the WARN_ON when that is he case. > > Signed-off-by: Logan Gunthorpe > Acked-by: Bjorn Helgaas > --- Perhaps a comment would be nice in the default case in pci_p2pdma_map_sg_attrs() where you have removed the WARN_ON_ONCE(). Either way, looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 02/23] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
On 11/17/21 1:53 PM, Logan Gunthorpe wrote: > Make use of the third free LSB in scatterlist's page_link on 64bit systems. > > The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a > given SGL segments dma_address points to a PCI bus address. > dma_unmap_sg_p2pdma() will need to perform different cleanup when a > segment is marked as a bus address. > > Create a CONFIG_NEED_SG_DMA_BUS_ADDR_FLAG bool which depends on > CONFIG_64BIT (so there is space in the page link for the new flag). > CONFIG_PCI_P2PDMA will then depend on this so this means PCI P2PDMA will > require CONFIG_64BIT. This should be acceptable as the majority of P2PDMA > use cases are restricted to newer root complexes and roughly require the > extra address space for memory BARs used in the transactions. > > Signed-off-by: Logan Gunthorpe Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions
On 11/17/21 1:53 PM, Logan Gunthorpe wrote: > Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros > into static inline functions. There's no reason for these to be macros > and static inline are generally preferred these days. > > Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which is > adding another bit to this mask, can do so more easily. > > Suggested-by: Jason Gunthorpe > Signed-off-by: Logan Gunthorpe > --- Looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 09/20] nvme-pci: check DMA ops when indicating support for PCI P2PDMA
>> >> Is this new ops only needed for the PCIe transport ? or do you have >> following patches to use this op for the other transports ? > > No, I don't think this will make sense for transports that are not based > on PCI devices. > >> If it is only needed for the PCIe then we need to find a way to >> not add this somehow... > > I don't see how we can do that. The core code needs to know whether the > transport supports this and must have an operation to query it. > Okay. > Logan > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 09/20] nvme-pci: check DMA ops when indicating support for PCI P2PDMA
Logan, > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 7efb31b87f37..916750a54f60 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -3771,7 +3771,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, > unsigned nsid, > blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue); > > blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue); > - if (ctrl->ops->flags & NVME_F_PCI_P2PDMA) > + if (ctrl->ops->supports_pci_p2pdma && > + ctrl->ops->supports_pci_p2pdma(ctrl)) > blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue); > > ns->ctrl = ctrl; > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > index 9871c0c9374c..fb9bfc52a6d7 100644 > --- a/drivers/nvme/host/nvme.h > +++ b/drivers/nvme/host/nvme.h > @@ -477,7 +477,6 @@ struct nvme_ctrl_ops { > unsigned int flags; > #define NVME_F_FABRICS (1 << 0) > #define NVME_F_METADATA_SUPPORTED (1 << 1) > -#define NVME_F_PCI_P2PDMA (1 << 2) > int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val); > int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val); > int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val); > @@ -485,6 +484,7 @@ struct nvme_ctrl_ops { > void (*submit_async_event)(struct nvme_ctrl *ctrl); > void (*delete_ctrl)(struct nvme_ctrl *ctrl); > int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); > + bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl); > }; > Is this new ops only needed for the PCIe transport ? or do you have following patches to use this op for the other transports ? If it is only needed for the PCIe then we need to find a way to not add this somehow... ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 01/20] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
Logan, > +/* > + * bit 2 is the third free bit in the page_link on 64bit systems which > + * is used by dma_unmap_sg() to determine if the dma_address is a PCI > + * bus address when doing P2PDMA. > + * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this. > + */ > + > +#ifdef CONFIG_PCI_P2PDMA > +#define SG_DMA_PCI_P2PDMA 0x04UL > +#else > +#define SG_DMA_PCI_P2PDMA 0x00UL > +#endif > + > +#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_PCI_P2PDMA) > + You are doing two things in one patch :- 1. Introducing a new macro to replace the current macros. 2. Adding a new member to those macros. shouldn't this be split into two patches where you introduce a macro SG_PAGE_LINK_MASK (SG_CHAIN | SG_END) in prep patch and update the SF_PAGE_LINK_MASK with SG_DMA_PCI_P2PDMA with related code? OR Is there a reason why it is not split ? -ck ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 01/20] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
> +/** > + * sg_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma > + * @sg: SG entryScatterlist > + * > + * Description: > + * Clears the PCI P2PDMA mark > + **/ nit:- Probably want to add '.' above. > +static inline void sg_dma_unmark_pci_p2pdma(struct scatterlist *sg) > +{ > + sg->page_link &= ~SG_DMA_PCI_P2PDMA; > +} > + > /** >* sg_phys - Return physical address of an sg entry >* @sg: SG entry > -- > 2.30.2 > either ways with or without split, looks good. Reviewed-by: Chaitanya Kulkarni ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu