Re: [PATCH v6 00/21] Userspace P2PDMA with O_DIRECT NVMe devices

2022-05-16 Thread Chaitanya Kulkarni via iommu
On 4/7/22 08:46, Logan Gunthorpe wrote:
> Hi,
> 
> This patchset continues my work to add userspace P2PDMA access using
> O_DIRECT NVMe devices. This posting contains some minor fixes and a
> rebase onto v5.18-rc1 which contains cleanup from Christoph around
> free_zone_device_page() that helps to enable this patchset. The
> previous posting was here[1].
> 
> The patchset enables userspace P2PDMA by allowing userspace to mmap()
> allocated chunks of the CMB. The resulting VMA can be passed only
> to O_DIRECT IO on NVMe backed files or block devices. A flag is added
> to GUP() in Patch <>, then Patches <> through <> wire this flag up based
> on whether the block queue indicates P2PDMA support. Patches <>
> through <> enable the CMB to be mapped into userspace by mmaping
> the nvme char device.
> 
> This is relatively straightforward, however the one significant
> problem is that, presently, pci_p2pdma_map_sg() requires a homogeneous
> SGL with all P2PDMA pages or all regular pages. Enhancing GUP to
> support enforcing this rule would require a huge hack that I don't
> expect would be all that pallatable. So the first 13 patches add
> support for P2PDMA pages to dma_map_sg[table]() to the dma-direct
> and dma-iommu implementations. Thus systems without an IOMMU plus
> Intel and AMD IOMMUs are supported. (Other IOMMU implementations would
> then be unsupported, notably ARM and PowerPC but support would be added
> when they convert to dma-iommu).
> 
> dma_map_sgtable() is preferred when dealing with P2PDMA memory as it
> will return -EREMOTEIO when the DMA device cannot map specific P2PDMA
> pages based on the existing rules in calc_map_type_and_dist().
> 
> The other issue is dma_unmap_sg() needs a flag to determine whether a
> given dma_addr_t was mapped regularly or as a PCI bus address. To allow
> this, a third flag is added to the page_link field in struct
> scatterlist. This effectively means support for P2PDMA will now depend
> on CONFIG_64BIT.
> 
> Feedback welcome.
> 
> This series is based on v5.18-rc1. A git branch is available here:
> 
>https://github.com/sbates130272/linux-p2pmem/  p2pdma_user_cmb_v6
> 
> Thanks,
> 
> Logan
> 
> [1] lkml.kernel.org/r/20220128002614.6136-1-log...@deltatee.com
> 
> --


Do you have any plans to re-spin this ?

-ck


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-09 Thread Chaitanya Kulkarni via iommu
On 2/8/22 6:50 PM, Martin Oliveira wrote:
> Hello,
> 
> We have been hitting an error when running IO over our nvme-of setup, using 
> the mlx5 driver and we are wondering if anyone has seen anything similar/has 
> any suggestions.
> 
> Both initiator and target are AMD EPYC 7502 machines connected over RDMA 
> using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a 
> single NVMe fabrics device, one physical SSD per namespace.
> 

Thanks for reporting this, if you can bisect the problem on your setup
it will help others to help you better.

-ck
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 01/24] ext4/xfs: add page refcount helper

2022-01-28 Thread Chaitanya Kulkarni via iommu
On 1/27/22 4:25 PM, Logan Gunthorpe wrote:
> External email: Use caution opening links or attachments
> 
> 
> From: Ralph Campbell 
> 
> There are several places where ZONE_DEVICE struct pages assume a reference
> count == 1 means the page is idle and free. Instead of open coding this,
> add a helper function to hide this detail.
> 
> Signed-off-by: Ralph Campbell 
> Signed-off-by: Alex Sierra 
> Reviewed-by: Christoph Hellwig 
> Acked-by: Theodore Ts'o 
> Acked-by: Darrick J. Wong 
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 12/24] nvme-pci: convert to using dma_map_sgtable()

2022-01-28 Thread Chaitanya Kulkarni via iommu
On 1/27/22 4:26 PM, Logan Gunthorpe wrote:
> External email: Use caution opening links or attachments
> 
> 
> The dma_map operations now support P2PDMA pages directly. So remove
> the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
> to dma_map_sgtable().
> 
> dma_map_sgtable() returns more complete error codes than dma_map_sg()
> and allows differentiating EREMOTEIO errors in case an unsupported
> P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET
> so the request isn't retried.
> 
> Signed-off-by: Logan Gunthorpe 
> Reviewed-by: Max Gurtovoy 
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 11/23] nvme-pci: convert to using dma_map_sgtable()

2021-12-13 Thread Chaitanya Kulkarni via iommu


>   static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
> - struct request *req, struct nvme_rw_command *cmd, int entries)
> + struct request *req, struct nvme_rw_command *cmd)
>   {
>   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   struct dma_pool *pool;
>   struct nvme_sgl_desc *sg_list;
> - struct scatterlist *sg = iod->sg;
> + struct scatterlist *sg = iod->sgt.sgl;
> + int entries = iod->sgt.nents;

I don't see use of newly added entries variable anywhere in
nvme_pci_setup_sgls(), what am I missing ?

Also, type of entries variable should be unsigned int to match
the iod->sgt.nents.

>   dma_addr_t sgl_dma;
>   int i = 0;
>   
> @@ -848,7 +838,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
> struct request *req,
>   {
>   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   blk_status_t ret = BLK_STS_RESOURCE;
> - int nr_mapped;
> + int rc;
>   
>   if (blk_rq_nr_phys_segments(req) == 1) {
>   struct bio_vec bv = req_bvec(req);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 10/23] nvme-pci: check DMA ops when indicating support for PCI P2PDMA

2021-12-13 Thread Chaitanya Kulkarni via iommu
On 11/17/21 1:53 PM, Logan Gunthorpe wrote:
> Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
> replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
> flags can be checked for PCI P2PDMA support.
> 
> Signed-off-by: Logan Gunthorpe 
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/23] PCI/P2PDMA: Expose pci_p2pdma_map_type()

2021-12-13 Thread Chaitanya Kulkarni via iommu
On 11/17/21 1:53 PM, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual IOMMU mapping.
> 
> Prototypes for this helper are added to dma-map-ops.h as they are only
> useful to dma map implementations and don't need to pollute the public
> pci-p2pdma header
> 
> Signed-off-by: Logan Gunthorpe 
> Acked-by: Bjorn Helgaas 
> Reviewed-by: Jason Gunthorpe 
> ---

The documentation looks much better from reviewer point of view.

Looks good.

Reviewed-by: Chaitanya Kulkarni 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 03/23] PCI/P2PDMA: Attempt to set map_type if it has not been set

2021-12-13 Thread Chaitanya Kulkarni via iommu
On 11/17/21 1:53 PM, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map attempt if it has not been done ahead of time.
> 
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages are to come from userspace then there's no
> way to ensure the path was checked ahead of time.
> 
> This change will calculate the mapping type if it hasn't pre-calculated
> so it is no longer invalid to call pci_p2pdma_map_sg() before the mapping
> type is calculated, so drop the WARN_ON when that is he case.
> 
> Signed-off-by: Logan Gunthorpe 
> Acked-by: Bjorn Helgaas 
> ---

Perhaps a comment would be nice in the default case in
pci_p2pdma_map_sg_attrs() where you have removed the WARN_ON_ONCE().

Either way, looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 02/23] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2021-12-13 Thread Chaitanya Kulkarni via iommu
On 11/17/21 1:53 PM, Logan Gunthorpe wrote:
> Make use of the third free LSB in scatterlist's page_link on 64bit systems.
> 
> The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
> given SGL segments dma_address points to a PCI bus address.
> dma_unmap_sg_p2pdma() will need to perform different cleanup when a
> segment is marked as a bus address.
> 
> Create a CONFIG_NEED_SG_DMA_BUS_ADDR_FLAG bool which depends on
> CONFIG_64BIT (so there is space in the page link for the new flag).
> CONFIG_PCI_P2PDMA will then depend on this so this means PCI P2PDMA will
> require CONFIG_64BIT. This should be acceptable as the majority of P2PDMA
> use cases are restricted to newer root complexes and roughly require the
> extra address space for memory BARs used in the transactions.
> 
> Signed-off-by: Logan Gunthorpe 

Looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions

2021-12-13 Thread Chaitanya Kulkarni via iommu
On 11/17/21 1:53 PM, Logan Gunthorpe wrote:
> Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
> into static inline functions. There's no reason for these to be macros
> and static inline are generally preferred these days.
> 
> Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which is
> adding another bit to this mask, can do so more easily.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Logan Gunthorpe 
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni 



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 09/20] nvme-pci: check DMA ops when indicating support for PCI P2PDMA

2021-09-30 Thread Chaitanya Kulkarni via iommu


>>
>> Is this new ops only needed for the PCIe transport ? or do you have
>> following patches to use this op for the other transports ?
> 
> No, I don't think this will make sense for transports that are not based
> on PCI devices.
> 
>> If it is only needed for the PCIe then we need to find a way to
>> not add this somehow...
> 
> I don't see how we can do that. The core code needs to know whether the
> transport supports this and must have an operation to query it.
> 

Okay.

> Logan
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 09/20] nvme-pci: check DMA ops when indicating support for PCI P2PDMA

2021-09-30 Thread Chaitanya Kulkarni via iommu
Logan,

> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7efb31b87f37..916750a54f60 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3771,7 +3771,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
> unsigned nsid,
>  blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
> 
>  blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
> -   if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
> +   if (ctrl->ops->supports_pci_p2pdma &&
> +   ctrl->ops->supports_pci_p2pdma(ctrl))
>  blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
> 
>  ns->ctrl = ctrl;
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9871c0c9374c..fb9bfc52a6d7 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -477,7 +477,6 @@ struct nvme_ctrl_ops {
>  unsigned int flags;
>   #define NVME_F_FABRICS (1 << 0)
>   #define NVME_F_METADATA_SUPPORTED  (1 << 1)
> -#define NVME_F_PCI_P2PDMA  (1 << 2)
>  int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
>  int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
>  int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
> @@ -485,6 +484,7 @@ struct nvme_ctrl_ops {
>  void (*submit_async_event)(struct nvme_ctrl *ctrl);
>  void (*delete_ctrl)(struct nvme_ctrl *ctrl);
>  int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
> +   bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
>   };
> 

Is this new ops only needed for the PCIe transport ? or do you have 
following patches to use this op for the other transports ?

If it is only needed for the PCIe then we need to find a way to
not add this somehow...

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 01/20] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2021-09-30 Thread Chaitanya Kulkarni via iommu
Logan,

> +/*
> + * bit 2 is the third free bit in the page_link on 64bit systems which
> + * is used by dma_unmap_sg() to determine if the dma_address is a PCI
> + * bus address when doing P2PDMA.
> + * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this.
> + */
> +
> +#ifdef CONFIG_PCI_P2PDMA
> +#define SG_DMA_PCI_P2PDMA  0x04UL
> +#else
> +#define SG_DMA_PCI_P2PDMA  0x00UL
> +#endif
> +
> +#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_PCI_P2PDMA)
> +

You are doing two things in one patch :-
1. Introducing a new macro to replace the current macros.
2. Adding a new member to those macros.

shouldn't this be split into two patches where you introduce a
macro SG_PAGE_LINK_MASK (SG_CHAIN | SG_END) in prep patch and
update the SF_PAGE_LINK_MASK with SG_DMA_PCI_P2PDMA with related
code?

OR

Is there a reason why it is not split ?

-ck
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 01/20] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2021-09-30 Thread Chaitanya Kulkarni via iommu
> +/**
> + * sg_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma
> + * @sg: SG entryScatterlist
> + *
> + * Description:
> + *   Clears the PCI P2PDMA mark
> + **/
nit:- Probably want to add '.' above.
> +static inline void sg_dma_unmark_pci_p2pdma(struct scatterlist *sg)
> +{
> +   sg->page_link &= ~SG_DMA_PCI_P2PDMA;
> +}
> +
>   /**
>* sg_phys - Return physical address of an sg entry
>* @sg: SG entry
> --
> 2.30.2
> 

either ways with or without split, looks good.

Reviewed-by: Chaitanya Kulkarni 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu