Re: [PATCH v2] dmaengine: k3dma: use the correct HiSilicon copyright

2021-04-01 Thread Zhangfei Gao




On 2021/4/1 7:50 PM, Hao Fang wrote:

s/Hisilicon/HiSilicon/g.
It should use capital S, according to the official website.

Signed-off-by: Hao Fang 


Thanks for the patch.

Acked-by:  Zhangfei Gao 

---
V2:
  -remove the terms of use link.
---
  drivers/dma/k3dma.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/k3dma.c b/drivers/dma/k3dma.c
index d0b2e60..ecdaada9 100644
--- a/drivers/dma/k3dma.c
+++ b/drivers/dma/k3dma.c
@@ -1,7 +1,7 @@
  // SPDX-License-Identifier: GPL-2.0-only
  /*
   * Copyright (c) 2013 - 2015 Linaro Ltd.
- * Copyright (c) 2013 Hisilicon Limited.
+ * Copyright (c) 2013 HiSilicon Limited.
   */
  #include 
  #include 
@@ -1039,6 +1039,6 @@ static struct platform_driver k3_pdma_driver = {
  
  module_platform_driver(k3_pdma_driver);
  
-MODULE_DESCRIPTION("Hisilicon k3 DMA Driver");
+MODULE_DESCRIPTION("HiSilicon k3 DMA Driver");
  MODULE_ALIAS("platform:k3dma");
  MODULE_LICENSE("GPL v2");




Re: [PATCH] mmc: dw_mmc-k3: use the correct HiSilicon copyright

2021-03-31 Thread Zhangfei Gao
On Tue, Mar 30, 2021 at 2:46 PM Hao Fang  wrote:
>
> s/Hisilicon/HiSilicon/g.
> It should use capital S, according to
> https://www.hisilicon.com/en/terms-of-use.
>
> Signed-off-by: Hao Fang 

Thanks for the fix.

Acked-by: Zhangfei Gao 

> ---
>  drivers/mmc/host/dw_mmc-k3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/dw_mmc-k3.c b/drivers/mmc/host/dw_mmc-k3.c
> index 29d2494..0311a37 100644
> --- a/drivers/mmc/host/dw_mmc-k3.c
> +++ b/drivers/mmc/host/dw_mmc-k3.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-or-later
>  /*
>   * Copyright (c) 2013 Linaro Ltd.
> - * Copyright (c) 2013 Hisilicon Limited.
> + * Copyright (c) 2013 HiSilicon Limited.
>   */
>
>  #include 
> --
> 2.8.1
>


Re: [PATCH v2 3/3] PCI: set dma-can-stall for HiSilicon chip

2021-03-08 Thread Zhangfei Gao

Hi, Krzysztof

On 2021/3/8 1:54 AM, Krzysztof Wilczyński wrote:

Hi,

[...]

Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210108145217.2254447-1-jean-phili...@linaro.org/

[...]

If you plan to post another version of this patch to include the above
link into the commit message or reference to the commit itself, as
Jean-Philippe's series can already be included in the mainline (since it
has been a while now from when this series was originally posted), then
I have a favour to ask - would you also be able to capitalise the
subject line (so that it's consistent) and change "chip" to "chips"
since there are two you mention in the commit message.



I have sent another version with these changes; thanks for the reminder.
I was waiting for Jean's patchset:
https://lore.kernel.org/linux-iommu/20210302092644.2553014-1-jean-phili...@linaro.org/
The quirk patches can nevertheless be applied and built directly on 5.12-rc1.

Thanks


[PATCH v3 3/3] PCI: Set dma-can-stall for HiSilicon chips

2021-03-08 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices can
support SVA via the SMMU stall feature, which is enabled by setting
dma-can-stall on ACPI platforms.

Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210302092644.2553014-1-jean-phili...@linaro.org/

Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
 drivers/pci/quirks.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 873d27f..b866cdf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1827,10 +1827,23 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI
 
 static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
 {
+	struct property_entry properties[] = {
+		PROPERTY_ENTRY_BOOL("dma-can-stall"),
+		{},
+	};
+
 	if (pdev->revision != 0x21 && pdev->revision != 0x30)
 		return;
 
 	pdev->pasid_no_tlp = 1;
+
+	/*
+	 * Set the dma-can-stall property on ACPI platforms. Device tree
+	 * can set it directly.
+	 */
+	if (!pdev->dev.of_node &&
+	    device_add_properties(&pdev->dev, properties))
+		pci_warn(pdev, "could not add stall property");
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
-- 
2.9.5
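
As a rough illustration of the control flow in the quirk above, here is a
userspace model in plain C. It is only a sketch under stated assumptions:
struct model_pci_dev and its fields are hypothetical stand-ins for the few
struct pci_dev members the quirk touches, and the dma_can_stall flag stands
in for the software property that the patch adds with
device_add_properties(). Nothing in it is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct pci_dev fields used by the quirk. */
struct model_pci_dev {
	uint8_t revision;	/* PCI revision ID */
	bool has_of_node;	/* true when a device tree node describes the device */
	bool pasid_no_tlp;	/* set by the quirk */
	bool dma_can_stall;	/* models the "dma-can-stall" property */
};

/* Mirrors the decision flow of quirk_huawei_pcie_sva() in the patch. */
void model_quirk_huawei_pcie_sva(struct model_pci_dev *pdev)
{
	assert(pdev != NULL);

	/* Only revisions 0x21 and 0x30 are handled. */
	if (pdev->revision != 0x21 && pdev->revision != 0x30)
		return;

	pdev->pasid_no_tlp = true;

	/*
	 * Without a device tree node (the ACPI case) the quirk adds the
	 * property itself; with a device tree the firmware description is
	 * expected to carry it, so the model leaves the flag untouched.
	 */
	if (!pdev->has_of_node)
		pdev->dma_can_stall = true;
}
```

In this model a revision 0x21 device without a device tree node ends up with
both flags set, while any other revision is left unchanged.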



[PATCH v3 2/3] PCI: Add a quirk to set pasid_no_tlp for HiSilicon chips

2021-03-08 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices have a
PASID capability even though they do not support TLP prefixes.

Add a quirk to set pasid_no_tlp for these devices.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
 drivers/pci/quirks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e..873d27f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1825,6 +1825,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH,  quir
 
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI, 8, quirk_pcie_mch);
 
+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+	if (pdev->revision != 0x21 && pdev->revision != 0x30)
+		return;
+
+	pdev->pasid_no_tlp = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
+
 /*
  * It's possible for the MSI to get corrupted if SHPC and ACPI are used
  * together on certain PXH-based systems.
-- 
2.9.5



[PATCH v3 1/3] PCI: PASID can be enabled without TLP prefix

2021-03-08 Thread Zhangfei Gao
A PASID-like feature is implemented on AMBA without using TLP prefixes,
and these devices have a PASID capability even though they do not
support TLP prefixes. Add a pasid_no_tlp bit meaning "PASID works
without TLP prefixes" and make pci_enable_pasid() check pasid_no_tlp as
well as eetlp_prefix_path.

Suggested-by: Bjorn Helgaas 
Signed-off-by: Zhangfei Gao 
---
 drivers/pci/ats.c   | 2 +-
 include/linux/pci.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 793d381..88f981b 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -380,7 +380,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
 	if (WARN_ON(pdev->pasid_enabled))
 		return -EBUSY;
 
-	if (!pdev->eetlp_prefix_path)
+	if (!pdev->eetlp_prefix_path && !pdev->pasid_no_tlp)
 		return -EINVAL;
 
 	if (!pasid)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c..1daa943 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -388,6 +388,7 @@ struct pci_dev {
   supported from root to here */
 	u16		l1ss;		/* L1SS Capability pointer */
 #endif
+	unsigned int	pasid_no_tlp:1;		/* PASID works without TLP Prefix */
 	unsigned int	eetlp_prefix_path:1;	/* End-to-End TLP Prefix */
 
 	pci_channel_state_t error_state;	/* Current connectivity state */
-- 
2.9.5



[PATCH v3 0/3] PCI: Add a quirk to enable SVA for HiSilicon chip

2021-03-08 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices have a
PASID capability even though they do not support TLP prefixes.

Add a quirk to set pasid_no_tlp and dma-can-stall for these devices.

v3:
Rebase to Linux 5.12-rc1
Change commit msg adding:
Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210302092644.2553014-1-jean-phili...@linaro.org/

By the way, this patchset can be applied directly on 5.12-rc1 and builds
successfully even without the dependent patchset.

v2:
Add a new pci_dev bit: pasid_no_tlp, suggested by Bjorn 
"Apparently these devices have a PASID capability.  I think you should
add a new pci_dev bit that is specific to this idea of "PASID works
without TLP prefixes" and then change pci_enable_pasid() to look at
that bit as well as eetlp_prefix_path."
https://lore.kernel.org/linux-pci/20210112170230.GA1838341@bjorn-Precision-5520/

Zhangfei Gao (3):
  PCI: PASID can be enabled without TLP prefix
  PCI: Add a quirk to set pasid_no_tlp for HiSilicon chips
  PCI: Set dma-can-stall for HiSilicon chips

 drivers/pci/ats.c|  2 +-
 drivers/pci/quirks.c | 27 +++
 include/linux/pci.h  |  1 +
 3 files changed, 29 insertions(+), 1 deletion(-)

-- 
2.9.5



[PATCH v2 2/3] PCI: Add a quirk to set pasid_no_tlp for HiSilicon chip

2021-01-18 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices have a
PASID capability even though they do not support TLP prefixes.

Add a quirk to set pasid_no_tlp for these devices.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
 drivers/pci/quirks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e..873d27f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1825,6 +1825,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH,  quir
 
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI, 8, quirk_pcie_mch);
 
+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+	if (pdev->revision != 0x21 && pdev->revision != 0x30)
+		return;
+
+	pdev->pasid_no_tlp = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
+
 /*
  * It's possible for the MSI to get corrupted if SHPC and ACPI are used
  * together on certain PXH-based systems.
-- 
2.7.4



[PATCH v2 3/3] PCI: set dma-can-stall for HiSilicon chip

2021-01-18 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices can
support SVA via the SMMU stall feature, which is enabled by setting
dma-can-stall on ACPI platforms.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210108145217.2254447-1-jean-phili...@linaro.org/

 drivers/pci/quirks.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 873d27f..b866cdf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1827,10 +1827,23 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI
 
 static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
 {
+	struct property_entry properties[] = {
+		PROPERTY_ENTRY_BOOL("dma-can-stall"),
+		{},
+	};
+
 	if (pdev->revision != 0x21 && pdev->revision != 0x30)
 		return;
 
 	pdev->pasid_no_tlp = 1;
+
+	/*
+	 * Set the dma-can-stall property on ACPI platforms. Device tree
+	 * can set it directly.
+	 */
+	if (!pdev->dev.of_node &&
+	    device_add_properties(&pdev->dev, properties))
+		pci_warn(pdev, "could not add stall property");
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
-- 
2.7.4



[PATCH v2 1/3] PCI: PASID can be enabled without TLP prefix

2021-01-18 Thread Zhangfei Gao
A PASID-like feature is implemented on AMBA without using TLP prefixes,
and these devices have a PASID capability even though they do not
support TLP prefixes. Add a pasid_no_tlp bit meaning "PASID works
without TLP prefixes" and make pci_enable_pasid() check pasid_no_tlp as
well as eetlp_prefix_path.

Suggested-by: Bjorn Helgaas 
Signed-off-by: Zhangfei Gao 
---
 drivers/pci/ats.c   | 2 +-
 include/linux/pci.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e36d601..b67b1b1 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -386,7 +386,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
 	if (WARN_ON(pdev->pasid_enabled))
 		return -EBUSY;
 
-	if (!pdev->eetlp_prefix_path)
+	if (!pdev->eetlp_prefix_path && !pdev->pasid_no_tlp)
 		return -EINVAL;
 
 	if (!pasid)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f1f26f8..ac1c735 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -388,6 +388,7 @@ struct pci_dev {
   supported from root to here */
 	u16		l1ss;		/* L1SS Capability pointer */
 #endif
+	unsigned int	pasid_no_tlp:1;		/* PASID works without TLP Prefix */
 	unsigned int	eetlp_prefix_path:1;	/* End-to-End TLP Prefix */
 
 	pci_channel_state_t error_state;	/* Current connectivity state */
-- 
2.7.4



[PATCH v2 0/3] PCI: Add a quirk to enable SVA for HiSilicon chip

2021-01-18 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices that appear as PCI
devices but are actually on the AMBA bus. These fake PCI devices have a
PASID capability even though they do not support TLP prefixes.

Add a quirk to set pasid_no_tlp and dma-can-stall for these devices.

v2:
Add a new pci_dev bit: pasid_no_tlp, suggested by Bjorn 
"Apparently these devices have a PASID capability.  I think you should
add a new pci_dev bit that is specific to this idea of "PASID works
without TLP prefixes" and then change pci_enable_pasid() to look at
that bit as well as eetlp_prefix_path."
https://lore.kernel.org/linux-pci/20210112170230.GA1838341@bjorn-Precision-5520/

Zhangfei Gao (3):
  PCI: PASID can be enabled without TLP prefix
  PCI: Add a quirk to set pasid_no_tlp for HiSilicon chip
  PCI: set dma-can-stall for HiSilicon chip

 drivers/pci/ats.c|  2 +-
 drivers/pci/quirks.c | 27 +++
 include/linux/pci.h  |  1 +
 3 files changed, 29 insertions(+), 1 deletion(-)

-- 
2.7.4



Re: [PATCH] PCI: Add a quirk to enable SVA for HiSilicon chip

2021-01-13 Thread Zhangfei Gao

Hi, Bjorn

Thanks for the suggestion.

On 2021/1/13 1:02 AM, Bjorn Helgaas wrote:

On Tue, Jan 12, 2021 at 02:49:52PM +0800, Zhangfei Gao wrote:

HiSilicon KunPeng920 and KunPeng930 have devices appear as PCI but are
actually on the AMBA bus. These fake PCI devices can not support tlp
and have to enable SMMU stall mode to use the SVA feature.

Add a quirk to set dma-can-stall property and enable tlp for these devices.

s/tlp/TLP/

I don't think "enable TLP" really captures what's going on here.  You
must be referring to the fact that you set pdev->eetlp_prefix_path.

That is normally set by pci_configure_eetlp_prefix() if the Device
Capabilities 2 register has the End-End TLP Prefix Supported bit set
*and* all devices in the upstream path also have it set.

The only place we currently test eetlp_prefix_path is in
pci_enable_pasid().  In PCIe, PASID is implemented using the PASID TLP
prefix, so we only enable PASID if TLP prefixes are supported.

If I understand correctly, a PASID-like feature is implemented on AMBA
without using TLP prefixes, and setting eetlp_prefix_path makes that
work.

Yes, that's the requirement.


I don't think you should do this by setting eetlp_prefix_path because
TLP prefixes are used for other features, e.g., TPH.  Setting
eetlp_prefix_path implies these devices can also support things like
TLP, and I don't think that's necessarily true.

Thanks for the reminder.


Apparently these devices have a PASID capability.  I think you should
add a new pci_dev bit that is specific to this idea of "PASID works
without TLP prefixes" and then change pci_enable_pasid() to look at
that bit as well as eetlp_prefix_path.

That's great; this solution is much simpler.
We can set the bit before calling pci_enable_pasid().
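
The check suggested above can be sketched in userspace C. This is a minimal
model, assuming a hypothetical struct pasid_model_dev that carries only the
three flags discussed in this thread; the real pci_enable_pasid() also looks
up the PASID capability and validates the requested features, which the
sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct pci_dev bits discussed above. */
struct pasid_model_dev {
	bool pasid_enabled;	/* PASID already turned on */
	bool eetlp_prefix_path;	/* End-to-End TLP prefixes supported on the path */
	bool pasid_no_tlp;	/* "PASID works without TLP prefixes" */
};

/* Models only the gating logic of pci_enable_pasid(). */
int model_pci_enable_pasid(struct pasid_model_dev *pdev)
{
	assert(pdev != NULL);

	if (pdev->pasid_enabled)
		return -EBUSY;

	/* Either real TLP prefix support or the new bit is sufficient. */
	if (!pdev->eetlp_prefix_path && !pdev->pasid_no_tlp)
		return -EINVAL;

	pdev->pasid_enabled = true;
	return 0;
}
```

Keeping the new bit separate from eetlp_prefix_path means other users of TLP
prefixes are not misled into thinking the device supports them.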


It seems like dma-can-stall is a separate thing from PASID?  If so,
this should be two separate patches.

If they can be separated, I would probably make the PASID thing the
first patch, and then the "dma-can-stall" can be on its own as a
broken DT workaround (if that's what it is) and it's easier to remove
that if it become obsolete.


Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210108145217.2254447-1-jean-phili...@linaro.org/

drivers/pci/quirks.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e..a27f327 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1825,6 +1825,31 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_E7525_MCH,  quir
  
  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI, 8, quirk_pcie_mch);
  
+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+	struct property_entry properties[] = {
+		PROPERTY_ENTRY_BOOL("dma-can-stall"),
+		{},
+	};
+
+	if ((pdev->revision != 0x21) && (pdev->revision != 0x30))
+		return;
+
+	pdev->eetlp_prefix_path = 1;
+
+	/* Device-tree can set the stall property */
+	if (!pdev->dev.of_node &&
+	    device_add_properties(&pdev->dev, properties))

Does this mean "dma-can-stall" *can* be set via DT, and if it is, this
quirk is not needed?  So is this quirk basically a workaround for an
old or broken DT?

The quirk is still needed in the UEFI case, since UEFI cannot describe the
endpoints (peripheral devices).



+		pci_warn(pdev, "could not add stall property");
+}
+

Remove this blank line to follow the style of the rest of the file.


+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
+
  /*
   * It's possible for the MSI to get corrupted if SHPC and ACPI are used
   * together on certain PXH-based systems.


How about a change like this?

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 68f53f7..886ea26 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2466,6 +2466,9 @@ static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
 	if (num_pasids <= 0)
 		return num_pasids;
 
+	if (master->stall_enabled)
+		pdev->pasid_no_tlp = 1;
+
 	ret = pci_enable_pasid(pdev, features);
 	if (ret) {
 		dev_err(&pdev->dev, "Failed to enable PASID\n");
@@ -2860,6 +2863,11 @@ static struct iommu

[PATCH] PCI: Add a quirk to enable SVA for HiSilicon chip

2021-01-11 Thread Zhangfei Gao
HiSilicon KunPeng920 and KunPeng930 have devices appear as PCI but are
actually on the AMBA bus. These fake PCI devices can not support tlp
and have to enable SMMU stall mode to use the SVA feature.

Add a quirk to set dma-can-stall property and enable tlp for these devices.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhou Wang 
---
Property dma-can-stall depends on patchset
https://lore.kernel.org/linux-iommu/20210108145217.2254447-1-jean-phili...@linaro.org/

drivers/pci/quirks.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e..a27f327 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1825,6 +1825,31 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH,  quir
 
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_BRIDGE_PCI, 8, quirk_pcie_mch);
 
+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+	struct property_entry properties[] = {
+		PROPERTY_ENTRY_BOOL("dma-can-stall"),
+		{},
+	};
+
+	if ((pdev->revision != 0x21) && (pdev->revision != 0x30))
+		return;
+
+	pdev->eetlp_prefix_path = 1;
+
+	/* Device-tree can set the stall property */
+	if (!pdev->dev.of_node &&
+	    device_add_properties(&pdev->dev, properties))
+		pci_warn(pdev, "could not add stall property");
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
+
 /*
  * It's possible for the MSI to get corrupted if SHPC and ACPI are used
  * together on certain PXH-based systems.
-- 
2.7.4



Re: [PATCH] uacce: Use kobj_to_dev() instead of container_of()

2020-08-19 Thread Zhangfei Gao




On 2020/8/20 10:16 AM, Tian Tao wrote:

Use kobj_to_dev() instead of container_of()

Signed-off-by: Tian Tao 
Reviewed-by: Zhou Wang 

Acked-by: Zhangfei Gao 

Thanks

---
  drivers/misc/uacce/uacce.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index a5b8dab..a9da7b1 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -370,7 +370,7 @@ static struct attribute *uacce_dev_attrs[] = {
  static umode_t uacce_dev_is_visible(struct kobject *kobj,
  				    struct attribute *attr, int n)
  {
-	struct device *dev = container_of(kobj, struct device, kobj);
+	struct device *dev = kobj_to_dev(kobj);
  	struct uacce_device *uacce = to_uacce_device(dev);
  
  	if (((attr == &dev_attr_region_mmio_size.attr) &&
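
For readers unfamiliar with the helper, kobj_to_dev() is a thin wrapper
around the same container_of() arithmetic that the patch removes from the
call site. A minimal userspace sketch, assuming hypothetical model_kobject
and model_device types in place of the kernel's struct kobject and struct
device:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kobject and struct device. */
struct model_kobject {
	const char *name;
};

struct model_device {
	int id;
	struct model_kobject kobj;	/* embedded, like dev->kobj */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same idea as kobj_to_dev(): hide the container_of() behind a named helper. */
struct model_device *model_kobj_to_dev(struct model_kobject *kobj)
{
	assert(kobj != NULL);
	return model_container_of(kobj, struct model_device, kobj);
}
```

The named helper states the intent and keeps the member name in one place
instead of repeating it at every call site.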




Re: [PATCH] uacce: fix some coding styles

2020-07-30 Thread Zhangfei Gao




On 2020/7/30 2:13 PM, Kai Ye wrote:

1. delete some redundant code.
2. modify the module author information.

Signed-off-by: Kai Ye 

Thanks Kai

Acked-by: Zhangfei Gao 

Thanks


Re: [PATCH] uacce: fix some coding styles

2020-07-21 Thread Zhangfei Gao




On 2020/7/20 3:18 PM, Kai Ye wrote:

1. add some parameter check.
2. delete some redundant code.
3. modify the module author information.

Signed-off-by: Kai Ye 
Reviewed-by: Zhou Wang 

Thanks Kai.

---
  drivers/misc/uacce/uacce.c | 28 +---
  1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 107028e..2e1af58 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -63,8 +63,12 @@ static long uacce_fops_unl_ioctl(struct file *filep,
 unsigned int cmd, unsigned long arg)
  {
  	struct uacce_queue *q = filep->private_data;
-	struct uacce_device *uacce = q->uacce;
+	struct uacce_device *uacce;
+
+	if (WARN_ON(!q))
+		return -EINVAL;

WARN_ON() should not be used in uacce; the error can be reported by the
user-space driver instead.
Errors should not be printed to the kernel log, since PASID can be used by
unprivileged users.


Also, I think we do not need to check filep->private_data.
The fd is already validated in __fget_files().

Thanks


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-23 Thread Zhangfei Gao

Hi, Joerg

On 2020/6/22 7:55 PM, Joerg Roedel wrote:

On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
     fwspec->iommu_fwnode = iommu_fwnode;
     fwspec->ops = ops;
     dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

That's not going to fly, I don't think we should run the fixups twice,
and they should not be run from IOMMU code. Is the only reason for this
second pass that iommu_fwspec is not yet allocated when it runs the
first time? I ask because it might be easier to just allocate the struct
earlier then.

Thanks for looking at this.

Yes, that is the only reason for running the fixup a second time, after
iommu_fwspec is allocated.


The first final fixup pass runs very early, in pci_bus_add_device().
If iommu_fwspec is allocated earlier, it could be done in pci_alloc_dev(),
with the ops still assigned in iommu_fwspec_init().
I have tested this and it works.
I am not sure whether it is acceptable.

Alternatively, adding can_stall to struct pci_dev is simple but also ugly,
since PCI currently knows nothing about stall.


Thanks





Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-18 Thread Zhangfei Gao

Hi, Bjorn

On 2020/6/16 7:52 AM, Bjorn Helgaas wrote:

On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote:

On 2020/6/11 9:44 PM, Bjorn Helgaas wrote:

+++ b/drivers/iommu/iommu.c

@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
 fwspec->iommu_fwnode = iommu_fwnode;
 fwspec->ops = ops;
 dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice: first in pci_bus_add_device(),
and a second time here in iommu_fwspec_init(), specifically for iommu_fwspec.
I will send this once 5.8-rc1 is out.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?


Here the fake PCI devices have a standard PCI config space, but the physical
implementation is based on AMBA.
They can provide the PASID feature.
However, they
1. do not support TLP, since they are not real PCI devices;
2. do not support PRI, and instead support stall (provided by the SMMU).
Stall is not a PCI feature, so it is not described in struct pci_dev
but in struct iommu_fwspec.
So we use this fixup to tell the PCI subsystem that the devices support stall,
and thereby support PASID.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Yes, you are right, but we do not have a better idea yet.
Currently we have three fake PCI devices that support stall and PASID.
We have to let the PCI subsystem know that these devices can support PASID
thanks to the stall feature, even though they do not support PRI.
Do you have any other ideas?

It sounds like the best way would be to allocate a PCI capability for it, so
detection can be done through config space, at least in future devices,
or possibly after a firmware update if the config space in your system
is controlled by firmware somewhere.  Once there is a proper mechanism
to do this, using fixups to detect the early devices that don't use that
should be uncontroversial. I have no idea what the process or timeline
is to add new capabilities into the PCIe specification, or if this one
would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).
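
To make the two locations concrete, here is a sketch of how both capability
lists could be searched in a 4096-byte snapshot of configuration space. It
is a userspace illustration only: the snapshot buffer and the function and
macro names are assumptions made for this example, while the register
offsets and capability IDs (0x09 and 0x000b) follow the PCI and PCIe
specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE		4096	/* full PCIe configuration space */
#define CFG_STATUS		0x06	/* Status register, low byte */
#define CFG_STATUS_CAP_LIST	0x10	/* "Capabilities List" status bit */
#define CFG_CAP_PTR		0x34	/* Capabilities Pointer */
#define CAP_ID_VENDOR		0x09	/* Vendor-Specific Capability */
#define EXT_CAP_START		0x100	/* start of Extended Configuration Space */
#define EXT_CAP_ID_VENDOR	0x000b	/* Vendor-Specific Extended Capability */

/* Read a little-endian 32-bit value from the config space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Offset of the first Vendor-Specific Capability below 0x100, or 0 if none. */
int find_vendor_cap(const uint8_t *cfg)
{
	int ttl = 48;	/* bound the walk in case the list loops */
	uint8_t pos;

	assert(cfg != NULL);
	if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
		return 0;

	pos = cfg[CFG_CAP_PTR] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == CAP_ID_VENDOR)
			return pos;
		pos = cfg[pos + 1] & 0xfc;	/* next pointer */
	}
	return 0;
}

/* Offset of the first Vendor-Specific Extended Capability, or 0 if none. */
int find_vendor_ext_cap(const uint8_t *cfg)
{
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;
	size_t pos = EXT_CAP_START;

	assert(cfg != NULL);
	while (ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			return 0;	/* empty or unreadable list */
		if ((hdr & 0xffff) == EXT_CAP_ID_VENDOR)
			return (int)pos;
		pos = (hdr >> 20) & 0xffc;	/* next pointer, dword aligned */
		if (pos < EXT_CAP_START)
			return 0;	/* end of list */
	}
	return 0;
}
```

Both functions return 0 when the list is absent, which corresponds to the
situation described in the reply that follows.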

Unfortunately, our silicon has neither a Vendor-Specific Capability nor a
Vendor-Specific Extended Capability.

I studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4.
It looks like this method requires adding a member (like can_stall) to
struct pci_dev, which looks difficult.

The problem is that we don't want to add device IDs every time a new
chip comes out.  Adding one or two device IDs for silicon that's
already released is not a problem as long as you have a strategy for
*future* devices so they don't require a quirk.


If detection cannot be done through PCI config space, the next best
alternative is to pass auxiliary data through firmware. On DT based
machines, you can list non-hotpluggable PCIe devices and add custom
properties that could be read during device enumeration. I assume
ACPI has something similar, but I have not done that.

Yes, thanks Arnd

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.

_DSM may not be workable, since it works at runtime.
We need the stall information during the init stage, neither too early (it
must come after iommu_fwspec is allocated) nor too late (it must come before
arm_smmu_add_device()).

I'm not aware of a restriction on when _DSM can be evaluated.  I'm
looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?

The _DSM method seems to require a vendor-specific GUID, and the code would
be vendor-specific.

_DSM indeed requires a vendor-specific UUID, precisely *because*
vendors are free to define their own functionality without requiring
changes to the ACPI spec.  From the spec (ACPI v6.3, sec 9.1.1):

   New UUIDs may also be created by OEMs and IHVs for custom devices
   and other interface or device governing bodies (e.g. the PCI SIG),
   as long as the UUID is different from other published UUIDs.

I have studied the _DSM method; we met two issues compared with using a quirk.

1. We need to change the definition of either pci_host_bridge or pci_dev, e.g.
by adding a member can_stall, while the PCI subsystem currently knows nothing
about stall.

a. PCI devices do not have a UUID: the UUID needs to be described in the DSDT,
while PCI devices are not

[PATCH] uacce: remove uacce_vma_fault

2020-06-15 Thread Zhangfei Gao
Fix a NULL pointer error that occurs when uacce's parent module is removed
while an application is running. SIGBUS is already reported by
do_page_fault(), so uacce_vma_fault() is not needed. If a fault handler is
provided, vmf->page has to be filled in as well, as required by __do_fault().

Reported-by: Jean-Philippe Brucker 
Signed-off-by: Zhangfei Gao 
---
 drivers/misc/uacce/uacce.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 107028e..aa91f69 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -179,14 +179,6 @@ static int uacce_fops_release(struct inode *inode, struct file *filep)
 	return 0;
 }
 
-static vm_fault_t uacce_vma_fault(struct vm_fault *vmf)
-{
-	if (vmf->flags & (FAULT_FLAG_MKWRITE | FAULT_FLAG_WRITE))
-		return VM_FAULT_SIGBUS;
-
-	return 0;
-}
-
 static void uacce_vma_close(struct vm_area_struct *vma)
 {
 	struct uacce_queue *q = vma->vm_private_data;
@@ -199,7 +191,6 @@ static void uacce_vma_close(struct vm_area_struct *vma)
 }
 
 static const struct vm_operations_struct uacce_vm_ops = {
-	.fault = uacce_vma_fault,
 	.close = uacce_vma_close,
 };
 
-- 
2.7.4



[PATCH v2] crypto: hisilicon - fix strncpy warning with strscpy

2020-06-14 Thread Zhangfei Gao
Use strscpy() to fix the following warning:
warning: 'strncpy' specified bound 64 equals destination size

Reported-by: kernel test robot 
Signed-off-by: Zhangfei Gao 
---
v2: Use strscpy() instead of strlcpy() for its better truncation handling,
as suggested by Herbert.
Rebase to 5.8-rc1.

 drivers/crypto/hisilicon/qm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 9bb263cec6c3..8ac293afa8ab 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -2179,8 +2179,12 @@ static int qm_alloc_uacce(struct hisi_qm *qm)
.flags = UACCE_DEV_SVA,
.ops = _qm_ops,
};
+   int ret;
 
-   strncpy(interface.name, pdev->driver->name, sizeof(interface.name));
+   ret = strscpy(interface.name, pdev->driver->name,
+ sizeof(interface.name));
+   if (ret < 0)
+   return -ENAMETOOLONG;
 
uacce = uacce_alloc(&pdev->dev, &interface);
if (IS_ERR(uacce))
-- 
2.23.0



Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-13 Thread Zhangfei Gao




On 2020/6/11 9:44 PM, Bjorn Helgaas wrote:

+++ b/drivers/iommu/iommu.c

@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
fwspec->iommu_fwnode = iommu_fwnode;
fwspec->ops = ops;
dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?


Here the fake PCI device has standard PCI cfg space, but the physical
implementation is based on AMBA.
They can provide the PASID feature.
However, they:
1. do not support TLP, since they are not real PCI devices.
2. do not support PRI; instead they support stall (provided by the SMMU).
And stall is not a PCI feature, so it is not described in struct pci_dev,
but in struct iommu_fwspec.
So we use this fixup to tell the PCI system that the devices can support stall,
and thereby support PASID.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Yes, you are right, but we do not have a better idea yet.
Currently we have three fake PCI devices, which support stall and PASID.
We have to let the PCI system know that these devices can support PASID because
of the stall feature, even though they do not support PRI.
Do you have any other ideas?

It sounds like the best way would be to allocate a PCI capability for it, so
detection can be done through config space, at least in future devices,
or possibly after a firmware update if the config space in your system
is controlled by firmware somewhere.  Once there is a proper mechanism
to do this, using fixups to detect the early devices that don't use that
should be uncontroversial. I have no idea what the process or timeline
is to add new capabilities into the PCIe specification, or if this one
would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).

Unfortunately, our silicon has neither a Vendor-Specific Capability nor a
Vendor-Specific Extended Capability.

I studied commit 8531e283bee66050734fb0e89d53e85fd5ce24a4.
It looks like this method requires adding a member (like can_stall) to struct
pci_dev, which looks difficult.
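For reference, detecting a Vendor-Specific Capability through config space amounts to walking the capability list. Below is a minimal userspace sketch over a 256-byte copy of config space, assuming the standard header layout (Status register at 0x06, Capabilities Pointer at 0x34, capability ID 0x09 for vendor-specific). find_capability is an illustrative helper written for this note, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* Status register (low byte) */
#define CFG_STATUS_CAP_LIST 0x10  /* Capabilities List bit */
#define CFG_CAP_PTR         0x34  /* Capabilities Pointer */
#define CAP_ID_VENDOR       0x09  /* Vendor-Specific capability ID */

/*
 * Search a 256-byte copy of config space for a capability ID.
 * Returns the capability offset, or 0 if the capability is absent.
 * Illustrative helper only; the names are assumptions of this sketch.
 */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48; /* bound the walk in case the list loops */

	/* No capability list at all: nothing to find. */
	if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
		return 0;

	pos = cfg[CFG_CAP_PTR] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		/* Byte 1 of each capability holds the next pointer. */
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}
```

A device without the capability (as described above for this silicon) simply yields 0, which is why a quirk or firmware property is being discussed instead.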





If detection cannot be done through PCI config space, the next best
alternative is to pass auxiliary data through firmware. On DT based
machines, you can list non-hotpluggable PCIe devices and add custom
properties that could be read during device enumeration. I assume
ACPI has something similar, but I have not done that.

Yes, thanks Arnd

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.

_DSM may not be workable, since it is evaluated at runtime.
We need the stall information at the init stage, neither too early (it must be
after the allocation of iommu_fwspec) nor too late (it must be before
arm_smmu_add_device).

I'm not aware of a restriction on when _DSM can be evaluated.  I'm
looking at ACPI v6.3, sec 9.1.1.  Are you seeing something different?

The _DSM method seems to require a vendor-specific GUID, so the code would be
vendor specific, unless the UUID is added to some spec, like pci_acpi_dsm_guid:
obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
IGNORE_PCI_BOOT_CONFIG_DSM, NULL);


By the way, it would take a long time if we need to modify either the PCIe
spec or the ACPI spec.  Can we use pci_fixup_device in iommu_fwspec_init
first? It is relatively simple and meets the requirement of platform
devices using PASID, and they are already in products.

Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires
a spec change.  Both can be completely vendor-defined.

Adding vendor-specific code to common files looks a bit ugly.

Thanks



Re: [RFC PATCH] PCI: Remove End-End TLP as PASID dependency

2020-06-13 Thread Zhangfei Gao




On 2020/6/12 1:41 AM, Bjorn Helgaas wrote:

[+cc Sinan]

On Wed, Jun 10, 2020 at 12:18:14PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI and have PCI cfg space,
but are actually on the AMBA bus.
They can support PASID via smmu stall feature, but does not
support tlp since they are not real pci devices.
So remove tlp as a PASID dependency.

When you iterate on this, pay attention to things like:

   - Wrap paragraphs to 75 columns or so, so they fill the whole line
 but don't overflow when "git show" adds 4 spaces.

   - Leave a blank line between paragraphs.

   - Capitalize consistently: "SMMU", "PCI", "TLP".

   - Provide references to relevant spec sections, e.g., for the SMMU
 stall feature.

OK, Thanks Bjorn



Signed-off-by: Zhangfei Gao 
---
  drivers/pci/ats.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 390e92f..8e31278 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -344,9 +344,6 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
  
-   if (!pdev->eetlp_prefix_path)
-   return -EINVAL;

No.  This would mean we might enable PASID on actual PCIe devices when
it is not safe to do so, as Jean-Philippe pointed out.

You cannot break actual PCIe devices just to make your non-PCIe device
work.

These devices do not support PASID as defined in the PCIe spec.  They
might support something *like* PASID, and you might be able to make
parts of the PCI core work with them, but you're going to have to deal
with the parts that don't follow the PCIe spec on your own.  That
might be quirks, it might be some sort of AMBA adaptation shim, I
don't know.  But it's not the responsibility of the PCI core to adapt
to them.

Understood now.
I will continue to use a quirk for this.

Thanks


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-10 Thread Zhangfei Gao




On 2020/6/10 12:49 AM, Bjorn Helgaas wrote:

On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:

On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  wrote:

On 2020/6/9 12:41 AM, Bjorn Helgaas wrote:

On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:

On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
   fwspec->iommu_fwnode = iommu_fwnode;
   fwspec->ops = ops;
   dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?


Here the fake PCI device has standard PCI cfg space, but the physical
implementation is based on AMBA.
They can provide the PASID feature.
However, they:
1. do not support TLP, since they are not real PCI devices.
2. do not support PRI; instead they support stall (provided by the SMMU).
And stall is not a PCI feature, so it is not described in struct pci_dev,
but in struct iommu_fwspec.
So we use this fixup to tell the PCI system that the devices can support stall,
and thereby support PASID.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Yes, you are right, but we do not have a better idea yet.
Currently we have three fake PCI devices, which support stall and PASID.
We have to let the PCI system know that these devices can support PASID because
of the stall feature, even though they do not support PRI.
Do you have any other ideas?

It sounds like the best way would be to allocate a PCI capability for it, so
detection can be done through config space, at least in future devices,
or possibly after a firmware update if the config space in your system
is controlled by firmware somewhere.  Once there is a proper mechanism
to do this, using fixups to detect the early devices that don't use that
should be uncontroversial. I have no idea what the process or timeline
is to add new capabilities into the PCIe specification, or if this one
would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

Will investigate this, thanks Bjorn



If detection cannot be done through PCI config space, the next best
alternative is to pass auxiliary data through firmware. On DT based
machines, you can list non-hotpluggable PCIe devices and add custom
properties that could be read during device enumeration. I assume
ACPI has something similar, but I have not done that.

Yes, thanks Arnd

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.

_DSM may not be workable, since it is evaluated at runtime.
We need the stall information at the init stage, neither too early (it must be
after the allocation of iommu_fwspec) nor too late (it must be before
arm_smmu_add_device).

By the way, it would take a long time if we need to modify either the PCIe
spec or the ACPI spec.
Can we use pci_fixup_device in iommu_fwspec_init first? It is relatively
simple and meets the requirement of platform devices using PASID, and they are
already in products.


Thanks



Re: [RFC PATCH] PCI: Remove End-End TLP as PASID dependency

2020-06-10 Thread Zhangfei Gao




On 2020/6/10 3:46 PM, Jean-Philippe Brucker wrote:

On Wed, Jun 10, 2020 at 12:18:14PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI and have PCI cfg space,
but are actually on the AMBA bus.
They can support PASID via smmu stall feature, but does not
support tlp since they are not real pci devices.
So remove tlp as a PASID dependency.

Signed-off-by: Zhangfei Gao 
---
  drivers/pci/ats.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 390e92f..8e31278 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -344,9 +344,6 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
  
-   if (!pdev->eetlp_prefix_path)
-   return -EINVAL;
-

This check is useful, and follows the PCI specification (4.0r1.0
2.2.10.2 End-End TLP Prefix Processing: "Software should ensure that TLPs
containing End-End TLP Prefixes are not sent to components that do not
support them.")

Thanks Jean,


Why not set the eetlp_prefix_path bit from a PCI quirk?  Unlike the stall
problem from the other thread, this one looks like a simple design mistake
that can be fixed easily in future iterations of the platform: just set
the "End-End TLP Prefix Supported" bit in the Device Capability 2 Register
of all bridges.

Yes, we can still set the eetlp_prefix_path bit from a PCI quirk.

We have also considered adding this bit to the Device Capability 2
Register in future silicon.
But we hesitated because it would not reflect the real function: according to
the register the device can support TLP, but in fact it does not.


Thanks
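The outcome of this exchange (keep the End-End TLP Prefix check, and let a quirk mark the special devices) can be modelled in a few lines. This is a userspace sketch with a reduced, made-up struct pasid_dev and hypothetical helper names; it only mirrors the order of checks visible in the quoted hunk and is not the kernel's pci_enable_pasid().

```c
#include <errno.h>

/* Reduced model of the fields discussed; not the kernel's struct pci_dev. */
struct pasid_dev {
	unsigned int eetlp_prefix_path:1; /* path supports End-End TLP Prefixes */
	unsigned int pasid_enabled:1;
	unsigned int has_pasid_cap:1;     /* device exposes a PASID capability */
};

/* Hypothetical quirk: mark a known-special device as safe for PASID. */
static void mock_quirk_set_eetlp(struct pasid_dev *dev)
{
	dev->eetlp_prefix_path = 1;
}

/* The generic check stays in place for every device. */
static int mock_enable_pasid(struct pasid_dev *dev)
{
	if (dev->pasid_enabled)
		return -EBUSY;
	if (!dev->eetlp_prefix_path)
		return -EINVAL;
	if (!dev->has_pasid_cap)
		return -EINVAL;

	dev->pasid_enabled = 1;
	return 0;
}
```

With this split, ordinary PCIe devices keep the protection the check provides, and only devices matched by the quirk bypass it.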



[RFC PATCH] PCI: Remove End-End TLP as PASID dependency

2020-06-09 Thread Zhangfei Gao
Some platform devices appear as PCI and have PCI cfg space,
but are actually on the AMBA bus.
They can support PASID via smmu stall feature, but does not
support tlp since they are not real pci devices.
So remove tlp as a PASID dependency.

Signed-off-by: Zhangfei Gao 
---
 drivers/pci/ats.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 390e92f..8e31278 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -344,9 +344,6 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
 
-   if (!pdev->eetlp_prefix_path)
-   return -EINVAL;
-
if (!pasid)
return -EINVAL;
 
-- 
2.7.4



Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-08 Thread Zhangfei Gao

Hi, Bjorn

On 2020/6/9 12:41 AM, Bjorn Helgaas wrote:

On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:

On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:

On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:

On 2020/6/2 1:41 AM, Bjorn Helgaas wrote:

On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:

On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

I don't know how significant it is, but I remember people complaining
about adding new PCI quirks because it takes too long for them to run
them all. That was in the discussion about the quirk disabling ATS on
AMD Stoney systems.

So it probably depends on how many PCI devices are in the system whether
it causes any measurable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.


Thanks Bjorn for taking time for this.
If so, it would be much simpler.

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
      fwspec->iommu_fwnode = iommu_fwnode;
      fwspec->ops = ops;
      dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?


Here the fake PCI device has standard PCI cfg space, but the physical
implementation is based on AMBA.
They can provide the PASID feature.
However, they:
1. do not support TLP, since they are not real PCI devices.
2. do not support PRI; instead they support stall (provided by the SMMU).
And stall is not a PCI feature, so it is not described in struct pci_dev,
but in struct iommu_fwspec.
So we use this fixup to tell the PCI system that the devices can support stall,
and thereby support PASID.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Yes, you are right, but we do not have a better idea yet.
Currently we have three fake PCI devices, which support stall and PASID.
We have to let the PCI system know that these devices can support PASID because
of the stall feature, even though they do not support PRI.

Do you have any other ideas?

Thanks


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-07 Thread Zhangfei Gao

Hi, Bjorn

On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:

On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:

On 2020/6/2 1:41 AM, Bjorn Helgaas wrote:

On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:

On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

I don't know how significant it is, but I remember people complaining
about adding new PCI quirks because it takes too long for them to run
them all. That was in the discussion about the quirk disabling ATS on
AMD Stoney systems.

So it probably depends on how many PCI devices are in the system whether
it causes any measurable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.


Thanks Bjorn for taking time for this.
If so, it would be much simpler.

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
fwnode_handle *iommu_fwnode,
     fwspec->iommu_fwnode = iommu_fwnode;
     fwspec->ops = ops;
     dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?


Here the fake PCI device has standard PCI cfg space, but the physical
implementation is based on AMBA.

They can provide the PASID feature.
However, they:
1. do not support TLP, since they are not real PCI devices.
2. do not support PRI; instead they support stall (provided by the SMMU).
And stall is not a PCI feature, so it is not described in struct
pci_dev, but in struct iommu_fwspec.
So we use this fixup to tell the PCI system that the devices can support
stall, and thereby support PASID.


Thanks


Re: [PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-05 Thread Zhangfei Gao




On 2020/6/5 11:49 PM, Eric Biggers wrote:

On Fri, Jun 05, 2020 at 11:26:20PM +0800, Zhangfei Gao wrote:


On 2020/6/5 8:17 PM, Herbert Xu wrote:

On Fri, Jun 05, 2020 at 05:34:32PM +0800, Zhangfei Gao wrote:

Will add a check after the copy.

      strlcpy(interface.name, pdev->driver->name, sizeof(interface.name));
      if (strlen(pdev->driver->name) != strlen(interface.name))
      return -EINVAL;

You don't need to do strlen.  The function strlcpy returns the
length of the source string.

Better yet use strscpy which will even return an error for you.



Yes, good idea, we can use strscpy.

+   int ret;

-   strncpy(interface.name, pdev->driver->name, sizeof(interface.name));
+   ret = strscpy(interface.name, pdev->driver->name,
sizeof(interface.name));
+   if (ret < 0)
+   return ret;

You might want to use -ENAMETOOLONG instead of the strscpy return value of
-E2BIG.

Yes, makes sense, thanks Eric.
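The error mapping agreed on here can be tried outside the kernel. The following is a userspace sketch that assumes the strscpy contract as described in this thread (return the copied length, or -E2BIG when the source does not fit). sketch_strscpy and set_iface_name are made-up names for illustration, not the kernel implementation.

```c
#include <errno.h>
#include <string.h>

/*
 * Userspace sketch of a strscpy-like copy: returns the number of bytes
 * copied (without the NUL), or -E2BIG when the source does not fit.
 * The destination is always NUL-terminated when size > 0.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	len = strlen(src);
	if (len >= size) {
		/* Does not fit: copy what we can and still terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len + 1);
	return (long)len;
}

/* Map the copy failure to the errno suggested in the review. */
static int set_iface_name(char *dst, size_t size, const char *src)
{
	long ret = sketch_strscpy(dst, src, size);

	return ret < 0 ? -ENAMETOOLONG : 0;
}
```

Returning -ENAMETOOLONG tells the caller what actually went wrong, whereas a bare -E2BIG is easier to misread.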



Re: [PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-05 Thread Zhangfei Gao




On 2020/6/5 8:17 PM, Herbert Xu wrote:

On Fri, Jun 05, 2020 at 05:34:32PM +0800, Zhangfei Gao wrote:

Will add a check after the copy.

     strlcpy(interface.name, pdev->driver->name, sizeof(interface.name));
     if (strlen(pdev->driver->name) != strlen(interface.name))
     return -EINVAL;

You don't need to do strlen.  The function strlcpy returns the
length of the source string.

Better yet use strscpy which will even return an error for you.



Yes, good idea, we can use strscpy.

+   int ret;

-   strncpy(interface.name, pdev->driver->name, sizeof(interface.name));
+   ret = strscpy(interface.name, pdev->driver->name, 
sizeof(interface.name));

+   if (ret < 0)
+   return ret;

Will resend later, thanks Herbert.
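Herbert's first remark (the strlcpy return value already reveals truncation, so the two strlen calls are unnecessary) can be illustrated with a userspace sketch. Since strlcpy is not available in every C library, the sketch carries its own copy under the made-up name sketch_strlcpy; name_was_truncated is likewise a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of strlcpy semantics: copy at most size-1 bytes,
 * always terminate when size > 0, and return strlen(src).
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* The source was truncated exactly when it did not fit, NUL included. */
static int name_was_truncated(size_t ret, size_t size)
{
	return ret >= size;
}
```

Comparing the return value against the buffer size is a single check, which is the simplification suggested before the switch to strscpy.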




Re: [PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-05 Thread Zhangfei Gao




On 2020/6/4 2:50 PM, Herbert Xu wrote:

On Thu, Jun 04, 2020 at 02:44:16PM +0800, Zhangfei Gao wrote:

I think it is fine.
1. Currently the name size is 64, which is big enough.
A simple grep of driver names suggests 64 should be enough.
We can make it larger when there is a request.
2. It does not matter what the name is, since it is just an interface.
cat /sys/class/uacce/hisi_zip-0/flags
cat /sys/class/uacce/his-0/flags
should both be fine for an app, as long as they can be distinguished.
3. It may be too hard a restriction to fail just because of a long name.

I think we should err on the side of caution.  IOW, unless you
know that you need it to succeed when it exceeds the limit, then
you should just make it fail.

Thanks Herbert
Will add a check after the copy.

    strlcpy(interface.name, pdev->driver->name, 
sizeof(interface.name));

    if (strlen(pdev->driver->name) != strlen(interface.name))
    return -EINVAL;

Will resend the fix after rc1 is open.

Thanks



Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-04 Thread Zhangfei Gao




On 2020/6/2 1:41 AM, Bjorn Helgaas wrote:

On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:

On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

I don't know how significant it is, but I remember people complaining
about adding new PCI quirks because it takes too long for them to run
them all. That was in the discussion about the quirk disabling ATS on
AMD Stoney systems.

So it probably depends on how many PCI devices are in the system whether
it causes any measurable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.


Thanks Bjorn for taking time for this.
If so, it would be much simpler.

+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct 
fwnode_handle *iommu_fwnode,

    fwspec->iommu_fwnode = iommu_fwnode;
    fwspec->ops = ops;
    dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
+

Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
Will send this when 5.8-rc1 is open.

Thanks
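To make the cost argument concrete: a fixup pass compares every table entry against the device and runs only the hooks whose IDs match. The sketch below models that in userspace with simplified, made-up structures (mock_pci_dev, mock_fixup); the vendor ID value 0x19e5 is an assumption of this sketch, while the device IDs 0xa250 and 0xa251 come from the quirk posted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; names are illustrative. */
struct mock_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int can_stall; /* models fwspec->can_stall from the thread */
};

struct mock_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct mock_pci_dev *dev);
};

static void mock_quirk_sva(struct mock_pci_dev *dev)
{
	dev->can_stall = 1;
}

#define MOCK_VENDOR_HUAWEI 0x19e5 /* assumed value for this sketch */

static const struct mock_fixup mock_final_fixups[] = {
	{ MOCK_VENDOR_HUAWEI, 0xa250, mock_quirk_sva },
	{ MOCK_VENDOR_HUAWEI, 0xa251, mock_quirk_sva },
};

/* Walk the whole table; run only matching hooks. Returns hooks run. */
static int mock_do_fixups(struct mock_pci_dev *dev)
{
	size_t n = sizeof(mock_final_fixups) / sizeof(mock_final_fixups[0]);
	int ran = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct mock_fixup *f = &mock_final_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device) {
			f->hook(dev);
			ran++;
		}
	}
	return ran;
}
```

Because this kind of quirk only sets a flag, running the pass a second time from iommu_fwspec_init would be harmless for it; quirks with side effects would need more care.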


Re: [PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-04 Thread Zhangfei Gao




On 2020/6/4 2:18 PM, Herbert Xu wrote:

On Thu, Jun 04, 2020 at 02:10:37PM +0800, Zhangfei Gao wrote:

Should this even allow truncation? Perhaps it'd be better to fail
in case of an overrun?

I think we do not need to consider overrun, since it copies at most size-1
bytes to dest.
From the manual: strlcpy()
    This function is similar to strncpy(), but it copies at most size-1
    bytes to dest, always adds a terminating null byte,
And a simple test with a smaller SIZE of interface.name shows only SIZE-1 bytes
are copied, so it is safe.
-#define UACCE_MAX_NAME_SIZE    64
+#define UACCE_MAX_NAME_SIZE    4

That's not what I meant.  As it is if you do exceed the limit the
name is silently truncated.  Wouldn't it be better to fail the
allocation instead?

I think it is fine.
1. Currently the name size is 64, which is big enough.
A simple grep of driver names suggests 64 should be enough.
We can make it larger when there is a request.
2. It does not matter what the name is, since it is just an interface.
cat /sys/class/uacce/hisi_zip-0/flags
cat /sys/class/uacce/his-0/flags
should both be fine for an app, as long as they can be distinguished.
3. It may be too hard a restriction to fail just because of a long name.

What do you think?

Thanks
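The silent truncation under discussion is easy to reproduce in userspace. This is a minimal sketch, assuming a bounded copy that always terminates (the behaviour attributed to strlcpy in this thread); copy_name_truncating is a hypothetical helper built on snprintf, not kernel code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bounded, always-terminated copy.
 * Returns 1 if the source had to be truncated, 0 otherwise.
 */
static int copy_name_truncating(char *dst, size_t size, const char *src)
{
	/* snprintf writes at most size-1 characters plus the terminator */
	int needed = snprintf(dst, size, "%s", src);

	return needed < 0 || (size_t)needed >= size;
}
```

With a 4-byte buffer, "hisi_zip" ends up as "his", matching the /sys/class/uacce/his-0 example above; the copy itself succeeds, and only the return value shows that something was lost.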


Re: [PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-04 Thread Zhangfei Gao




On 2020/6/4 11:39 AM, Herbert Xu wrote:

On Thu, Jun 04, 2020 at 11:32:04AM +0800, Zhangfei Gao wrote:

Use strlcpy to fix the warning
warning: 'strncpy' specified bound 64 equals destination size
  [-Wstringop-truncation]

Reported-by: kernel test robot 
Signed-off-by: Zhangfei Gao 
---
  drivers/crypto/hisilicon/qm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index f795fb5..224f3e2 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -1574,7 +1574,7 @@ static int qm_alloc_uacce(struct hisi_qm *qm)
.ops = &uacce_qm_ops,
};
  
-	strncpy(interface.name, pdev->driver->name, sizeof(interface.name));

+   strlcpy(interface.name, pdev->driver->name, sizeof(interface.name));

Should this even allow truncation? Perhaps it'd be better to fail
in case of an overrun?

I think we do not need to consider overrun, since it copies at most size-1
bytes to dest.

From the manual: strlcpy()
   This function is similar to strncpy(), but it copies at most size-1
   bytes to dest, always adds a terminating null byte,
And a simple test with a smaller SIZE of interface.name shows only SIZE-1 bytes
are copied, so it is safe.

-#define UACCE_MAX_NAME_SIZE    64
+#define UACCE_MAX_NAME_SIZE    4

Thanks


[PATCH] crypto: hisilicon - fix strncpy warning with strlcpy

2020-06-03 Thread Zhangfei Gao
Use strlcpy to fix the warning
warning: 'strncpy' specified bound 64 equals destination size
 [-Wstringop-truncation]

Reported-by: kernel test robot 
Signed-off-by: Zhangfei Gao 
---
 drivers/crypto/hisilicon/qm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index f795fb5..224f3e2 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -1574,7 +1574,7 @@ static int qm_alloc_uacce(struct hisi_qm *qm)
.ops = &uacce_qm_ops,
};
 
-   strncpy(interface.name, pdev->driver->name, sizeof(interface.name));
+   strlcpy(interface.name, pdev->driver->name, sizeof(interface.name));
 
uacce = uacce_alloc(&pdev->dev, &interface);
if (IS_ERR(uacce))
-- 
2.7.4



Re: [PATCH 2/2] iommu: calling pci_fixup_iommu in iommu_fwspec_init

2020-05-28 Thread Zhangfei Gao




On 2020/5/27 5:01 PM, Greg Kroah-Hartman wrote:

On Tue, May 26, 2020 at 07:49:09PM +0800, Zhangfei Gao wrote:

Calling pci_fixup_iommu in iommu_fwspec_init, which alloc
iommu_fwnode. Some platform devices appear as PCI but are
actually on the AMBA bus, and they need fixup in
drivers/pci/quirks.c handling iommu_fwnode.
So calling pci_fixup_iommu after iommu_fwnode is allocated.

Signed-off-by: Zhangfei Gao 
---
  drivers/iommu/iommu.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7b37542..fb84c42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct 
fwnode_handle *iommu_fwnode,
fwspec->iommu_fwnode = iommu_fwnode;
fwspec->ops = ops;
dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_iommu, to_pci_dev(dev));

Why can't the caller do this as it "knows" it is a PCI device at that
point in time, right?

The fixup is put here because
1. iommu_fwspec has been allocated
2. iommu_fwspec_init will be called by of_pci_iommu_init and
iort_pci_iommu_init, covering both ACPI and DT


Thanks


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-28 Thread Zhangfei Gao

Hi, Bjorn

On 2020/5/28 2:18 AM, Bjorn Helgaas wrote:

On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI but are actually on the AMBA bus,
and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
down iommu probing as all devices in fixup final list will be
reprocessed, suggested by Joerg, [1]

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

I did not notice a difference when comparing fixup_iommu and fixup_final
via get_jiffies_64, since on our platform no other PCI fixup is registered.

Here the plan is to add pci_fixup_device in iommu_fwspec_init,
so if fixup_final is used, the iteration will be done again here.




For example:
Hisilicon platform device need fixup in
drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+struct iommu_fwspec *fwspec;
+
+pdev->eetlp_prefix_path = 1;
+fwspec = dev_iommu_fwspec_get(&pdev->dev);
+if (fwspec)
+fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);

[1] https://www.spinics.net/lists/iommu/msg44591.html
[2] https://www.spinics.net/lists/linux-pci/msg94559.html

If you reference these in the commit logs, please use lore.kernel.org
links instead of spinics.

Got it, thanks Bjorn.





Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-27 Thread Zhangfei Gao




On 2020/5/27 5:53 PM, Arnd Bergmann wrote:

On Wed, May 27, 2020 at 11:00 AM Greg Kroah-Hartman
 wrote:

On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI but are actually on the AMBA bus,

Why would these devices not just show up on the AMBA bus and use all of
that logic instead of being a PCI device and having to go through odd
fixes like this?

There is a general move to having hardware be discoverable even with
ARM processors. Having on-chip devices be discoverable using PCI config
space is how x86 SoCs usually do it, and that is generally a good thing
as it means we don't need to describe them in DT

I guess as the hardware designers are still learning about it, this is not
always done correctly. In general, we can also describe PCI devices on
DT and do fixups during the probing there, but I suspect that won't work
as easily using ACPI probing, so the fixup is keyed off the hardware ID,
again as is common for x86 on-chip devices.

  

Yes, thanks Arnd :)

In order to use PASID, I/O page faults have to be supported,
either by the PCI PRI feature (for a PCI device) or by stall mode from the SMMU
(for a platform device).
Here we are letting the system know that the platform device can support SMMU
stall mode and, as a result, PASID.

Stall is not a PCI capability, so we use a fixup here.

Thanks



Re: [PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao

Hi, Christoph

On 2020/5/26 10:46 PM, Christoph Hellwig wrote:

On Tue, May 26, 2020 at 07:49:08PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI but are actually on the AMBA bus,
and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
down iommu probing as all devices in fixup final list will be
reprocessed.

Who is going to use this?  I don't see a single user in the series.

We will add an IOMMU fixup in drivers/pci/quirks.c that handles
fwspec->can_stall, which is introduced in

https://www.spinics.net/lists/linux-pci/msg94559.html

Unfortunately, that patch did not make v5.8, so we have to wait.
We also want to check whether this is the right way to solve the issue.

Thanks



Re: [PATCH 0/2] Let pci_fixup_final access iommu_fwnode

2020-05-26 Thread Zhangfei Gao




On 2020/5/25 下午9:43, Joerg Roedel wrote:

On Tue, May 12, 2020 at 12:08:29PM +0800, Zhangfei Gao wrote:

Some platform devices appear as PCI but are
actually on the AMBA bus, and they need fixup in
drivers/pci/quirks.c handling iommu_fwnode.
So calling pci_fixup_final after iommu_fwnode is allocated.

For example:
HiSilicon platform devices need a fixup in
drivers/pci/quirks.c

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+   struct iommu_fwspec *fwspec;
+
+   pdev->eetlp_prefix_path = 1;
+   fwspec = dev_iommu_fwspec_get(&pdev->dev);
+   if (fwspec)
+   fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);

I don't think it is a great idea to hook this into PCI_FIXUP_FINAL. The
fixup list needs to be processed for every device, which will slow down
probing.

So either we introduce something like PCI_FIXUP_IOMMU, if this is
entirely PCI specific. If it needs to be generic we need some fixup
infrastructure in the IOMMU code itself.


Thanks Joerg for the good suggestion.
I am trying to introduce PCI_FIXUP_IOMMU in
https://lkml.org/lkml/2020/5/26/366
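
To make the cost argument concrete, here is a small user-space model of
per-pass fixup tables. It is only a sketch: struct demo_dev, struct
demo_fixup and demo_fixup_device() are hypothetical stand-ins, the three
"final" entries are invented filler, and in the kernel the tables are linker
sections rather than arrays. The point is that a dedicated pass walks only
its own short table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev and struct pci_fixup. */
struct demo_dev {
	unsigned short vendor;
	unsigned short device;
	int can_stall;
};

struct demo_fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(struct demo_dev *dev);
};

enum demo_pass { DEMO_FIXUP_FINAL, DEMO_FIXUP_IOMMU, DEMO_FIXUP_NR };

static void demo_noop(struct demo_dev *dev) { (void)dev; }
static void demo_set_stall(struct demo_dev *dev) { dev->can_stall = 1; }

/* One table per pass. The "final" entries are invented filler. */
static const struct demo_fixup demo_final[] = {
	{ 0x1000, 0x0001, demo_noop },
	{ 0x1000, 0x0002, demo_noop },
	{ 0x1000, 0x0003, demo_noop },
};

/* 0x19e5 is the Huawei PCI vendor ID; device IDs are taken from the thread. */
static const struct demo_fixup demo_iommu[] = {
	{ 0x19e5, 0xa250, demo_set_stall },
	{ 0x19e5, 0xa251, demo_set_stall },
};

/* Walk only the table of the requested pass; return the entries examined. */
size_t demo_fixup_device(enum demo_pass pass, struct demo_dev *dev)
{
	const struct demo_fixup *f, *end;
	size_t visited = 0;

	assert(dev != NULL);

	switch (pass) {
	case DEMO_FIXUP_FINAL:
		f = demo_final;
		end = demo_final + sizeof(demo_final) / sizeof(demo_final[0]);
		break;
	case DEMO_FIXUP_IOMMU:
		f = demo_iommu;
		end = demo_iommu + sizeof(demo_iommu) / sizeof(demo_iommu[0]);
		break;
	default:
		return 0;
	}

	for (; f < end; f++, visited++)
		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);

	return visited;
}
```

Running the IOMMU pass for a matching device examines two entries and sets
can_stall; the final pass examines its own three entries and leaves
can_stall alone.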

Thanks


[PATCH 2/2] iommu: calling pci_fixup_iommu in iommu_fwspec_init

2020-05-26 Thread Zhangfei Gao
Call pci_fixup_iommu in iommu_fwspec_init, which allocates
iommu_fwnode. Some platform devices appear as PCI but are
actually on the AMBA bus, and they need a fixup in
drivers/pci/quirks.c that handles iommu_fwnode.
So call pci_fixup_iommu after iommu_fwnode is allocated.

Signed-off-by: Zhangfei Gao 
---
 drivers/iommu/iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7b37542..fb84c42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
fwspec->iommu_fwnode = iommu_fwnode;
fwspec->ops = ops;
dev_iommu_fwspec_set(dev, fwspec);
+
+   if (dev_is_pci(dev))
+   pci_fixup_device(pci_fixup_iommu, to_pci_dev(dev));
+
return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_init);
-- 
2.7.4



[PATCH 1/2] PCI: Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao
Some platform devices appear as PCI but are actually on the AMBA bus,
and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
This patch introduces PCI_FIXUP_IOMMU, which is called after iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL, since reusing it would
slow down IOMMU probing: all entries in the final fixup list would be
processed again.

Suggested-by: Joerg Roedel 
Signed-off-by: Zhangfei Gao 
---
 drivers/pci/quirks.c  | 7 +++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h   | 8 
 3 files changed, 18 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ca9ed57..b037034 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -83,6 +83,8 @@ extern struct pci_fixup __start_pci_fixups_header[];
 extern struct pci_fixup __end_pci_fixups_header[];
 extern struct pci_fixup __start_pci_fixups_final[];
 extern struct pci_fixup __end_pci_fixups_final[];
+extern struct pci_fixup __start_pci_fixups_iommu[];
+extern struct pci_fixup __end_pci_fixups_iommu[];
 extern struct pci_fixup __start_pci_fixups_enable[];
 extern struct pci_fixup __end_pci_fixups_enable[];
 extern struct pci_fixup __start_pci_fixups_resume[];
@@ -118,6 +120,11 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
end = __end_pci_fixups_final;
break;
 
+   case pci_fixup_iommu:
+   start = __start_pci_fixups_iommu;
+   end = __end_pci_fixups_iommu;
+   break;
+
case pci_fixup_enable:
start = __start_pci_fixups_enable;
end = __end_pci_fixups_enable;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 71e387a..3da32d8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -411,6 +411,9 @@
__start_pci_fixups_final = .;   \
KEEP(*(.pci_fixup_final))   \
__end_pci_fixups_final = .; \
+   __start_pci_fixups_iommu = .;   \
+   KEEP(*(.pci_fixup_iommu))   \
+   __end_pci_fixups_iommu = .; \
__start_pci_fixups_enable = .;  \
KEEP(*(.pci_fixup_enable))  \
__end_pci_fixups_enable = .;\
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 83ce1cd..0d5fbf8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1892,6 +1892,7 @@ enum pci_fixup_pass {
pci_fixup_early,/* Before probing BARs */
pci_fixup_header,   /* After reading configuration header */
pci_fixup_final,/* Final phase of device fixups */
+   pci_fixup_iommu,/* After iommu_fwspec_init */
pci_fixup_enable,   /* pci_enable_device() time */
pci_fixup_resume,   /* pci_device_resume() */
pci_fixup_suspend,  /* pci_device_suspend() */
@@ -1934,6 +1935,10 @@ enum pci_fixup_pass {
 class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, class, class_shift, hook)
+#define DECLARE_PCI_FIXUP_CLASS_IOMMU(vendor, device, class,   \
+class_shift, hook) \
+   DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \
+   hook, vendor, device, class, class_shift, hook)
 #define DECLARE_PCI_FIXUP_CLASS_ENABLE(vendor, device, class,  \
 class_shift, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\
@@ -1964,6 +1969,9 @@ enum pci_fixup_pass {
 #define DECLARE_PCI_FIXUP_FINAL(vendor, device, hook)  \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_final, \
hook, vendor, device, PCI_ANY_ID, 0, hook)
+#define DECLARE_PCI_FIXUP_IOMMU(vendor, device, hook)  \
+   DECLARE_PCI_FIXUP_SECTION(.pci_fixup_iommu, \
+   hook, vendor, device, PCI_ANY_ID, 0, hook)
 #define DECLARE_PCI_FIXUP_ENABLE(vendor, device, hook) \
DECLARE_PCI_FIXUP_SECTION(.pci_fixup_enable,\
hook, vendor, device, PCI_ANY_ID, 0, hook)
-- 
2.7.4



[PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-26 Thread Zhangfei Gao
Some platform devices appear as PCI but are actually on the AMBA bus,
and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
This series introduces PCI_FIXUP_IOMMU, which is called after iommu_fwnode
is allocated, instead of reusing PCI_FIXUP_FINAL, since reusing it would
slow down IOMMU probing: all entries in the final fixup list would be
processed again. Suggested by Joerg [1].

For example:
HiSilicon platform devices need a fixup in
drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+struct iommu_fwspec *fwspec;
+
+pdev->eetlp_prefix_path = 1;
+fwspec = dev_iommu_fwspec_get(&pdev->dev);
+if (fwspec)
+fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);

[1] https://www.spinics.net/lists/iommu/msg44591.html
[2] https://www.spinics.net/lists/linux-pci/msg94559.html

Zhangfei Gao (2):
  PCI: Introduce PCI_FIXUP_IOMMU
  iommu: calling pci_fixup_iommu in iommu_fwspec_init

 drivers/iommu/iommu.c | 4 
 drivers/pci/quirks.c  | 7 +++
 include/asm-generic/vmlinux.lds.h | 3 +++
 include/linux/pci.h   | 8 
 4 files changed, 22 insertions(+)

-- 
2.7.4



Re: [PATCH 0/2] Let pci_fixup_final access iommu_fwnode

2020-05-21 Thread Zhangfei Gao

Hi, Joerg

On 2020/5/12 下午12:08, Zhangfei Gao wrote:

Some platform devices appear as PCI but are
actually on the AMBA bus, and they need fixup in
drivers/pci/quirks.c handling iommu_fwnode.
So calling pci_fixup_final after iommu_fwnode is allocated.

For example:
HiSilicon platform devices need a fixup in
drivers/pci/quirks.c

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+   struct iommu_fwspec *fwspec;
+
+   pdev->eetlp_prefix_path = 1;
+   fwspec = dev_iommu_fwspec_get(&pdev->dev);
+   if (fwspec)
+   fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
  


Zhangfei Gao (2):
   iommu/of: Let pci_fixup_final access iommu_fwnode
   ACPI/IORT: Let pci_fixup_final access iommu_fwnode

  drivers/acpi/arm64/iort.c | 1 +
  drivers/iommu/of_iommu.c  | 1 +
  2 files changed, 2 insertions(+)


Would you mind giving any suggestions?

We need to access fwspec->can_stall, which describes whether the platform
device (a fake PCIe device) supports the stall feature.

can_stall will be used in arm_smmu_add_device [1].
Stall is not a PCI feature, so there is no such member in struct pci_dev.

iommu_fwnode is allocated in iommu_fwspec_init, called from
of_pci_iommu_init or iort_pci_iommu_init.
The pci_fixup_device(pci_fixup_final, dev) call in pci_bus_add_device is
too early, because iommu_fwnode is not yet allocated.
The pci_fixup_device(pci_fixup_enable, dev) call in do_pci_enable_device is
too late, because it runs after arm_smmu_add_device.

So the idea here is to call pci_fixup_device(pci_fixup_final) after
of_pci_iommu_init and iort_pci_iommu_init, where iommu_fwnode is allocated.
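
The ordering problem can be modelled in a few lines of user-space C. This is
only a sketch under simplifying assumptions: the demo_* types and functions
are hypothetical and merely mimic the three points in time discussed above
(before the fwspec exists, right after it is allocated, and after the SMMU
has already sampled can_stall).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the objects involved; not the kernel structures. */
struct demo_fwspec { int can_stall; };

struct demo_pdev {
	struct demo_fwspec *fwspec;  /* NULL until the IOMMU code allocates it */
	struct demo_fwspec storage;
	int stall_seen_by_smmu;
};

/* The quirk can only act once the fwspec exists. */
void demo_quirk(struct demo_pdev *pdev)
{
	assert(pdev != NULL);
	if (pdev->fwspec)
		pdev->fwspec->can_stall = 1;
}

/* Stand-in for the step that allocates the fwspec. */
void demo_fwspec_init(struct demo_pdev *pdev)
{
	pdev->storage.can_stall = 0;
	pdev->fwspec = &pdev->storage;
}

/* Stand-in for the SMMU add_device step, which samples can_stall once. */
void demo_smmu_add_device(struct demo_pdev *pdev)
{
	pdev->stall_seen_by_smmu = pdev->fwspec && pdev->fwspec->can_stall;
}

enum demo_when { DEMO_BEFORE_FWSPEC, DEMO_AFTER_FWSPEC, DEMO_AFTER_ADD };

/* Run the quirk at one of three points and report what the SMMU observed. */
int demo_probe(enum demo_when when)
{
	struct demo_pdev pdev = { NULL, { 0 }, 0 };

	if (when == DEMO_BEFORE_FWSPEC)
		demo_quirk(&pdev);      /* too early: no fwspec yet */
	demo_fwspec_init(&pdev);
	if (when == DEMO_AFTER_FWSPEC)
		demo_quirk(&pdev);      /* the slot proposed in this thread */
	demo_smmu_add_device(&pdev);
	if (when == DEMO_AFTER_ADD)
		demo_quirk(&pdev);      /* too late: SMMU already looked */

	return pdev.stall_seen_by_smmu;
}
```

Only DEMO_AFTER_FWSPEC makes the SMMU step observe can_stall, which is the
window the fixup has to hit.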



[1] https://www.spinics.net/lists/linux-pci/msg94559.html

Thanks



[PATCH 0/2] Let pci_fixup_final access iommu_fwnode

2020-05-11 Thread Zhangfei Gao
Some platform devices appear as PCI but are
actually on the AMBA bus, and they need fixup in
drivers/pci/quirks.c handling iommu_fwnode.
So calling pci_fixup_final after iommu_fwnode is allocated.

For example: 
HiSilicon platform devices need a fixup in
drivers/pci/quirks.c

+static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
+{
+   struct iommu_fwspec *fwspec;
+
+   pdev->eetlp_prefix_path = 1;
+   fwspec = dev_iommu_fwspec_get(&pdev->dev);
+   if (fwspec)
+   fwspec->can_stall = 1;
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
 

Zhangfei Gao (2):
  iommu/of: Let pci_fixup_final access iommu_fwnode
  ACPI/IORT: Let pci_fixup_final access iommu_fwnode

 drivers/acpi/arm64/iort.c | 1 +
 drivers/iommu/of_iommu.c  | 1 +
 2 files changed, 2 insertions(+)

-- 
2.7.4



[PATCH 1/2] iommu/of: Let pci_fixup_final access iommu_fwnode

2020-05-11 Thread Zhangfei Gao
Call pci_fixup_final after of_pci_iommu_init, which allocates
iommu_fwnode. Some platform devices appear as PCI but are
actually on the AMBA bus, and they need a fixup in
drivers/pci/quirks.c that handles iommu_fwnode.
So call pci_fixup_final after iommu_fwnode is allocated.

Signed-off-by: Zhangfei Gao 
---
 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 20738aac..c1b58c4 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -188,6 +188,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
pci_request_acs();
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, &info);
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
} else if (dev_is_fsl_mc(dev)) {
err = of_fsl_mc_iommu_init(to_fsl_mc_device(dev), master_np);
} else {
-- 
2.7.4



[PATCH 2/2] ACPI/IORT: Let pci_fixup_final access iommu_fwnode

2020-05-11 Thread Zhangfei Gao
Call pci_fixup_final after iommu_fwspec_init, which allocates
iommu_fwnode. Some platform devices appear as PCI but are
actually on the AMBA bus, and they need a fixup in
drivers/pci/quirks.c that handles iommu_fwnode.
So call pci_fixup_final after iommu_fwnode is allocated.

Signed-off-by: Zhangfei Gao 
---
 drivers/acpi/arm64/iort.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 7d04424..02e361d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1027,6 +1027,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
info.node = node;
err = pci_for_each_dma_alias(to_pci_dev(dev),
 iort_pci_iommu_init, &info);
+   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
 
fwspec = dev_iommu_fwspec_get(dev);
if (fwspec && iort_pci_rc_supports_ats(node))
-- 
2.7.4



[PATCH v6 1/3] uacce: Add documents for uacce

2019-10-16 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 297 +++
 1 file changed, 297 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..05c1e09
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,297 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
+provide Shared Virtual Addressing (SVA) between accelerators and processes.
+So accelerator can access any data structure of the main cpu.
+This differs from the data sharing between cpu and io device, which share
+data content rather than address.
+Because of the unified address, hardware and user space of process can
+share the same virtual address in the communication.
+Uacce takes the hardware accelerator as a heterogeneous processor, while
+IOMMU share the same CPU page tables and as a result the same translation
+from va to pa.
+
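+         __________________________       __________________________
+        |                          |     |                          |
+        |  User application (CPU)  |     |   Hardware Accelerator   |
+        |__________________________|     |__________________________|
+
+                     |                                 |
+                     | va                              | va
+                     V                                 V
+                 __________                        __________
+                |          |                      |          |
+                |   MMU    |                      |  IOMMU   |
+                |__________|                      |__________|
+                     |                                 |
+                     |                                 |
+                     V pa                              V pa
+                 ____________________________________________
+                |                                            |
+                |                   Memory                   |
+                |____________________________________________|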
+
+
+
+Architecture
+
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
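+                           ___________________                  ________________
+                          |                   |   user API     |                |
+                          | WarpDrive library | ------------>  |  user driver   |
+                          |___________________|                |________________|
+                                    |                                  |
+                                    |                                  |
+                                    | queue fd                         |
+                                    |                                  |
+                                    |                                  |
+                                    v                                  |
+    ___________________         _________                              |
+   |                   |       |         |                             | mmap memory
+   | Other framework   |       |  uacce  |                             | r/w interface
+   | crypto/nic/others |       |_________|                             |
+   |___________________|                                               |
+             |                      |                                  |
+             | register             | register                         |
+             |                      |                                  |
+             |                      |                                  |
+             |              _________________       __________         |
+             |             |                 |     |          |        |
+              ------------ |  Device Driver  |     |  IOMMU   |        |
+                           |_________________|     |__________|        |
+                                    |                                  |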
+ |  

[PATCH v6 0/3] Add uacce module for Accelerator

2019-10-16 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses when communicating.
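
The difference between sharing addresses and sharing content can be sketched
in plain C. This is an analogy only, not Uacce code: the "accelerator" here
is an ordinary function, and all demo_* names are hypothetical. With shared
addresses it can follow the pointers of a CPU-built list directly; with
content sharing the CPU first has to flatten the list into a buffer.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	int value;
	struct demo_node *next;
};

/* Address sharing: the "accelerator" walks the CPU's list through its pointers. */
long demo_accel_sum_shared(const struct demo_node *head)
{
	long sum = 0;

	for (; head; head = head->next)
		sum += head->value;
	return sum;
}

/* Content sharing: the CPU copies the values into a flat buffer first.
 * Returns the number of values copied (at most cap). */
size_t demo_flatten(const struct demo_node *head, int *buf, size_t cap)
{
	size_t n = 0;

	assert(buf != NULL);
	for (; head && n < cap; head = head->next)
		buf[n++] = head->value;
	return n;
}

/* The "accelerator" then only ever sees the copied content. */
long demo_accel_sum_copied(const int *buf, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}
```

Both paths compute the same sum, but only the first one works on the data
structure in place, which is what SVA is meant to enable.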

Uacce is intended to be used with Jean Philippe Brucker's SVA
patchset[1], which enables IO side page fault and PASID support. 
We keep verifying with Jean's sva/current [2].
We also keep verifying with Eric's SMMUv3 Nested Stage patch [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc1-uacce-v6

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-upstream

References:
[1] http://jpbrucker.net/sva/
[2] 
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v6:
Change sys qfrs_size to different file, suggested by Jonathan
Fix crypto daily build issue and based on crypto code base, also 5.4-rc1.

v5: 
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4:
Based on 5.4-rc1
Considering other drivers integrating uacce:
if uacce is not compiled, uacce_register returns an error and
uacce_unregister is empty.
Simplify uacce flag: UACCE_DEV_SVA.
Address Greg's comments: 
Fix state machine, remove potential syslog triggered from user space etc.

v3:
As recommended by Greg, use struct uacce_device instead of struct uacce,
and use struct *cdev in struct uacce_device; as a result,
cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and maintain themselves.
Also add dev.release for put_device.

v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/
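
The v4 note in the change history (register returns an error and unregister
is empty when uacce is not compiled) follows a common stub pattern. A minimal
sketch, with loudly hypothetical names: DEMO_CONFIG_UACCE stands in for the
Kconfig symbol, the demo_* functions are not the real uacce API, and -ENODEV
is an assumed error code, since the thread does not say which error is
returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_uacce { int registered; };

/* DEMO_CONFIG_UACCE is a hypothetical switch standing in for the Kconfig symbol. */
#ifdef DEMO_CONFIG_UACCE
int demo_uacce_register(struct demo_uacce *u)
{
	assert(u != NULL);
	u->registered = 1;
	return 0;
}

void demo_uacce_unregister(struct demo_uacce *u)
{
	assert(u != NULL);
	u->registered = 0;
}
#else
/* Framework compiled out: register reports an error (assumed -ENODEV). */
static inline int demo_uacce_register(struct demo_uacce *u)
{
	(void)u;
	return -ENODEV;
}

/* Framework compiled out: unregister is an empty stub. */
static inline void demo_uacce_unregister(struct demo_uacce *u)
{
	(void)u;
}
#endif

/* A client driver can call both unconditionally and just lose the feature. */
int demo_client_probe(struct demo_uacce *u)
{
	int ret = demo_uacce_register(u);

	/* Treat "not built" as "run without uacce" rather than a probe failure. */
	return ret == -ENODEV ? 0 : ret;
}
```

With the stubs in place, a driver that integrates uacce needs no #ifdef of
its own around the register and unregister calls.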


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need fewer control-path facilities and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to gain
better performance or power efficiency for particular applications.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like an FPU.
This is good for some applications, but it has its own drawbacks:
It changes the ISA permanently.
You must save all state elements when the process is switched out,
but most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is to make them accelerators.
An accelerator is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on an FPU.
The context is bound to the processor itself,
so the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. a co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back up
this scenario. Its first step is to make sure the accelerator and the process
can share the same address space, so that the accelerator ISA can directly
address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses.
So it is different from other DMA libraries.

In the future, we may add more facilities to support linking an accelerator
library to the main application, or managing the accelerator context as
a special thread.
In any case, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.

Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

Zhangfei Gao (1):
  crypto: hisilicon - register zip engine to uacce

 Documentation/ABI/testing/sysfs-driver-uacce |  65 ++
 Documentation/misc-devices/uacce.rst | 297 
 drivers/crypto/hisilicon/qm.c| 254 ++-
 drivers/crypto/hisilicon/qm.h|  13 +-
 drivers/crypto/hisilicon/zip/zip_main.c  |  39 +-
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile 

[PATCH v5 3/3] crypto: hisilicon - register zip engine to uacce

2019-10-14 Thread Zhangfei Gao
qm using uacce as an example, will resubmit after uacce is merged.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
 drivers/crypto/hisilicon/qm.c   | 254 ++--
 drivers/crypto/hisilicon/qm.h   |  13 +-
 drivers/crypto/hisilicon/zip/zip_main.c |  39 ++---
 include/uapi/misc/uacce/qm.h|  15 ++
 4 files changed, 285 insertions(+), 36 deletions(-)
 create mode 100644 include/uapi/misc/uacce/qm.h

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index f975c39..60067d8 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -9,6 +9,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "qm.h"
 
 /* eq/aeq irq enable */
@@ -459,17 +462,22 @@ static void qm_cq_head_update(struct hisi_qp *qp)
 
 static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm)
 {
-   struct qm_cqe *cqe = qp->cqe + qp->qp_status.cq_head;
-
-   if (qp->req_cb) {
-   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
-   dma_rmb();
-   qp->req_cb(qp, qp->sqe + qm->sqe_size * cqe->sq_head);
-   qm_cq_head_update(qp);
-   cqe = qp->cqe + qp->qp_status.cq_head;
-   qm_db(qm, qp->qp_id, QM_DOORBELL_CMD_CQ,
- qp->qp_status.cq_head, 0);
-   atomic_dec(&qp->qp_status.used);
+   struct qm_cqe *cqe;
+
+   if (qp->event_cb) {
+   qp->event_cb(qp);
+   } else {
+   cqe = qp->cqe + qp->qp_status.cq_head;
+
+   if (qp->req_cb) {
+   while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
+   dma_rmb();
+   qp->req_cb(qp, qp->sqe + qm->sqe_size *
+  cqe->sq_head);
+   qm_cq_head_update(qp);
+   cqe = qp->cqe + qp->qp_status.cq_head;
+   atomic_dec(&qp->qp_status.used);
+   }
}
 
/* set c_flag */
@@ -1391,6 +1399,221 @@ static void hisi_qm_cache_wb(struct hisi_qm *qm)
}
 }
 
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+   wake_up_interruptible(&qp->uacce_q->wait);
+}
+
+static int hisi_qm_get_available_instances(struct uacce_device *uacce)
+{
+   int i, ret;
+   struct hisi_qm *qm = uacce->priv;
+
+   read_lock(&qm->qps_lock);
+   for (i = 0, ret = 0; i < qm->qp_num; i++)
+   if (!qm->qp_array[i])
+   ret++;
+   read_unlock(&qm->qps_lock);
+
+   return ret;
+}
+
+static int hisi_qm_uacce_get_queue(struct uacce_device *uacce,
+  unsigned long arg,
+  struct uacce_queue *q)
+{
+   struct hisi_qm *qm = uacce->priv;
+   struct hisi_qp *qp;
+   u8 alg_type = 0;
+
+   qp = hisi_qm_create_qp(qm, alg_type);
+   if (IS_ERR(qp))
+   return PTR_ERR(qp);
+
+   q->priv = qp;
+   q->uacce = uacce;
+   qp->uacce_q = q;
+   qp->event_cb = qm_qp_event_notifier;
+   qp->pasid = arg;
+
+   return 0;
+}
+
+static void hisi_qm_uacce_put_queue(struct uacce_queue *q)
+{
+   struct hisi_qp *qp = q->priv;
+
+   /*
+* As put_queue is only called in uacce_mode=1, and only one queue can
+* be used in this mode. we flush all sqc cache back in put queue.
+*/
+   hisi_qm_cache_wb(qp->qm);
+
+   /* need to stop hardware, but can not support in v1 */
+   hisi_qm_release_qp(qp);
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_uacce_mmap(struct uacce_queue *q,
+ struct vm_area_struct *vma,
+ struct uacce_qfile_region *qfr)
+{
+   struct hisi_qp *qp = q->priv;
+   struct hisi_qm *qm = qp->qm;
+   size_t sz = vma->vm_end - vma->vm_start;
+   struct pci_dev *pdev = qm->pdev;
+   struct device *dev = &pdev->dev;
+   unsigned long vm_pgoff;
+   int ret;
+
+   switch (qfr->type) {
+   case UACCE_QFRT_MMIO:
+   if (qm->ver == QM_HW_V2) {
+   if (sz > PAGE_SIZE * (QM_DOORBELL_PAGE_NR +
+   QM_DOORBELL_SQ_CQ_BASE_V2 / PAGE_SIZE))
+   return -EINVAL;
+   } else {
+   if (sz > PAGE_SIZE * QM_DOORBELL_PAGE_NR)
+   return -EINVAL;
+   }
+
+   vma->vm_flags |= VM_IO;
+
+   return remap_pfn_range(vma, vma->vm_start,
+  qm->phys_base >> PAGE_SHIFT,
+   

[PATCH v5 2/3] uacce: add uacce driver

2019-10-14 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; the queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space into user space, the process can put requests directly to the
hardware without a syscall into kernel space.
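
The mmap step can be sketched from the user-space side. This is a hedged
illustration, not the WarpDrive library: the demo_* helpers, the region size
and the request layout are all assumptions, and the self-check maps a
temporary file as a stand-in for a real queue file, since no device node is
assumed to exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_REGION_SIZE 4096 /* assumed size of one queue file region */

/* Map len bytes of an opened queue file for shared read/write access.
 * Returns NULL on failure. */
void *demo_map_queue_region(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Posting a request is then a plain store into the mapping, with no syscall. */
void demo_put_request(void *region, size_t slot, uint32_t req)
{
	assert(region != NULL);
	((volatile uint32_t *)region)[slot] = req;
}

/* Self-check with a temporary file in place of a queue file. Returns 0 on success. */
int demo_self_check(void)
{
	FILE *f = tmpfile();
	uint32_t seen;
	void *region;
	int fd;

	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, DEMO_REGION_SIZE) != 0) {
		fclose(f);
		return -1;
	}
	region = demo_map_queue_region(fd, DEMO_REGION_SIZE);
	if (!region) {
		fclose(f);
		return -1;
	}
	demo_put_request(region, 1, 0xabcdu);
	seen = ((volatile uint32_t *)region)[1];
	munmap(region, DEMO_REGION_SIZE);
	fclose(f);
	return seen == 0xabcdu ? 0 : -1;
}
```

In a real setup the fd would come from opening the uacce chrdev, and the
slot layout would be defined by the hardware queue format.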

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |  47 ++
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile|   1 +
 drivers/misc/uacce/Kconfig   |  13 +
 drivers/misc/uacce/Makefile  |   2 +
 drivers/misc/uacce/uacce.c   | 974 +++
 include/linux/uacce.h| 167 +
 include/uapi/misc/uacce/uacce.h  |  34 +
 8 files changed, 1239 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..b1a2c60
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,47 @@
+What:   /sys/class/uacce/hisi_zip-/id
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce/hisi_zip-/api
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct driver
+
+What:   /sys/class/uacce/hisi_zip-/flags
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:   /sys/class/uacce/hisi_zip-/available_instances
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce/hisi_zip-/algorithms
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator
+
+What:   /sys/class/uacce/hisi_zip-/qfrs_size
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Page size of each queue file regions
+
+What:   /sys/class/uacce/hisi_zip-/numa_distance
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce/hisi_zip-/node_id
+Date:   Oct 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c1860d3..9abf292 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,4 +56,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 000..e854354
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is described in
+ 

[PATCH v5 0/3] Add uacce module for Accelerator

2019-10-14 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean Philippe Brucker's SVA
patchset[1], which enables IO side page fault and PASID support. 
We keep verifying with Jean's sva/current [2].
We also keep verifying with Eric's SMMUv3 Nested Stage patch [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc1-uacce-v5

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-upstream

References:
[1] http://jpbrucker.net/sva/
[2] 
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:

v5: 
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4:
Based on 5.4-rc1
Considering other drivers integrating uacce:
if uacce is not compiled, uacce_register returns an error and
uacce_unregister is empty.
Simplify uacce flag: UACCE_DEV_SVA.
Address Greg's comments: 
Fix state machine, remove potential syslog triggered from user space etc.

v3:
As recommended by Greg, use struct uacce_device instead of struct uacce,
and use struct *cdev in struct uacce_device; as a result,
cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and maintain themselves.
Also add dev.release for put_device.

v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need less control-path facility and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to
gain better performance or power efficiency for particular applications
these days.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own cons:
It changes the ISA set permanently.
You must save all state elements when the process is switched out.
But most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is the accelerator.
It is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on the FPU.
The context is bound to the processor itself.
So the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. a co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back up
this scenario. Its first step is to make sure the accelerator and the
process can share the same address space, so the accelerator ISA can
directly address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses.
So it is different from the other DMA libraries.

In the future, we may add more facilities to support linking an accelerator
library to the main application, or managing the accelerator context as a
special thread.
But no matter how, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.
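
To make "share data content rather than address" concrete, here is a small
user-space sketch. It is only an illustration: the "device" is modelled by
ordinary C functions, and flatten_for_dma(), device_sum_buffer() and
device_sum_list() are hypothetical names, not part of uacce.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer-linked structure living in the process's address space. */
struct node {
    int value;
    struct node *next;
};

/*
 * Content sharing: the CPU flattens the list into a buffer, because the
 * device cannot follow the process's pointers.
 * Returns the number of values copied (at most cap).
 */
size_t flatten_for_dma(const struct node *head, int *buf, size_t cap)
{
    size_t n = 0;

    assert(buf != NULL);
    for (; head != NULL && n < cap; head = head->next)
        buf[n++] = head->value;
    return n;
}

/* Model of a device that only ever sees the copied buffer. */
long device_sum_buffer(const int *buf, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/*
 * Address sharing: model of a device that uses the same virtual addresses
 * as the process, so it can walk the list in place with no copy.
 */
long device_sum_list(const struct node *head)
{
    long sum = 0;

    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}
```

Both paths compute the same result; the difference is that the first needs
a marshalling step on the CPU while the second hands over only a pointer.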

Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

Zhangfei Gao (1):
  crypto: hisilicon - register zip engine to uacce

 Documentation/ABI/testing/sysfs-driver-uacce |  47 ++
 Documentation/misc-devices/uacce.rst | 297 
 drivers/crypto/hisilicon/qm.c| 259 ++-
 drivers/crypto/hisilicon/qm.h|  13 +-
 drivers/crypto/hisilicon/zip/zip_main.c  |  39 +-
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile|   1 +
 drivers/misc/uacce/Kconfig   |  13 +
 drivers/misc/uacce/Makefile  |   2 +
 dr

[PATCH v5 1/3] uacce: Add documents for uacce

2019-10-14 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 297 +++
 1 file changed, 297 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..1ddf4ff
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,297 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so an accelerator can access any data structure of the main CPU.
+This differs from data sharing between a CPU and an IO device, which share
+data content rather than addresses.
+Because of the unified address space, the hardware and the user space of a
+process can share the same virtual addresses when communicating.
+Uacce treats the hardware accelerator as a heterogeneous processor, while
+the IOMMU shares the CPU page tables and, as a result, the same translation
+from va to pa.
+
+         __________________________       __________________________
+        |                          |     |                          |
+        |  User application (CPU)  |     |   Hardware Accelerator   |
+        |__________________________|     |__________________________|
+
+                     |                                |
+                     | va                             | va
+                     V                                V
+                 __________                       __________
+                |          |                     |          |
+                |   MMU    |                     |  IOMMU   |
+                |__________|                     |__________|
+                     |                                |
+                     |                                |
+                     V pa                             V pa
+                 ___________________________________________
+                |                                           |
+                |                  Memory                   |
+                |___________________________________________|
+
+
+
+Architecture
+------------
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+                         ___________________                   ______________
+                        |                   |   user API      |              |
+                        | WarpDrive library | ------------>   |  user driver |
+                        |___________________|                 |______________|
+                                  |                                   |
+                                  |                                   |
+                                  | queue fd                          |
+                                  |                                   |
+                                  |                                   |
+                                  v                                   |
+  ___________________         _________                               |
+ |                   |       |         |                              | mmap memory
+ | Other framework   |       |  uacce  |                              | r/w interface
+ | crypto/nic/others |       |_________|                              |
+ |___________________|                                                |
+           |                      |                                   |
+           | register             | register                          |
+           |                      |                                   |
+           |                      |                                   |
+           |              _________________      __________           |
+           |             |                 |    |          |          |
+            -----------  |  Device Driver  |    |  IOMMU   |          |
+                         |_________________|    |__________|          |
+                                  |                                   |
+ |  

[RESEND PATCH v4 2/2] uacce: add uacce driver

2019-10-09 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
With a unified address space, the hardware and the user space of a process
can share the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space to user space, the process can put requests directly to the
hardware without a syscall into kernel space.
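
From user space, the open-then-mmap flow described above might look
roughly like the sketch below. It is not taken from the series: the
queue_map helpers are hypothetical, the device path is left to the caller,
and which lengths and offsets are valid is assumed to depend on the driver
(see the qfrs_offset attribute in the ABI document of this patch).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper state: one opened queue file and one mapped region. */
struct queue_map {
    int fd;
    void *base;
    size_t len;
};

/*
 * Open the character device (a queue is allocated on open, per the text
 * above) and map len bytes at the given file offset into the process.
 * Returns 0 on success, -1 on failure.
 */
int queue_map_open(struct queue_map *q, const char *dev_path,
                   size_t len, off_t offset)
{
    assert(q != NULL && dev_path != NULL && len > 0);

    q->fd = open(dev_path, O_RDWR);
    if (q->fd < 0)
        return -1;

    q->base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   q->fd, offset);
    if (q->base == MAP_FAILED) {
        close(q->fd);
        q->fd = -1;
        return -1;
    }
    q->len = len;
    return 0;
}

/* Unmap the region and drop the file descriptor. */
void queue_map_close(struct queue_map *q)
{
    assert(q != NULL);
    munmap(q->base, q->len);
    close(q->fd);
    q->fd = -1;
}
```

Once the region is mapped, requests are written through q->base with plain
loads and stores, which is the "no syscall in the data path" property the
commit message refers to.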

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 drivers/misc/Kconfig |1 +
 drivers/misc/Makefile|1 +
 drivers/misc/uacce/Kconfig   |   13 +
 drivers/misc/uacce/Makefile  |2 +
 drivers/misc/uacce/uacce.c   | 1013 ++
 include/linux/uacce.h|  167 +
 include/uapi/misc/uacce.h|   36 +
 8 files changed, 1280 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..b7ff6af2
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,47 @@
+What:   /sys/class/uacce/hisi_zip-/id
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce/hisi_zip-/api
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct driver
+
+What:   /sys/class/uacce/hisi_zip-/flags
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:   /sys/class/uacce/hisi_zip-/available_instances
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce/hisi_zip-/algorithms
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator
+
+What:   /sys/class/uacce/hisi_zip-/qfrs_offset
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Page offsets of each queue file regions
+
+What:   /sys/class/uacce/hisi_zip-/numa_distance
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce/hisi_zip-/node_id
+Date:   Oct 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c1860d3..9abf292 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,4 +56,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 000..e854354
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is desc

[RESEND PATCH v4 1/2] uacce: Add documents for uacce

2019-10-09 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 297 +++
 1 file changed, 297 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..b3cf0d5
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,297 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so an accelerator can access any data structure of the main CPU.
+This differs from data sharing between a CPU and an IO device, which share
+data content rather than addresses.
+Because of the unified address space, the hardware and the user space of a
+process can share the same virtual addresses when communicating.
+Uacce treats the hardware accelerator as a heterogeneous processor, while
+the IOMMU shares the CPU page tables and, as a result, the same translation
+from va to pa.
+
+         __________________________       __________________________
+        |                          |     |                          |
+        |  User application (CPU)  |     |   Hardware Accelerator   |
+        |__________________________|     |__________________________|
+
+                     |                                |
+                     | va                             | va
+                     V                                V
+                 __________                       __________
+                |          |                     |          |
+                |   MMU    |                     |  IOMMU   |
+                |__________|                     |__________|
+                     |                                |
+                     |                                |
+                     V pa                             V pa
+                 ___________________________________________
+                |                                           |
+                |                  Memory                   |
+                |___________________________________________|
+
+
+
+Architecture
+------------
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+                         ___________________                   ______________
+                        |                   |   user API      |              |
+                        | WarpDrive library | ------------>   |  user driver |
+                        |___________________|                 |______________|
+                                  |                                   |
+                                  |                                   |
+                                  | queue fd                          |
+                                  |                                   |
+                                  |                                   |
+                                  v                                   |
+  ___________________         _________                               |
+ |                   |       |         |                              | mmap memory
+ | Other framework   |       |  uacce  |                              | r/w interface
+ | crypto/nic/others |       |_________|                              |
+ |___________________|                                                |
+           |                      |                                   |
+           | register             | register                          |
+           |                      |                                   |
+           |                      |                                   |
+           |              _________________      __________           |
+           |             |                 |    |          |          |
+            -----------  |  Device Driver  |    |  IOMMU   |          |
+                         |_________________|    |__________|          |
+                                  |                                   |
+ |  

[RESEND PATCH v4 0/2] Add uacce module for Accelerator

2019-10-09 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables IO-side page faults and PASID support.
We keep verifying with Jean's sva/current [2].
We also keep verifying with Eric's SMMUv3 Nested Stage patch [3].

This series and the related zip & qm drivers:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc1-uacce-v4

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-upstream

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v4:
Based on 5.4-rc1
Considering other drivers integrating uacce:
if uacce is not compiled, uacce_register returns an error and
uacce_unregister is empty.
Simplify uacce flag: UACCE_DEV_SVA.
Address Greg's comments:
Fix the state machine, remove potential syslog messages triggered from
user space, etc.

v3:
As recommended by Greg, use struct uacce_device instead of struct uacce,
and use a struct cdev pointer in struct uacce_device; as a result,
cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and self-maintained.
Also add dev.release for put_device.

v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/
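
The v4 note above about building without uacce follows a common header
pattern, sketched below as a stand-alone C file. Everything here is an
assumption for illustration: SKETCH_UACCE_ENABLED stands in for the
kernel's config test, the sketch_ prefixed functions are not the real
uacce_register/uacce_unregister signatures, and -ENODEV is an assumed
error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct uacce_device;    /* opaque here; the real definition is in the patch */

/*
 * SKETCH_UACCE_ENABLED is a hypothetical stand-in for the kernel's check on
 * CONFIG_UACCE. It is left undefined to model a build without the framework.
 */
#ifdef SKETCH_UACCE_ENABLED
/* With the framework built in, these would be provided by it. */
int sketch_uacce_register(struct uacce_device *uacce);
void sketch_uacce_unregister(struct uacce_device *uacce);
#else
/* Without it, registration fails cleanly and unregistration is a no-op. */
static inline int sketch_uacce_register(struct uacce_device *uacce)
{
    (void)uacce;
    return -ENODEV;     /* assumed error code */
}

static inline void sketch_uacce_unregister(struct uacce_device *uacce)
{
    (void)uacce;
}
#endif

/*
 * Hypothetical accelerator driver probe path: it compiles either way and
 * needs no #ifdef of its own.
 * Returns 1 if user-space access was set up, 0 if the driver carries on
 * in kernel-only mode.
 */
int sketch_driver_probe(struct uacce_device *uacce)
{
    int ret = sketch_uacce_register(uacce);

    assert(ret <= 0);   /* kernel convention: 0 or a negative errno */
    return ret == 0;
}
```

The point of the pattern is that a driver such as the zip engine can call
the registration functions unconditionally and simply see an error when the
framework is absent.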


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need less control-path facility and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to
gain better performance or power efficiency for particular applications
these days.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own cons:
It changes the ISA set permanently.
You must save all state elements when the process is switched out.
But most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is the accelerator.
It is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on the FPU.
The context is bound to the processor itself.
So the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. a co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back up
this scenario. Its first step is to make sure the accelerator and the
process can share the same address space, so the accelerator ISA can
directly address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses.
So it is different from the other DMA libraries.

In the future, we may add more facilities to support linking an accelerator
library to the main application, or managing the accelerator context as a
special thread.
But no matter how, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.

Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 Documentation/misc-devices/uacce.rst |  297 
 drivers/misc/Kconfig |1 +
 drivers/misc/Makefile|1 +
 drivers/misc/uacce/Kconfig   |   13 +
 drivers/misc/uacce/Makefile  |2 +
 drivers/misc/uacce/uacce.c   | 1013 ++
 include/linux/uacce.h|  167 +
 include/uapi/misc/uacce.h|   36 +
 9 files changed, 1577 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 Documentation/misc-devices/uacce.rst
 create mode 100644 

[PATCH v4 2/2] uacce: add uacce driver

2019-09-17 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
With a unified address space, the hardware and the user space of a process
can share the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space to user space, the process can put requests directly to the
hardware without a syscall into kernel space.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 drivers/misc/Kconfig |1 +
 drivers/misc/Makefile|1 +
 drivers/misc/uacce/Kconfig   |   13 +
 drivers/misc/uacce/Makefile  |2 +
 drivers/misc/uacce/uacce.c   | 1028 ++
 include/linux/uacce.h|  156 
 include/uapi/misc/uacce.h|   40 +
 8 files changed, 1288 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..563f55c
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,47 @@
+What:   /sys/class/uacce/hisi_zip-/id
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce/hisi_zip-/api
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct driver
+
+What:   /sys/class/uacce/hisi_zip-/flags
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:   /sys/class/uacce/hisi_zip-/available_instances
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce/hisi_zip-/algorithms
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator
+
+What:   /sys/class/uacce/hisi_zip-/qfrs_offset
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Page offsets of each queue file regions
+
+What:   /sys/class/uacce/hisi_zip-/numa_distance
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce/hisi_zip-/node_id
+Date:   Sep 2019
+KernelVersion:  5.4
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 1690035..94d363c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -503,4 +503,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index abd8ae2..93a131b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,4 +58,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 000..e854354
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is described in
+

[PATCH v4 0/2] Add uacce module for Accelerator

2019-09-17 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables IO-side page faults and PASID support.
We keep verifying with Jean's sva/current [2].
We also keep verifying with Eric's SMMUv3 Nested Stage patch [3].

This series and the related zip & qm drivers:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.3-uacce-v4

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-upstream

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v4:
Based on 5.3
Address Greg's comments:
Fix the state machine, remove potential syslog messages triggered from
user space, etc.

v3:
As recommended by Greg, use struct uacce_device instead of struct uacce,
and use a struct cdev pointer in struct uacce_device; as a result,
cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and self-maintained.
Also add dev.release for put_device.

v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need less control-path facility and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to
gain better performance or power efficiency for particular applications
these days.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own cons:
It changes the ISA set permanently.
You must save all state elements when the process is switched out.
But most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is the accelerator.
It is treated as an IO device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on the FPU.
The context is bound to the processor itself.
So the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. a co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back up
this scenario. Its first step is to make sure the accelerator and the
process can share the same address space, so the accelerator ISA can
directly address any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device,
which share data content rather than addresses.
So it is different from the other DMA libraries.

In the future, we may add more facilities to support linking an accelerator
library to the main application, or managing the accelerator context as a
special thread.
But no matter how, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.


Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 Documentation/misc-devices/uacce.rst |  309 
 drivers/misc/Kconfig |1 +
 drivers/misc/Makefile|1 +
 drivers/misc/uacce/Kconfig   |   13 +
 drivers/misc/uacce/Makefile  |2 +
 drivers/misc/uacce/uacce.c   | 1038 ++
 include/linux/uacce.h|  156 
 include/uapi/misc/uacce.h|   40 +
 9 files changed, 1607 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 Documentation/misc-devices/uacce.rst
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 

[PATCH v4 1/2] uacce: Add documents for uacce

2019-09-17 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 308 +++
 1 file changed, 308 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..4fd356e
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,308 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so an accelerator can access any data structure of the main CPU.
+This differs from data sharing between a CPU and an IO device, which share
+data content rather than addresses.
+Because of the unified address space, the hardware and the user space of a
+process can share the same virtual addresses when communicating.
+Uacce treats the hardware accelerator as a heterogeneous processor, while
+the IOMMU shares the CPU page tables and, as a result, the same translation
+from va to pa.
+
+         __________________________       __________________________
+        |                          |     |                          |
+        |  User application (CPU)  |     |   Hardware Accelerator   |
+        |__________________________|     |__________________________|
+
+                     |                                |
+                     | va                             | va
+                     V                                V
+                 __________                       __________
+                |          |                     |          |
+                |   MMU    |                     |  IOMMU   |
+                |__________|                     |__________|
+                     |                                |
+                     |                                |
+                     V pa                             V pa
+                 ___________________________________________
+                |                                           |
+                |                  Memory                   |
+                |___________________________________________|
+
+
+
+Architecture
+------------
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+                         ___________________                   ______________
+                        |                   |   user API      |              |
+                        | WarpDrive library | ------------>   |  user driver |
+                        |___________________|                 |______________|
+                                  |                                   |
+                                  |                                   |
+                                  | queue fd                          |
+                                  |                                   |
+                                  |                                   |
+                                  v                                   |
+  ___________________         _________                               |
+ |                   |       |         |                              | mmap memory
+ | Other framework   |       |  uacce  |                              | r/w interface
+ | crypto/nic/others |       |_________|                              |
+ |___________________|                                                |
+           |                      |                                   |
+           | register             | register                          |
+           |                      |                                   |
+           |                      |                                   |
+           |              _________________      __________           |
+           |             |                 |    |          |          |
+            -----------  |  Device Driver  |    |  IOMMU   |          |
+                         |_________________|    |__________|          |
+                                  |                                   |
+ |  

[PATCH v3 1/2] uacce: Add documents for uacce

2019-09-02 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 309 +++
 1 file changed, 309 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..211f796
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,309 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so an accelerator can access any data structure of the main CPU.
+This differs from data sharing between a CPU and an IO device, which share
+data content rather than addresses.
+Because of the unified address space, the hardware and the user space of a
+process can share the same virtual addresses when communicating.
+Uacce treats the hardware accelerator as a heterogeneous processor, while
+the IOMMU shares the CPU page tables and, as a result, the same translation
+from va to pa.
+
+         __________________________       __________________________
+        |                          |     |                          |
+        |  User application (CPU)  |     |   Hardware Accelerator   |
+        |__________________________|     |__________________________|
+
+                     |                                |
+                     | va                             | va
+                     V                                V
+                 __________                       __________
+                |          |                     |          |
+                |   MMU    |                     |  IOMMU   |
+                |__________|                     |__________|
+                     |                                |
+                     |                                |
+                     V pa                             V pa
+                 ___________________________________________
+                |                                           |
+                |                  Memory                   |
+                |___________________________________________|
+
+
+
+Architecture
+------------
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
+                         ___________________                   ______________
+                        |                   |   user API      |              |
+                        | WarpDrive library | ------------>   |  user driver |
+                        |___________________|                 |______________|
+                                  |                                   |
+                                  |                                   |
+                                  | queue fd                          |
+                                  |                                   |
+                                  |                                   |
+                                  v                                   |
+  ___________________         _________                               |
+ |                   |       |         |                              | mmap memory
+ | Other framework   |       |  uacce  |                              | r/w interface
+ | crypto/nic/others |       |_________|                              |
+ |___________________|                                                |
+           |                      |                                   |
+           | register             | register                          |
+           |                      |                                   |
+           |                      |                                   |
+           |              _________________      __________           |
+           |             |                 |    |          |          |
+            -----------  |  Device Driver  |    |  IOMMU   |          |
+                         |_________________|    |__________|          |
+                                  |                                   |
+ |  

[PATCH v3 2/2] uacce: add uacce driver

2019-09-02 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an IO device, which share
data content rather than addresses.
With a unified address space, the hardware and the user space of a process
can share the same virtual addresses when communicating.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space to user space, the process can put requests directly to the
hardware without a syscall into kernel space.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 drivers/misc/Kconfig |1 +
 drivers/misc/Makefile|1 +
 drivers/misc/uacce/Kconfig   |   13 +
 drivers/misc/uacce/Makefile  |2 +
 drivers/misc/uacce/uacce.c   | 1096 ++
 include/linux/uacce.h|  172 
 include/uapi/misc/uacce.h|   39 +
 8 files changed, 1371 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..ee0a66e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,47 @@
+What:   /sys/class/uacce/hisi_zip-/id
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the device.
+
+What:   /sys/class/uacce/hisi_zip-/api
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Api of the device, used by application to match the correct driver
+
+What:   /sys/class/uacce/hisi_zip-/flags
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in 
uacce.h
+
+What:   /sys/class/uacce/hisi_zip-/available_instances
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+
+What:   /sys/class/uacce/hisi_zip-/algorithms
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator
+
+What:   /sys/class/uacce/hisi_zip-/qfrs_offset
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Page offsets of each queue file regions
+
+What:   /sys/class/uacce/hisi_zip-/numa_distance
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Distance of device node to cpu node
+
+What:   /sys/class/uacce/hisi_zip-/node_id
+Date:   Sep 2019
+KernelVersion:  5.3
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 6abfc8e..8073eb8 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -502,4 +502,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index abd8ae2..93a131b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,4 +58,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 0000000..e854354
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is described in
+

[PATCH v3 0/2] Add uacce module for Accelerator

2019-09-02 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device, which share
data content rather than addresses.
Because the address space is unified, the hardware and the user space of the
process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean Philippe Brucker's SVA
patchset [1], which enables I/O-side page faults and PASID support.
We keep verifying against Jean's sva/current [2].
We also keep verifying against Eric's SMMUv3 Nested Stage patch [3].

This series and the related zip & qm drivers:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.3-rc1-warpdrive-v3

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-upstream

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v3:
As recommended by Greg, use struct uacce_device instead of struct uacce,
and use a struct cdev pointer in struct uacce_device. As a result, the
cdev can be released on its own when its refcount drops to 0,
so the two structures are decoupled and each manages its own lifetime.
Also add dev.release for put_device.
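
To make the lifetime argument concrete, here is a small user-space sketch
of the same idea. It is not the driver code: struct obj, struct demo_cdev,
struct demo_uacce_device and all helper names are hypothetical stand-ins
for the kernel objects, and malloc/free replace the kernel allocators. The
point is only that two objects linked by a pointer, each with its own
refcount and release callback, can be freed independently.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal refcounted base object (stand-in for kref/struct device). */
struct obj {
	int refcount;
	void (*release)(struct obj *o);
};

/* Hypothetical stand-in for a separately allocated cdev. */
struct demo_cdev {
	struct obj base;	/* must stay the first member */
};

/* Hypothetical stand-in for struct uacce_device: holds a pointer only. */
struct demo_uacce_device {
	struct obj base;	/* must stay the first member */
	struct demo_cdev *cdev;
};

int released_count;		/* number of release callbacks run so far */

void obj_get(struct obj *o)
{
	o->refcount++;
}

void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->release(o);
}

/* Both objects are freed through their base pointer (first member). */
void obj_free_release(struct obj *o)
{
	released_count++;
	free(o);
}

struct demo_uacce_device *demo_uacce_alloc(void)
{
	struct demo_uacce_device *u = calloc(1, sizeof(*u));
	struct demo_cdev *c = calloc(1, sizeof(*c));

	if (!u || !c) {
		free(u);
		free(c);
		return NULL;
	}
	c->base.refcount = 1;
	c->base.release = obj_free_release;
	u->base.refcount = 1;
	u->base.release = obj_free_release;
	u->cdev = c;
	return u;
}

/* Drop the owner's references; each object goes away independently. */
void demo_uacce_remove(struct demo_uacce_device *u)
{
	struct demo_cdev *c = u->cdev;

	obj_put(&c->base);
	obj_put(&u->base);
}
```

If something still holds a reference on the cdev stand-in (an open file,
say), removing the device frees only the device; the cdev is freed later
by the last obj_put().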

v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need fewer control-path facilities and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to gain
better performance or power efficiency for particular applications.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own drawbacks:
It changes the ISA permanently.
You must save all state elements when the process is switched out,
but most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is to make them accelerators.
An accelerator is treated as an I/O device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on the FPU.
The context is bound to the processor itself,
so the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. Co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back
this scenario. Its first step is to make sure the accelerator and the process
can share the same address space, so that the accelerator ISA can directly
address any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device,
which share data content rather than addresses.
So it is different from the other DMA libraries.
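
The difference between sharing content and sharing addresses can be
sketched in plain C. Nothing below is real device code: the "device" is
just a function, and every name is made up for illustration. With content
sharing, a pointer-based structure has to be flattened into a buffer
before the device can use it; with a shared address space, the device
could follow the same pointers the CPU uses.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int value;
	struct node *next;
};

/*
 * Content sharing: copy the list payload into a contiguous buffer,
 * as would be needed before handing it to a classic DMA engine.
 * Returns the number of elements written.
 */
size_t flatten_for_dma(const struct node *head, int *buf, size_t cap)
{
	size_t n = 0;

	assert(buf != NULL);
	for (; head && n < cap; head = head->next)
		buf[n++] = head->value;
	return n;
}

/* The "device" working on the copied content. */
long device_sum_buffer(const int *buf, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}

/* Address sharing: the "device" walks the very same pointers. */
long device_sum_list(const struct node *head)
{
	long sum = 0;

	for (; head; head = head->next)
		sum += head->value;
	return sum;
}
```

Both paths compute the same result, but only the second one avoids the
copy and works on arbitrary linked data without a marshalling step.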

In the future, we may add more facilities to support linking an accelerator
library into the main application, or managing the accelerator context as a
special thread.
Either way, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.


Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 Documentation/misc-devices/uacce.rst         |  309
 drivers/misc/Kconfig                         |    1 +
 drivers/misc/Makefile                        |    1 +
 drivers/misc/uacce/Kconfig                   |   13 +
 drivers/misc/uacce/Makefile                  |    2 +
 drivers/misc/uacce/uacce.c                   | 1094 ++
 include/linux/uacce.h                        |  172
 include/uapi/misc/uacce.h                    |   39 +
 9 files changed, 1678 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 Documentation/misc-devices/uacce.rst
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

-- 
2.7.4



[PATCH v2 1/2] uacce: Add documents for uacce

2019-08-28 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 309 +++
 1 file changed, 309 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 0000000..a2cbd00
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,309 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+=====================
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so that an accelerator can access any data structure of the main CPU.
+This differs from data sharing between a CPU and an I/O device, which share
+data content rather than addresses.
+Because the address space is unified, the hardware and the user space of the
+process can share the same virtual addresses when communicating.
+Uacce treats the hardware accelerator as a heterogeneous processor, while
+the IOMMU shares the CPU page tables and, as a result, the same translation
+from va to pa.
+
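+    __________________________       __________________________
+   |                          |     |                          |
+   |  User application (CPU)  |     |   Hardware Accelerator   |
+   |__________________________|     |__________________________|
+
+                |                                |
+                | va                             | va
+                V                                V
+            __________                       __________
+           |          |                     |          |
+           |   MMU    |                     |  IOMMU   |
+           |__________|                     |__________|
+                |                                |
+                |                                |
+                V pa                             V pa
+            ___________________________________________
+           |                                           |
+           |                  Memory                   |
+           |___________________________________________|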
+
+
+
+Architecture
+------------
+
+Uacce is the kernel module, taking charge of iommu and address sharing.
+The user drivers and libraries are called WarpDrive.
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
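+ ___________________                                ________________
+|                   |   user API                   |                |
+| WarpDrive library | -------------------------->  |  user driver   |
+|___________________|                              |________________|
+          |                                                 |
+          |                                                 |
+          | queue fd                                        |
+          |                                                 |
+          |                                                 |
+          v                                                 |
+ ___________________         _________                      |
+|                   |       |         |                     | mmap memory
+| Other framework   |       |  uacce  |                     | r/w interface
+| crypto/nic/others |       |_________|                     |
+|___________________|                                       |
+          |                      |                          |
+          | register             | register                 |
+          |                      |                          |
+          |                      |                          |
+          |              _________________     __________   |
+          |             |                 |   |          |  |
+          ------------  |  Device Driver  |   |  IOMMU   |  |
+                        |_________________|   |__________|  |
+                                 |                          |
+                                 |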

[PATCH v2 2/2] uacce: add uacce driver

2019-08-28 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device, which share
data content rather than addresses.
Because the address space is unified, the hardware and the user space of the
process can share the same virtual addresses when communicating.

Uacce creates a chrdev for every registration. A queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space into user space, the process can put requests directly to the
hardware without a syscall into kernel space.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 drivers/misc/Kconfig                         |    1 +
 drivers/misc/Makefile                        |    1 +
 drivers/misc/uacce/Kconfig                   |   13 +
 drivers/misc/uacce/Makefile                  |    2 +
 drivers/misc/uacce/uacce.c                   | 1086 ++
 include/linux/uacce.h                        |  172
 include/uapi/misc/uacce.h                    |   39 +
 8 files changed, 1361 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 0000000..44e2f69
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,47 @@
+What:           /sys/class/uacce/hisi_zip-/id
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Id of the device.
+
+What:           /sys/class/uacce/hisi_zip-/api
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Api of the device, used by application to match the correct driver
+
+What:           /sys/class/uacce/hisi_zip-/flags
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:           /sys/class/uacce/hisi_zip-/available_instances
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Available instances left of the device
+
+What:           /sys/class/uacce/hisi_zip-/algorithms
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Algorithms supported by this accelerator
+
+What:           /sys/class/uacce/hisi_zip-/qfrs_offset
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Page offsets of each queue file regions
+
+What:           /sys/class/uacce/hisi_zip-/numa_distance
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Distance of device node to cpu node
+
+What:           /sys/class/uacce/hisi_zip-/node_id
+Date:           Aug 2019
+KernelVersion:  5.3
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Id of the numa node
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 6abfc8e..8073eb8 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -502,4 +502,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index abd8ae2..93a131b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,4 +58,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 0000000..e854354
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is described in
+

[PATCH v2 0/2] Add uacce module for Accelerator

2019-08-28 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device, which share
data content rather than addresses.
Because the address space is unified, the hardware and the user space of the
process can share the same virtual addresses when communicating.

Uacce is intended to be used with Jean Philippe Brucker's SVA
patchset [1], which enables I/O-side page faults and PASID support.
We keep verifying against Jean's sva/current [2].
We also keep verifying against Eric's SMMUv3 Nested Stage patch [3].

This series and the related zip & qm drivers:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.3-rc1-warpdrive-v2

The library and user application:
https://github.com/Linaro/warpdrive/tree/5.3-rc1-v2

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v2:
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1:
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need fewer control-path facilities and more, or more specific, ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to gain
better performance or power efficiency for particular applications.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own drawbacks:
It changes the ISA permanently.
You must save all state elements when the process is switched out,
but most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is to make them accelerators.
An accelerator is treated as an I/O device from the CPU's point of view
(though it need not be one physically). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running on the FPU.
The context is bound to the processor itself,
so the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. Co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back
this scenario. Its first step is to make sure the accelerator and the process
can share the same address space, so that the accelerator ISA can directly
address any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device,
which share data content rather than addresses.
So it is different from the other DMA libraries.

In the future, we may add more facilities to support linking an accelerator
library into the main application, or managing the accelerator context as a
special thread.
Either way, this can be a solid starting point for a new processor
to be used as an "accelerator", as this is the essential requirement.

Kenneth Lee (2):
  uacce: Add documents for uacce
  uacce: add uacce driver

 Documentation/ABI/testing/sysfs-driver-uacce |   47 ++
 Documentation/misc-devices/uacce.rst         |  335
 drivers/misc/Kconfig                         |    1 +
 drivers/misc/Makefile                        |    1 +
 drivers/misc/uacce/Kconfig                   |   13 +
 drivers/misc/uacce/Makefile                  |    2 +
 drivers/misc/uacce/uacce.c                   | 1086 ++
 include/linux/uacce.h                        |  172
 include/uapi/misc/uacce.h                    |   39 +
 9 files changed, 1696 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 Documentation/misc-devices/uacce.rst
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

-- 
2.7.4



[PATCH 2/2] uacce: add uacce module

2019-08-14 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce is the kernel component that supports the WarpDrive accelerator
framework. It provides a register/unregister interface for device drivers
to expose their hardware resources to user space. A resource is
treated as a "queue" in WarpDrive.

Uacce creates a chrdev for every registration. A queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space into user space, the process can put requests directly to the
hardware without a syscall into kernel space.

Uacce also manages unified addresses between the hardware and the user
space of the process, so they can share the same virtual addresses when
communicating.
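
As a rough picture of the register/open flow described above, here is a
user-space model in plain C. It does not use the uacce API: demo_register(),
demo_open() and the other names are hypothetical, and a fixed-size table
stands in for the chrdev bookkeeping. It only shows the shape of the idea:
a driver registers callbacks, and each open hands out a queue through them.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEVS 4

/* Hypothetical per-process handle; stands in for a WarpDrive "queue". */
struct demo_queue {
	int dev_id;
	int in_use;
};

/* Callbacks a (hypothetical) device driver supplies at registration. */
struct demo_ops {
	int (*get_queue)(int dev_id, struct demo_queue *q);
	void (*put_queue)(struct demo_queue *q);
};

struct demo_dev {
	const char *name;
	const struct demo_ops *ops;	/* NULL means the slot is free */
};

struct demo_dev demo_devs[DEMO_MAX_DEVS];

/* Register a device; returns its id, or -1 if the table is full. */
int demo_register(const char *name, const struct demo_ops *ops)
{
	assert(name != NULL && ops != NULL);
	for (int i = 0; i < DEMO_MAX_DEVS; i++) {
		if (demo_devs[i].ops == NULL) {
			demo_devs[i].name = name;
			demo_devs[i].ops = ops;
			return i;
		}
	}
	return -1;
}

void demo_unregister(int id)
{
	assert(id >= 0 && id < DEMO_MAX_DEVS);
	demo_devs[id].name = NULL;
	demo_devs[id].ops = NULL;
}

/* What "open" would do: ask the driver for a queue for this process. */
int demo_open(int id, struct demo_queue *q)
{
	if (id < 0 || id >= DEMO_MAX_DEVS || demo_devs[id].ops == NULL)
		return -1;
	return demo_devs[id].ops->get_queue(id, q);
}

/* What "release" would do: hand the queue back to the driver. */
void demo_release(int id, struct demo_queue *q)
{
	if (id >= 0 && id < DEMO_MAX_DEVS && demo_devs[id].ops != NULL)
		demo_devs[id].ops->put_queue(q);
}

/* Trivial driver callbacks used for illustration. */
int demo_get_queue(int dev_id, struct demo_queue *q)
{
	q->dev_id = dev_id;
	q->in_use = 1;
	return 0;
}

void demo_put_queue(struct demo_queue *q)
{
	q->in_use = 0;
}
```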

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 drivers/misc/Kconfig        |    1 +
 drivers/misc/Makefile       |    1 +
 drivers/misc/uacce/Kconfig  |   13 +
 drivers/misc/uacce/Makefile |    2 +
 drivers/misc/uacce/uacce.c  | 1186 +++
 include/linux/uacce.h       |  109
 include/uapi/misc/uacce.h   |   44 ++
 7 files changed, 1356 insertions(+)
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 6abfc8e..8073eb8 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -502,4 +502,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index abd8ae2..93a131b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,4 +58,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y  += cardreader/
 obj-$(CONFIG_PVPANIC)  += pvpanic.o
 obj-$(CONFIG_HABANA_AI)+= habanalabs/
+obj-$(CONFIG_UACCE)+= uacce/
 obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
diff --git a/drivers/misc/uacce/Kconfig b/drivers/misc/uacce/Kconfig
new file mode 100644
index 0000000..569669c
--- /dev/null
+++ b/drivers/misc/uacce/Kconfig
@@ -0,0 +1,13 @@
+config UACCE
+   tristate "Accelerator Framework for User Land"
+   depends on IOMMU_API
+   help
+ UACCE provides interface for the user process to access the hardware
+ without interaction with the kernel space in data path.
+
+ The user-space interface is described in
+ include/uapi/misc/uacce.h
+
+ See Documentation/misc-devices/warpdrive.rst for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/misc/uacce/Makefile b/drivers/misc/uacce/Makefile
new file mode 100644
index 000..5b4374e
--- /dev/null
+++ b/drivers/misc/uacce/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+obj-$(CONFIG_UACCE) += uacce.o
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
new file mode 100644
index 0000000..43e0c9b
--- /dev/null
+++ b/drivers/misc/uacce/uacce.c
@@ -0,0 +1,1186 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct class *uacce_class;
+static DEFINE_IDR(uacce_idr);
+static dev_t uacce_devt;
+static DEFINE_MUTEX(uacce_mutex); /* mutex to protect uacce */
+
+/* lock to protect all queues management */
+static DECLARE_RWSEM(uacce_qs_lock);
+#define uacce_qs_rlock() down_read(&uacce_qs_lock)
+#define uacce_qs_runlock() up_read(&uacce_qs_lock)
+#define uacce_qs_wlock() down_write(&uacce_qs_lock)
+#define uacce_qs_wunlock() up_write(&uacce_qs_lock)
+
+static const struct file_operations uacce_fops;
+
+/* match with enum uacce_qfrt */
+static const char *const qfrt_str[] = {
+   "mmio",
+   "dko",
+   "dus",
+   "ss",
+   "invalid"
+};
+
+static const char *uacce_qfrt_str(struct uacce_qfile_region *qfr)
+{
+   enum uacce_qfrt type = qfr->type;
+
+   if (type > UACCE_QFRT_INVALID)
+   type = UACCE_QFRT_INVALID;
+
+   return qfrt_str[type];
+}
+
+/**
+ * uacce_wake_up - Wake up the process that is waiting on this queue
+ * @q: the accelerator queue to wake up
+ */
+void uacce_wake_up(struct uacce_queue *q)
+{
+   dev_dbg(&q->uacce->dev, "wake up\n");
+   wake_up_interruptible(&q->wait);
+}
+EXPORT_SYMBOL_GPL(uacce_wake_up);
+
+static int uacce_queue_map_qfr(struct uacce_queue *q,
+  struct uacce_qfile_region *qfr)
+{
+   struct device *dev = q->uacce->pdev;
+   struct iommu_domain *domain = iommu_

[PATCH 1/2] uacce: Add documents for WarpDrive/uacce

2019-08-14 Thread Zhangfei Gao
From: Kenneth Lee 

WarpDrive is a general accelerator framework that lets user applications
access the hardware without going through the kernel in the data path.

The kernel component that provides the facility for drivers to expose the
user interface is called uacce. It is short for
"Unified/User-space-access-intended Accelerator Framework".

This patch adds a document explaining how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/warpdrive.rst | 351 +++
 1 file changed, 351 insertions(+)
 create mode 100644 Documentation/misc-devices/warpdrive.rst

diff --git a/Documentation/misc-devices/warpdrive.rst b/Documentation/misc-devices/warpdrive.rst
new file mode 100644
index 0000000..14e5939
--- /dev/null
+++ b/Documentation/misc-devices/warpdrive.rst
@@ -0,0 +1,351 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework that lets user applications
+communicate with the hardware without going through the kernel in the data path.
+
+It can be used as a quick channel to accelerators, network adaptors or
+other hardware for applications in user space.
+
+It may also make some existing solutions simpler. E.g. you can reuse most of
+the *netdev* driver in the kernel and just share some ring buffers with the
+user-space driver for *DPDK* or *ODP*. Or you can combine the RSA accelerator
+with the *netdev* in user space as an HTTPS reverse proxy, etc.
+
+*WarpDrive* treats the hardware accelerator as a heterogeneous processor which
+can take over particular loads from the CPU:
+
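+    __________________________       __________________________
+   |                          |     |                          |
+   |  User application (CPU)  |     |   Hardware Accelerator   |
+   |__________________________|     |__________________________|
+
+                |                                |
+                |                                |
+                V                                V
+            __________                       __________
+           |          |                     |          |
+           |   MMU    |                     |  IOMMU   |
+           |__________|                     |__________|
+                 \                              /
+                  \                            /
+                   \                          /
+                    __________________________
+                   |                          |
+                   |          Memory          |
+                   |__________________________|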
+
+
+
+Architecture
+------------
+
+*WarpDrive* includes general user libraries, kernel management modules
+and drivers for the hardware. In kernel, the management module
+is called *uacce*, meaning "Unified/User-space-access-intended
+Accelerator Framework".
+
+A virtual concept, queue, is used for the communication. It provides a
+FIFO-like interface. And it maintains a unified address space between the
+application and all involved hardware.
+
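+ ___________________                                ________________
+|                   |   user API                   |                |
+| WarpDrive library | -------------------------->  |  user driver   |
+|___________________|                              |________________|
+          |                                                 |
+          |                                                 |
+          | queue fd                                        |
+          |                                                 |
+          |                                                 |
+          v                                                 |
+ ___________________         _________                      |
+|                   |       |         |                     | mmap memory
+| Other framework   |       |  uacce  |                     | r/w interface
+| crypto/nic/others |       |_________|                     |
+|___________________|                                       |
+          |                      |                          |
+          | register             | register                 |
+          |                      |                          |
+          |                      |                          |
+          |              _________________     __________   |
+          |             |                 |   |          |  |
+          ------------  |  Device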

[PATCH 0/2] A General Accelerator Framework, WarpDrive

2019-08-14 Thread Zhangfei Gao
*WarpDrive* is a general accelerator framework for the user application to
access the hardware without going through the kernel in data path.

WarpDrive is the name for the whole framework. The component in kernel
is called uacce, meaning "Unified/User-space-access-intended Accelerator
Framework". It makes use of the capability of IOMMU to maintain a
unified virtual address space between the hardware and the process.

WarpDrive is intended to be used with Jean Philippe Brucker's SVA
patchset [1], which enables I/O-side page faults and PASID support.
We keep verifying against Jean's sva/current [2].
We also keep verifying against Eric's SMMUv3 Nested Stage patch [3].

This series and related zip & qm driver as well as dummy driver for qemu test:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.3-rc1-warpdrive-v1
The zip driver has already been upstreamed.
Adding uacce support to zip will be the next step.

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-v1-current

Change History:
v4 changed from V3
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

V3 changed from V2:
https://lkml.org/lkml/2018/11/12/1951
1. Build uacce on the original IOMMU interface. V2 was built on VFIO,
   but the VFIO approach of locking the user memory in place does not
   work properly if the process forks a child: the copy-on-write
   strategy makes the parent process lose its page. This is not
   acceptable to accelerator users.
2. The kernel component is renamed from sdmdev to uacce accordingly.
3. The document is updated for the new design. The Static Shared
   Virtual Memory concept is introduced to replace the User
   Memory Sharing concept.
4. Rebase to the latest kernel (4.20.0-rc1).
5. As an RFC, this version is tested only with "test-to-pass"
   test cases and not tested with Jean's SVA patch.
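
The fork problem mentioned in item 1 can be hinted at from user space. The
sketch below is only an analogy and involves no VFIO or page pinning: the
helper names are made up, and it merely shows that after fork() a write to
private memory lands in a copy that the other process never sees, whereas
shared memory stays common. A device that pinned the pre-fork physical page
could therefore end up looking at different data than one of the processes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate one int in anonymous memory; shared != 0 selects MAP_SHARED. */
int *alloc_cell(int shared)
{
	void *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
		       (shared ? MAP_SHARED : MAP_PRIVATE) | MAP_ANONYMOUS,
		       -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Fork a child that writes child_val into *cell, wait for it, and
 * return the value the parent observes afterwards, or -1 on error.
 */
int parent_view_after_child_write(volatile int *cell, int child_val)
{
	pid_t pid;
	int status;

	assert(cell != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		*cell = child_val;	/* triggers copy-on-write if private */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return *cell;
}
```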

V2 changed from V1:
https://lwn.net/Articles/763990/
1. Change kernel framework name from SPIMDEV (Share Parent IOMMU
   Mdev) to SDMDEV (Share Domain Mdev).
2. Allocate Hardware Resource when a new mdev is created (While
   it is allocated when the mdev is opened)
3. Unmap pages from the shared domain when the sdmdev iommu group is
   detached. (This procedure is necessary, but missed in V1)
4. Update document accordingly.
5. Rebase to the latest kernel (4.19.0-rc1)

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Kenneth Lee (2):
  uacce: Add documents for WarpDrive/uacce
  uacce: add uacce module

 Documentation/misc-devices/warpdrive.rst |  351 +
 drivers/misc/Kconfig                     |    1 +
 drivers/misc/Makefile                    |    1 +
 drivers/misc/uacce/Kconfig               |   13 +
 drivers/misc/uacce/Makefile              |    2 +
 drivers/misc/uacce/uacce.c               | 1186 ++
 include/linux/uacce.h                    |  109 +++
 include/uapi/misc/uacce.h                |   44 ++
 8 files changed, 1707 insertions(+)
 create mode 100644 Documentation/misc-devices/warpdrive.rst
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce.h

-- 
2.7.4



Re: [PATCH 01/11] hisi_sas: add v2 hw support for ECC and AXI bus fatal error

2016-11-23 Thread Zhangfei Gao
On Wed, Nov 23, 2016 at 4:59 PM, John Garry <john.ga...@huawei.com> wrote:
> On 16/11/2016 01:47, Zhangfei Gao wrote:
>>
>> On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
>>>
>>> From: Xiang Chen <chenxian...@hisilicon.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>

>>>
>>> For ECC 1bit error, logic can recover it, so we only print
>>> a warning.
>>> For ECC multi-bit and AXI bus fatal error, we panic.
>>
>>
>> Is it possible to recover via resetting phy and device etc instead of
>> panic?
>>
>> Thanks
>>
>>
>
>
> Hi Zhangfei,
>
> We are actually now working on supporting controller reset for certain
> AXI/ECC errors, so that we will not need to panic.

Got it, thanks for the info.

Thanks


Re: [PATCH 05/11] hisi_sas: replace WARN_ON() with dev_warn() for internal abort

2016-11-23 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Replace WARN_ON() with dev_warn() print when internal abort fails.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>
Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>

Sorry, I missed this one.


Re: [PATCH 06/11] hisi_sas: modify return value of hisi_sas_query_task()

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> sas_scsi_find_task() only deals with return value
> TMF_RESP_FUNC_FAILED/TMF_RESP_FUNC_SUCC/TMF_RESP_FUNC_COMPLETE of
> query task. So for LLDD errors just return TMF_RESP_FUNC_FAILED.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>
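
The return-value rule described in the quoted commit message can be sketched as below. This is only an illustration of the idea, not the patch itself: the enum values and the helper name demo_query_task_result() are hypothetical stand-ins, and the real TMF_RESP_* definitions live in the kernel's SAS headers.

```c
#include <assert.h>

/* Illustrative response codes (assumed values, not the kernel's). */
enum demo_tmf_resp {
	DEMO_TMF_RESP_FUNC_COMPLETE,
	DEMO_TMF_RESP_FUNC_SUCC,
	DEMO_TMF_RESP_FUNC_FAILED,
};

/*
 * Pass through the three codes the caller understands and collapse
 * every other (driver-level) error into FAILED.
 */
static int demo_query_task_result(int rc)
{
	switch (rc) {
	case DEMO_TMF_RESP_FUNC_COMPLETE:
	case DEMO_TMF_RESP_FUNC_SUCC:
	case DEMO_TMF_RESP_FUNC_FAILED:
		return rc;
	default:
		return DEMO_TMF_RESP_FUNC_FAILED;
	}
}
```

The point of the design is that the caller only has to handle three outcomes, whatever went wrong inside the low-level driver.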


Re: [PATCH 09/11] hisi_sas: check SATA FIS when directly attaching SATA device

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Check ERR bit of status to decide whether there is something wrong with
> initial register-D2H FIS. If error exists, PHY reset the channel to
> restart OOB.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>
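
A rough sketch of the ERR-bit check described above, assuming the standard Register Device-to-Host FIS layout (byte 0 is the FIS type 0x34, byte 2 is the ATA status byte, and ERR is bit 0 of status). The macro and function names are hypothetical, and the PHY reset that the patch performs on error is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FIS_TYPE_REG_D2H 0x34 /* Register Device-to-Host FIS type */
#define DEMO_ATA_STATUS_ERR   0x01 /* ERR is bit 0 of the ATA status byte */

/* Returns true when the buffer holds a register D2H FIS with ERR set. */
static bool demo_d2h_fis_has_error(const uint8_t *fis)
{
	return fis[0] == DEMO_FIS_TYPE_REG_D2H &&
	       (fis[2] & DEMO_ATA_STATUS_ERR) != 0;
}
```

A caller would run this on the initial FIS received after link-up and, if it returns true, restart OOB on that channel.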


Re: [PATCH 11/11] hisi_sas: add PHY set linkrate support for v1 and v2 hw

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Add the function to set PHY min and max linkrate through
> sysfs interface.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>


Re: [PATCH 01/11] hisi_sas: add v2 hw support for ECC and AXI bus fatal error

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry  wrote:
> From: Xiang Chen 
>
> For ECC 1bit error, logic can recover it, so we only print
> a warning.
> For ECC multi-bit and AXI bus fatal error, we panic.

Is it possible to recover via resetting phy and device etc instead of panic?

Thanks
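
For reference, the policy in the quoted commit message amounts to the classification below. This is a minimal sketch with hypothetical names; the actual interrupt handling in the driver is not shown in this thread.

```c
#include <assert.h>

/* Hypothetical error classes named after the commit message. */
enum demo_hw_error { DEMO_ECC_1BIT, DEMO_ECC_MULTI_BIT, DEMO_AXI_FATAL };
enum demo_action { DEMO_WARN_ONLY, DEMO_PANIC };

/* 1-bit ECC errors are corrected by hardware, so only warn; the rest are fatal. */
static enum demo_action demo_classify(enum demo_hw_error err)
{
	return err == DEMO_ECC_1BIT ? DEMO_WARN_ONLY : DEMO_PANIC;
}
```

Replacing the panic with a controller reset, as asked above, would mean adding a third action for the error classes that are recoverable.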


Re: [PATCH 03/11] hisi_sas: only process broadcast change in phy_bcast_v2_hw()

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> There are many BROADCAST primitives generated by the host.
> We are only interested in BROADCAST (CHANGE) primitives currently,
> so only process this.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>


Re: [PATCH 08/11] hisi_sas: modify some values in get_ata_protocol()

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Modify and add some SATA commands according to SATA protocol.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>


Re: [PATCH 07/11] hisi_sas: delete repeated configuration in free_device_v2_hw()

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Delete repeated configuration items for hisi_sas_device() when
> we free a device. These items are now only set in
> hisi_sas_dev_gone().
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>


Re: [PATCH 10/11] hisi_sas: use atomic64_t for hisi_sas_device.running_req

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> Sometimes the value of hisi_sas_device.running_req
> would go negative unless we have the check for
> running_req >= 0 before trying to decrement.
>
> This is because using running_req is not thread-safe.
>
> As such, the value for running_req may be actually incorrect,
> so use atomic64_t instead.
>
> Signed-off-by: John Garry <john.ga...@huawei.com>
> Reviewed-by: Xiang Chen <chenxian...@hisilicon.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>
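
The race described in the commit message comes from a plain read-modify-write on a shared counter. Below is a userspace sketch of the fix using C11 <stdatomic.h> as an analogue of the kernel's atomic64_t. The struct and helper names are hypothetical; in the kernel the equivalent operations would be atomic64_inc() and atomic64_dec().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct demo_device {
	atomic_int_fast64_t running_req; /* number of in-flight requests */
};

/* Account for a newly issued request; returns the new count. */
static int64_t demo_req_start(struct demo_device *dev)
{
	return atomic_fetch_add(&dev->running_req, 1) + 1;
}

/* Account for a completed request; returns the new count. */
static int64_t demo_req_done(struct demo_device *dev)
{
	return atomic_fetch_sub(&dev->running_req, 1) - 1;
}
```

Because each update is a single atomic operation, concurrent issue and completion paths cannot lose an increment, so the "running_req >= 0" guard before decrementing is no longer needed.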


Re: [PATCH 04/11] hisi_sas: fix port form bug in hisi_sas_port_notify_formed()

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> When we form a wideport, we should use hardware PHY port_id instead
> of sas_phy->id.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>


Re: [PATCH 02/11] hisi_sas: alloc queue id of slot according to device id

2016-11-15 Thread Zhangfei Gao
On Mon, Nov 7, 2016 at 8:48 PM, John Garry <john.ga...@huawei.com> wrote:
> From: Xiang Chen <chenxian...@hisilicon.com>
>
> Currently slots are allocated from queues in a round-robin fashion.
> This causes a problem for internal commands in device mode. For this
> mode, we should ensure that the internal abort command is the last
> command seen in the host for that device. We can only ensure this when
> we place the internal abort command after the preceding commands for
> device that in the same queue, as there is no order in which the host
> will select a queue to execute the next command.

Is there a performance penalty, since only one queue is supported per device?

>
> This queue restriction makes supporting scsi mq more tricky in
> the future, but should not be a blocker.
>
> Note: Even though v1 hw does not support internal abort, the
>   allocation method is chosen to be the same for consistency.
>
> Signed-off-by: Xiang Chen <chenxian...@hisilicon.com>
> Signed-off-by: John Garry <john.ga...@huawei.com>

Reviewed-by: Zhangfei Gao <zhangfei@linaro.org>
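
To make the ordering argument concrete, here is a small sketch contrasting the two allocation schemes. The modulo mapping is an assumption for illustration only (the patch body is not quoted in this thread), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-robin: consecutive commands land on different queues. */
static uint32_t demo_queue_round_robin(uint32_t *next, uint32_t queue_count)
{
	uint32_t q = *next;

	*next = (q + 1) % queue_count;
	return q;
}

/* Per-device: every command for a given device lands on the same queue. */
static uint32_t demo_queue_for_device(uint32_t device_id, uint32_t queue_count)
{
	return device_id % queue_count;
}
```

With the per-device mapping, an internal abort queued after a device's earlier commands is guaranteed to sit behind them in the same queue, which is the ordering the commit message asks for. The cost is the one raised in the question above: a single busy device can no longer spread its commands across queues.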


Re: [PATCH 0/5] hisi_sas: v2 hw SATA fixes

2016-04-13 Thread Zhangfei Gao
On Fri, Apr 8, 2016 at 5:23 PM, John Garry <john.ga...@huawei.com> wrote:
> This patchset introduces SATA support fixes for
> the HiSilicon v2 hw SAS controller.
>
> Fixes include:
> - attach issue for SATA disk attached through expander
> - intermittent issue for directly attaching multiple
> SATA disks
> - add support for directly attaching SATA disk to phy
> index 4+
> - ITCT config issue
>
> John Garry (5):
>   hisi_sas: use device linkrate in MCR for v2 hw
>   hisi_sas: fix v2 hw multiple SATA disk issue
>   hisi_sas: add v2 hw support for >4 SATA phys
>   hisi_sas: for v2 hw only set ITCT qw2 for SAS device
>   hisi_sas: update driver version to 1.4

For the series,
Reviewed-by:  Zhangfei Gao <zhangfei@linaro.org>

Thanks


[PATCH v4 4/4] phy: add phy-hi6220-usb

2015-02-11 Thread Zhangfei Gao
Add usb phy controller for hi6220 platform

Signed-off-by: Zhangfei Gao <zhangfei@linaro.org>
---
 drivers/phy/Kconfig  |   9 ++
 drivers/phy/Makefile |   1 +
 drivers/phy/phy-hi6220-usb.c | 306 +++
 3 files changed, 316 insertions(+)
 create mode 100644 drivers/phy/phy-hi6220-usb.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index ccad880..40a1ef1 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -162,6 +162,15 @@ config PHY_HIX5HD2_SATA
help
  Support for SATA PHY on Hisilicon hix5hd2 Soc.
 
+config PHY_HI6220_USB
+   tristate "hi6220 USB PHY support"
+   select USB_PHY
+   select MFD_SYSCON
+   help
+ Enable this to support the HISILICON HI6220 USB PHY.
+
+ To compile this driver as a module, choose M here.
+
 config PHY_SUN4I_USB
tristate "Allwinner sunxi SoC USB PHY driver"
depends on ARCH_SUNXI && HAS_IOMEM && OF
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index aa74f96..ec43c2d 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_TI_PIPE3)   += phy-ti-pipe3.o
 obj-$(CONFIG_TWL4030_USB)  += phy-twl4030-usb.o
 obj-$(CONFIG_PHY_EXYNOS5250_SATA)  += phy-exynos5250-sata.o
 obj-$(CONFIG_PHY_HIX5HD2_SATA) += phy-hix5hd2-sata.o
+obj-$(CONFIG_PHY_HI6220_USB)   += phy-hi6220-usb.o
 obj-$(CONFIG_PHY_SUN4I_USB)+= phy-sun4i-usb.o
 obj-$(CONFIG_PHY_SAMSUNG_USB2) += phy-exynos-usb2.o
 phy-exynos-usb2-y  += phy-samsung-usb2.o
diff --git a/drivers/phy/phy-hi6220-usb.c b/drivers/phy/phy-hi6220-usb.c
new file mode 100644
index 000..0d9f5ac
--- /dev/null
+++ b/drivers/phy/phy-hi6220-usb.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright (c) 2015 Linaro Ltd.
+ * Copyright (c) 2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
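+#include <linux/clk.h>
+#include <linux/mfd/syscon.h>
+#include <linux/of_gpio.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/regulator/consumer.h>
+#include <linux/usb/gadget.h>
+#include <linux/usb/otg.h>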
+
+#define SC_PERIPH_CTRL4   0x00c
+
+#define CTRL4_PICO_SIDDQ   BIT(6)
+#define CTRL4_PICO_OGDISABLE   BIT(8)
+#define CTRL4_PICO_VBUSVLDEXT  BIT(10)
+#define CTRL4_PICO_VBUSVLDEXTSEL   BIT(11)
+#define CTRL4_OTG_PHY_SEL  BIT(21)
+
+#define SC_PERIPH_CTRL5   0x010
+
+#define CTRL5_USBOTG_RES_SEL   BIT(3)
+#define CTRL5_PICOPHY_ACAENB   BIT(4)
+#define CTRL5_PICOPHY_BC_MODE  BIT(5)
+#define CTRL5_PICOPHY_CHRGSEL  BIT(6)
+#define CTRL5_PICOPHY_VDATSRCEND   BIT(7)
+#define CTRL5_PICOPHY_VDATDETENB   BIT(8)
+#define CTRL5_PICOPHY_DCDENB   BIT(9)
+#define CTRL5_PICOPHY_IDDIG   BIT(10)
+
+#define SC_PERIPH_CTRL8   0x018
+#define SC_PERIPH_RSTEN0   0x300
+#define SC_PERIPH_RSTDIS0  0x304
+
+#define RST0_USBOTG_BUS   BIT(4)
+#define RST0_POR_PICOPHY   BIT(5)
+#define RST0_USBOTG   BIT(6)
+#define RST0_USBOTG_32K   BIT(7)
+
+#define EYE_PATTERN_PARA   0x7053348c
+
+struct hi6220_priv {
+   struct usb_phy phy;
+   struct delayed_work work;
+   struct regmap *reg;
+   struct clk *clk;
+   struct regulator *vcc;
+   struct device *dev;
+   int gpio_vbus;
+   int gpio_id;
+   enum usb_otg_state state;
+};
+
+static void hi6220_start_peripheral(struct hi6220_priv *priv, bool on)
+{
+   struct usb_otg *otg = priv->phy.otg;
+
+   if (!otg->gadget)
+   return;
+
+   if (on)
+   usb_gadget_connect(otg->gadget);
+   else
+   usb_gadget_disconnect(otg->gadget);
+}
+
+static void hi6220_detect_work(struct work_struct *work)
+{
+   struct hi6220_priv *priv =
+   container_of(work, struct hi6220_priv, work.work);
+   int gpio_id, gpio_vbus;
+   enum usb_otg_state state;
+
+   if (!gpio_is_valid(priv->gpio_id) || !gpio_is_valid(priv->gpio_vbus))
+   return;
+
+   gpio_id = gpio_get_value_cansleep(priv->gpio_id);
+   gpio_vbus = gpio_get_value_cansleep(priv->gpio_vbus);
+
+   if (gpio_vbus == 0) {
+   if (gpio_id == 1)
+   state = OTG_STATE_B_PERIPHERAL;
+   else
+   state = OTG_STATE_A_HOST;
+   } else {
+   state = OTG_STATE_A_HOST;
+   }
+
+   if (priv->state != state) {
+   hi6220_start_peripheral(priv, state == OTG_STATE_B_PERIPHERAL);
+   priv->state = state;
+   }
+}
+
+static irqreturn_t hiusb_gpio_intr(int irq, void *data)
+{
+   struct hi6220_priv 
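
[The archived copy of this patch is cut off at this point.]

As a reading aid, the role decision made in hi6220_detect_work() above reduces to the truth table sketched below. This is a userspace restatement with local stand-in names for the two usb_otg_state values; it is not part of the patch.

```c
#include <assert.h>

/* Local stand-ins for the two kernel usb_otg_state values used here. */
enum demo_otg_state {
	DEMO_STATE_A_HOST,
	DEMO_STATE_B_PERIPHERAL,
};

/*
 * Peripheral mode only when the VBUS GPIO reads 0 and the ID GPIO
 * reads 1; every other combination selects host mode.
 */
static enum demo_otg_state demo_detect_state(int gpio_vbus, int gpio_id)
{
	if (gpio_vbus == 0 && gpio_id == 1)
		return DEMO_STATE_B_PERIPHERAL;
	return DEMO_STATE_A_HOST;
}
```

Written this way it is clear that the outer else branch in the patch is redundant with the inner one: three of the four input combinations map to host mode.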

[PATCH v4 3/4] usb: dwc2: platform: add hi6220 support

2015-02-11 Thread Zhangfei Gao
Signed-off-by: Zhangfei Gao <zhangfei@linaro.org>
---
 drivers/usb/dwc2/platform.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index ae095f0..f7c67db 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -50,6 +50,35 @@
 
 static const char dwc2_driver_name[] = "dwc2";
 
+static const struct dwc2_core_params params_hi6220 = {
+   .otg_cap= 2,/* No HNP/SRP capable */
+   .otg_ver= 0,/* 1.3 */
+   .dma_enable = 1,
+   .dma_desc_enable= 0,
+   .speed  = 0,/* High Speed */
+   .enable_dynamic_fifo= 1,
+   .en_multiple_tx_fifo= 1,
+   .host_rx_fifo_size  = 512,
+   .host_nperio_tx_fifo_size   = 512,
+   .host_perio_tx_fifo_size= 512,
+   .max_transfer_size  = 65535,
+   .max_packet_count   = 511,
+   .host_channels  = 16,
+   .phy_type   = 1,/* UTMI */
+   .phy_utmi_width = 8,
+   .phy_ulpi_ddr   = 0,/* Single */
+   .phy_ulpi_ext_vbus  = 0,
+   .i2c_enable = 0,
+   .ulpi_fs_ls = 0,
+   .host_support_fs_ls_low_power   = 0,
+   .host_ls_low_power_phy_clk  = 0,/* 48 MHz */
+   .ts_dline   = 0,
+   .reload_ctl = 0,
+   .ahbcfg = GAHBCFG_HBSTLEN_INCR16 <<
+ GAHBCFG_HBSTLEN_SHIFT,
+   .uframe_sched   = 0,
+};
+
 static const struct dwc2_core_params params_bcm2835 = {
.otg_cap= 0,/* HNP/SRP capable */
.otg_ver= 0,/* 1.3 */
@@ -129,6 +158,7 @@ static int dwc2_driver_remove(struct platform_device *dev)
 
 static const struct of_device_id dwc2_of_match_table[] = {
{ .compatible = "brcm,bcm2835-usb", .data = &params_bcm2835 },
+   { .compatible = "hisilicon,hi6220-usb", .data = &params_hi6220 },
{ .compatible = "rockchip,rk3066-usb", .data = &params_rk3066 },
{ .compatible = "snps,dwc2", .data = NULL },
{ .compatible = "samsung,s3c6400-hsotg", .data = NULL},
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
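
The match table in the patch above ties a devicetree "compatible" string to a per-SoC parameter block. The lookup idea can be sketched in plain C as follows. The struct layout and the demo_* names are simplified, hypothetical stand-ins for the kernel's of_device_id machinery, and only two of the parameter fields are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down parameter block; the real one has many more fields. */
struct demo_params {
	int otg_cap;
	int host_channels;
};

struct demo_match {
	const char *compatible;
	const struct demo_params *data;
};

static const struct demo_params demo_params_hi6220 = {
	.otg_cap = 2,        /* No HNP/SRP capable */
	.host_channels = 16,
};

static const struct demo_match demo_match_table[] = {
	{ "hisilicon,hi6220-usb", &demo_params_hi6220 },
	{ "snps,dwc2", NULL }, /* generic entry: no fixed parameters */
	{ NULL, NULL },        /* sentinel terminates the table */
};

/* Return the first entry whose compatible string matches, or NULL. */
static const struct demo_match *demo_find_match(const char *compatible)
{
	const struct demo_match *m;

	for (m = demo_match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m;
	return NULL;
}
```

An entry with NULL data, like the generic "snps,dwc2" one, leaves the driver to fall back on its own defaults rather than a fixed per-SoC block.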


[PATCH v4 1/4] Documentation: dt-bindings: add dt binding info for hi6220 dwc2

2015-02-11 Thread Zhangfei Gao
Add necessary dwc2 binding documentation for Hisilicon soc: hi6220

Signed-off-by: Zhangfei Gao <zhangfei@linaro.org>
---
 Documentation/devicetree/bindings/usb/dwc2.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc2.txt b/Documentation/devicetree/bindings/usb/dwc2.txt
index fd132cb..2213682 100644
--- a/Documentation/devicetree/bindings/usb/dwc2.txt
+++ b/Documentation/devicetree/bindings/usb/dwc2.txt
@@ -4,6 +4,7 @@ Platform DesignWare HS OTG USB 2.0 controller
 Required properties:
 - compatible : One of:
   - brcm,bcm2835-usb: The DWC2 USB controller instance in the BCM2835 SoC.
+  - hisilicon,hi6220-usb: The DWC2 USB controller instance in the hi6220 SoC.
   - rockchip,rk3066-usb: The DWC2 USB controller instance in the rk3066 Soc;
   - "rockchip,rk3188-usb", "rockchip,rk3066-usb", "snps,dwc2": for rk3188 Soc;
   - "rockchip,rk3288-usb", "rockchip,rk3066-usb", "snps,dwc2": for rk3288 Soc;
-- 
1.9.1



[PATCH v4 0/4] add usb support for hi6220

2015-02-11 Thread Zhangfei Gao
v4:
Move drivers/usb/phy/phy-hi6220-usb.c to drivers/phy/phy-hi6220-usb.c, as
requested by Balbi.
Modify dt bindings per comments from Mark and Sergei.

v3:
fix typo and add -EPROBE_DEFER handling for the regulator, as pointed out by Peter

v2:
address comments from Sergei and Peter
add hi6220_phy_setup(false) code

v1:
hi6220 usb controller is inherited from dwc2
add phy accordingly
support otg gadget/host

Zhangfei Gao (4):
  Documentation: dt-bindings: add dt binding info for hi6220 dwc2
  Documentation: dt-bindings: add dt binding info for hi6220
  usb: dwc2: platform: add hi6220 support
  phy: add phy-hi6220-usb

 Documentation/devicetree/bindings/usb/dwc2.txt |   1 +
 .../devicetree/bindings/usb/hi6220-usb.txt |  49 
 drivers/phy/Kconfig|   9 +
 drivers/phy/Makefile   |   1 +
 drivers/phy/phy-hi6220-usb.c   | 306 +
 drivers/usb/dwc2/platform.c|  30 ++
 6 files changed, 396 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/hi6220-usb.txt
 create mode 100644 drivers/phy/phy-hi6220-usb.c

-- 
1.9.1



[PATCH v4 2/4] Documentation: dt-bindings: add dt binding info for hi6220

2015-02-11 Thread Zhangfei Gao
Signed-off-by: Zhangfei Gao <zhangfei@linaro.org>
---
 .../devicetree/bindings/usb/hi6220-usb.txt | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/hi6220-usb.txt

diff --git a/Documentation/devicetree/bindings/usb/hi6220-usb.txt b/Documentation/devicetree/bindings/usb/hi6220-usb.txt
new file mode 100644
index 000..b3a7b5a
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/hi6220-usb.txt
@@ -0,0 +1,49 @@
+Hisilicon hi6220 SoC USB controller
+-
+
+usb controller is inherited from dwc2, refer dwc2.txt
+-
+
+Required properties:
+- compatible: "hisilicon,hi6220-usb"
+Refer to dwc2.txt for dwc2 usb properties
+
+
+PHY:
+-
+
+Required properties:
+- compatible: "hisilicon,hi6220-usb-phy"
+- vcc-supply: phandle to the regulator that provides power to the PHY.
+- clocks: phandle and clock specifier of the PHY clock.
+- hisilicon,peripheral-syscon: phandle of syscon used to control peripheral.
+- hisilicon,vbus-gpios: gpio of detecting vbus.
+- hisilicon,id-gpios: gpio of detecting id.
+
+Example:
+
+   sys_ctrl: syscon@f703 {
+   compatible = "hisilicon,sysctrl", "syscon";
+   reg = <0x0 0xf703 0x0 0x1000>;
+   };
+
+   usb_phy: usb-phy {
+   compatible = "hisilicon,hi6220-usb-phy";
+   vcc-supply = <_5v_hub>;
+   hisilicon,vbus-gpios = < 6 0>;
+   hisilicon,id-gpios = < 5 0>;
+   hisilicon,peripheral-syscon = <&sys_ctrl>;
+   clocks = <_sys HI6220_USBOTG_HCLK>;
+   };
+
+   usb: usb@f72c {
+   compatible = "hisilicon,hi6220-usb";
+   reg = <0x0 0xf72c 0x0 0x4>;
+   phys = <&usb_phy>;
+   dr_mode = "otg";
+   g-use-dma;
+   g-rx-fifo-size = <512>;
+   g-np-tx-fifo-size = <128>;
+   g-tx-fifo-size = <128>;
+   interrupts = <0 77 0x4>;
+   };
-- 
1.9.1


