Re: [RFC 1/5] RISC-V: Implement arch_sync_dma* functions

2021-09-11 Thread Guo Ren
On Tue, Jul 27, 2021 at 5:53 AM Atish Patra  wrote:
>
> On Sun, Jul 25, 2021 at 11:57 PM Christoph Hellwig  wrote:
> >
> > > +#ifdef CONFIG_RISCV_DMA_NONCOHERENT
> > > +struct riscv_dma_cache_sync {
> > > + void (*cache_invalidate)(phys_addr_t paddr, size_t size);
> > > + void (*cache_clean)(phys_addr_t paddr, size_t size);
> > > + void (*cache_flush)(phys_addr_t paddr, size_t size);
> > > +};
> > > +
> > > +void riscv_dma_cache_sync_set(struct riscv_dma_cache_sync *ops);
> > > +#endif
> >
> > As told a bunch of times before: doing indirect calls here is a
> > performance nightmare.  Use something that actually does perform
> > horribly like alternatives.  Or even delay implementing that until
> > we need it and do a plain direct call for now.
> >
>
> I was initially planning to replace this with alternatives in the
> future versions. But there is no point in doing it
> until we have CMO spec finalized. We also don't have any other
> platform using these apart from sifive l2
> cache controllers for now.
>
> I will change these to direct for now.
I think alternatives' would be helpful, I've rebased on your patchset, thx.
https://lore.kernel.org/linux-riscv/20210911092139.79607-6-guo...@kernel.org/

>
> > static void __dma_sync(phys_addr_t paddr, size_t size, enum 
> > dma_data_direction dir)
> > > +{
> > > + if ((dir == DMA_FROM_DEVICE) && (dma_cache_sync->cache_invalidate))
> > > + dma_cache_sync->cache_invalidate(paddr, size);
> > > + else if ((dir == DMA_TO_DEVICE) && (dma_cache_sync->cache_clean))
> > > + dma_cache_sync->cache_clean(paddr, size);
> > > + else if ((dir == DMA_BIDIRECTIONAL) && dma_cache_sync->cache_flush)
> > > + dma_cache_sync->cache_flush(paddr, size);
> > > +}
> >
> > The seletion of flush types is completely broken.  Take a look at the
> > comment in arch/arc/mm/dma.c above arch_sync_dma_for_device for a good
> > explanation.
> >
>
> Thanks. I will fix it.
>
> > > +void arch_dma_prep_coherent(struct page *page, size_t size)
> > > +{
> > > + void *flush_addr = page_address(page);
> > > +
> > > + memset(flush_addr, 0, size);
> >
> > arch_dma_prep_coherent is not supposed to modify the content of
> > the data.
>
> Sorry. This was a leftover from some experimental code. It shouldn't
> have been included.
>
> > ___
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
>
>
> --
> Regards,
> Atish
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 0/5] Support non-coherent DMA on RISC-V using a global pool

2021-08-17 Thread Guo Ren
On Tue, Aug 17, 2021 at 11:28 AM Atish Patra  wrote:
>
> On Mon, Aug 16, 2021 at 6:37 PM Guo Ren  wrote:
> >
> > 1
> >
> > On Thu, Jul 29, 2021 at 2:19 PM Atish Patra  wrote:
> > >
> > > On Wed, Jul 28, 2021 at 9:30 PM Palmer Dabbelt  wrote:
> > > >
> > > > On Fri, 23 Jul 2021 14:40:26 PDT (-0700), Atish Patra wrote:
> > > > > RISC-V privilege specification doesn't define an IOMMU or any method 
> > > > > modify
> > > > > PMA attributes or PTE entries to allow non-coherent mappings yet. In
> > > > > the beginning, this approach was adopted assuming that most of the 
> > > > > RISC-V
> > > > > platforms would support full cache-coherent IO. Here is excerpt from 
> > > > > the
> > > > > priv spec section 3.6.5
> > > > >
> > > > > "In RISC-V platforms, the use of hardware-incoherent regions is 
> > > > > discouraged
> > > > > due to software complexity, performance, and energy impacts."
> > > > >
> > > > > While some of the RISC-V platforms adhere to the above suggestion, 
> > > > > not all
> > > > > platforms can afford to build to fully cache-coherent I/O devices. To
> > > > > address DMA for non-coherent I/O devices, we need to mark a region of 
> > > > > memory
> > > > > as non-cacheable. Some of the platforms rely on a fixed region of 
> > > > > uncached
> > > > > memory that is remapped to the system memory while some other modify
> > > > > the PTE entries to achieve that.
> > > > >
> > > > > The patch3 solves the issue for the fist use case by using a global
> > > > > pool of memory that is reserved for DMA. The device access the 
> > > > > reserved area
> > > > > of the region while corresponding CPU address will be from uncached 
> > > > > region
> > > > > As the uncached region is remapped to the beginning of the system 
> > > > > ram, both
> > > > > CPU and device get the same view.
> > > > >
> > > > > To facilitate streaming DMA APIs, patch 1 introduces a set of generic
> > > > > cache operations. Any platform can use the generic ops to provide 
> > > > > platform
> > > > > specific cache management operations. Once the standard RISC-V CMO 
> > > > > extension
> > > > > is available, it will also use these generic ops.
> > > > >
> > > > > To address the second use case, Page Based Memory Attribute (PBMT) 
> > > > > extension
> > > > > is proposed. Once the extension is in good shape, we can leverage that
> > > > > using CONFIG_DIRECT_REMAP. Currently, it is selected via a compile 
> > > > > time config
> > > > > option. We will probably need another arch specific hooks to know if 
> > > > > the
> > > > > platform supports direct remap at runtime. For RISC-V, it will check 
> > > > > the
> > > > > presence of PBMT extension while other arch function will simply 
> > > > > return true
> > > >
> > > > IIUC this is another extension that's not yet frozen or implemented in
> > > > hardware?  Is this one compatible with what's in the c906, or is it
> > > > doing things its own way?
> > >
> > > Hi Palmer,
> > > This series doesn't implement the PBMT extension which is still in 
> > > discussion.
> > > It simply reuse the existing non-coherent dma mapping support in
> > > upstream kernel and enable
> > > it for RISC-V. The current version uses a non-coherent global pool. I
> > > will update it to use arch_set_uncached
> > > as per Christoph's suggestion. It solves the non-coherent DMA problem
> > > for beagleV and not c906.
> > >
> > > I briefly mentioned the PBMT extension just to provide an idea how the
> > > RISC-V Linux kernel
> > > can support both unached window and PBMT extension. PBMT extension is
> > > planned to be frozen
> > > by the end of this year and none of the hardware has implemented it.
> > >
> > > The implementation in c906 is a non-standard one and will not be
> > > supported by the default PBMT
> > > extension implementation.
> > The default PBMT & c908 PBMT are similar, only BIT definitions are
> 

Re: [RFC 1/5] RISC-V: Implement arch_sync_dma* functions

2021-08-17 Thread Guo Ren
On Tue, Aug 17, 2021 at 11:24 AM Atish Patra  wrote:
>
> On Mon, Aug 16, 2021 at 6:48 PM Guo Ren  wrote:
> >
> > On Sat, Jul 24, 2021 at 5:40 AM Atish Patra  wrote:
> > >
> > > To facilitate streaming DMA APIs, this patch introduces a set of generic
> > > cache operations related dma sync. Any platform can use the generic ops
> > > to provide platform specific cache management operations. Once the
> > > standard RISC-V CMO extension is available, it can be built on top of it.
> > >
> > > Signed-off-by: Atish Patra 
> > > ---
> > >  arch/riscv/include/asm/dma-noncoherent.h | 19 +++
> > >  arch/riscv/mm/Makefile   |  1 +
> > >  arch/riscv/mm/dma-noncoherent.c  | 66 
> > >  3 files changed, 86 insertions(+)
> > >  create mode 100644 arch/riscv/include/asm/dma-noncoherent.h
> > >  create mode 100644 arch/riscv/mm/dma-noncoherent.c
> > >
> > > diff --git a/arch/riscv/include/asm/dma-noncoherent.h 
> > > b/arch/riscv/include/asm/dma-noncoherent.h
> > > new file mode 100644
> > > index ..5bdb03c9c427
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/dma-noncoherent.h
> > > @@ -0,0 +1,19 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> > > + */
> > > +
> > > +#ifndef __ASM_RISCV_DMA_NON_COHERENT_H
> > > +#define __ASM_RISCV_DMA_NON_COHERENT_H
> > > +
> > > +#ifdef CONFIG_RISCV_DMA_NONCOHERENT
> > > +struct riscv_dma_cache_sync {
> > > +   void (*cache_invalidate)(phys_addr_t paddr, size_t size);
> > > +   void (*cache_clean)(phys_addr_t paddr, size_t size);
> > > +   void (*cache_flush)(phys_addr_t paddr, size_t size);
> > > +};
> > I like the style like this than my previous patch which using
> > sbi_call. The c906 has custom instructions that could be called in
> > S-mode directly.
> >
>
> How are you going to include the custom instructions in the upstream kernel ?
In errata, call set_ops? I'm headache with that issue.

>
> > Hope the patch could be soon merged, after correct the
> > DMA_FROM/TO_DEVICE/BIDIRECTIONAL and alternatives ops_set.
> >
> > > +
> > > +void riscv_dma_cache_sync_set(struct riscv_dma_cache_sync *ops);
> > > +#endif
> > > +
> > > +#endif
> > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> > > index 7ebaef10ea1b..959bef49098b 100644
> > > --- a/arch/riscv/mm/Makefile
> > > +++ b/arch/riscv/mm/Makefile
> > > @@ -27,3 +27,4 @@ KASAN_SANITIZE_init.o := n
> > >  endif
> > >
> > >  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> > > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> > > diff --git a/arch/riscv/mm/dma-noncoherent.c 
> > > b/arch/riscv/mm/dma-noncoherent.c
> > > new file mode 100644
> > > index ..2f6e9627c4aa
> > > --- /dev/null
> > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > @@ -0,0 +1,66 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * RISC-V specific functions to support DMA for non-coherent devices
> > > + *
> > > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +static struct riscv_dma_cache_sync *dma_cache_sync;
> > > +unsigned long riscv_dma_uc_offset;
> > > +
> > > +static void __dma_sync(phys_addr_t paddr, size_t size, enum 
> > > dma_data_direction dir)
> > > +{
> > > +   if ((dir == DMA_FROM_DEVICE) && 
> > > (dma_cache_sync->cache_invalidate))
> > > +   dma_cache_sync->cache_invalidate(paddr, size);
> > > +   else if ((dir == DMA_TO_DEVICE) && (dma_cache_sync->cache_clean))
> > > +   dma_cache_sync->cache_clean(paddr, size);
> > > +   else if ((dir == DMA_BIDIRECTIONAL) && 
> > > dma_cache_sync->cache_flush)
> > > +   dma_cache_sync->cache_flush(paddr, size);
> > > +}
> > > +
> > > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum 
> > >

Re: [RFC 1/5] RISC-V: Implement arch_sync_dma* functions

2021-08-16 Thread Guo Ren
On Sat, Jul 24, 2021 at 5:40 AM Atish Patra  wrote:
>
> To facilitate streaming DMA APIs, this patch introduces a set of generic
> cache operations related dma sync. Any platform can use the generic ops
> to provide platform specific cache management operations. Once the
> standard RISC-V CMO extension is available, it can be built on top of it.
>
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/include/asm/dma-noncoherent.h | 19 +++
>  arch/riscv/mm/Makefile   |  1 +
>  arch/riscv/mm/dma-noncoherent.c  | 66 
>  3 files changed, 86 insertions(+)
>  create mode 100644 arch/riscv/include/asm/dma-noncoherent.h
>  create mode 100644 arch/riscv/mm/dma-noncoherent.c
>
> diff --git a/arch/riscv/include/asm/dma-noncoherent.h 
> b/arch/riscv/include/asm/dma-noncoherent.h
> new file mode 100644
> index ..5bdb03c9c427
> --- /dev/null
> +++ b/arch/riscv/include/asm/dma-noncoherent.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> + */
> +
> +#ifndef __ASM_RISCV_DMA_NON_COHERENT_H
> +#define __ASM_RISCV_DMA_NON_COHERENT_H
> +
> +#ifdef CONFIG_RISCV_DMA_NONCOHERENT
> +struct riscv_dma_cache_sync {
> +   void (*cache_invalidate)(phys_addr_t paddr, size_t size);
> +   void (*cache_clean)(phys_addr_t paddr, size_t size);
> +   void (*cache_flush)(phys_addr_t paddr, size_t size);
> +};
I like the style like this than my previous patch which using
sbi_call. The c906 has custom instructions that could be called in
S-mode directly.

Hope the patch could be soon merged, after correct the
DMA_FROM/TO_DEVICE/BIDIRECTIONAL and alternatives ops_set.

> +
> +void riscv_dma_cache_sync_set(struct riscv_dma_cache_sync *ops);
> +#endif
> +
> +#endif
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index 7ebaef10ea1b..959bef49098b 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -27,3 +27,4 @@ KASAN_SANITIZE_init.o := n
>  endif
>
>  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> new file mode 100644
> index ..2f6e9627c4aa
> --- /dev/null
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V specific functions to support DMA for non-coherent devices
> + *
> + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct riscv_dma_cache_sync *dma_cache_sync;
> +unsigned long riscv_dma_uc_offset;
> +
> +static void __dma_sync(phys_addr_t paddr, size_t size, enum 
> dma_data_direction dir)
> +{
> +   if ((dir == DMA_FROM_DEVICE) && (dma_cache_sync->cache_invalidate))
> +   dma_cache_sync->cache_invalidate(paddr, size);
> +   else if ((dir == DMA_TO_DEVICE) && (dma_cache_sync->cache_clean))
> +   dma_cache_sync->cache_clean(paddr, size);
> +   else if ((dir == DMA_BIDIRECTIONAL) && dma_cache_sync->cache_flush)
> +   dma_cache_sync->cache_flush(paddr, size);
> +}
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum 
> dma_data_direction dir)
> +{
> +   if (!dma_cache_sync)
> +   return;
> +
> +   __dma_sync(paddr, size, dir);
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum 
> dma_data_direction dir)
> +{
> +   if (!dma_cache_sync)
> +   return;
> +
> +   __dma_sync(paddr, size, dir);
> +}
> +
> +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +   const struct iommu_ops *iommu, bool coherent)
> +{
> +   /* If a specific device is dma-coherent, set it here */
> +   dev->dma_coherent = coherent;
> +}
> +
> +void arch_dma_prep_coherent(struct page *page, size_t size)
> +{
> +   void *flush_addr = page_address(page);
> +
> +   memset(flush_addr, 0, size);
> +   if (dma_cache_sync && dma_cache_sync->cache_flush)
> +   dma_cache_sync->cache_flush(__pa(flush_addr), size);
> +}
> +
> +void riscv_dma_cache_sync_set(struct riscv_dma_cache_sync *ops)
> +{
> +   dma_cache_sync = ops;
> +}
> --
> 2.31.1
>
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 0/5] Support non-coherent DMA on RISC-V using a global pool

2021-08-16 Thread Guo Ren
solves the non-coherent DMA support on beagleV but 
> > > requires
> > > additional beagleV specific patches[3] which will be upstreamed soon.
> > >
> > >
> > > [1] 
> > > https://lists.linuxfoundation.org/pipermail/iommu/2021-July/057266.html
> > > [2] https://github.com/atishp04/linux/tree/riscv_nc_global_pool
> > > [3] https://github.com/atishp04/linux/tree/wip_beaglev_dma_nc_global
> > >
> > > Atish Patra (5):
> > > RISC-V: Implement arch_sync_dma* functions
> > > of: Move of_dma_get_range to of_address.h
> > > dma-mapping: Enable global non-coherent pool support for RISC-V
> > > dma-direct: Allocate dma pages directly if global pool allocation
> > > fails
> > > RISC-V: Support a new config option for non-coherent DMA
> > >
> > > arch/riscv/Kconfig   |  8 +++
> > > arch/riscv/include/asm/dma-noncoherent.h | 19 +++
> > > arch/riscv/mm/Makefile   |  1 +
> > > arch/riscv/mm/dma-noncoherent.c  | 66 
> > > drivers/of/of_private.h  | 10 
> > > include/linux/of_address.h   | 12 +
> > > kernel/dma/coherent.c| 49 +++---
> > > kernel/dma/direct.c  |  7 ++-
> > > 8 files changed, 152 insertions(+), 20 deletions(-)
> > > create mode 100644 arch/riscv/include/asm/dma-noncoherent.h
> > > create mode 100644 arch/riscv/mm/dma-noncoherent.c
> >
> > Can you guys please make up your minds about how this is going to be
> > supported at the ISA level?  I get a different answer every day here:
> > sometimes it's that these systems are not compliant, sometimes that
> > Linux is the compliance suite, sometimes that we're doing an ISA
> > extension, and sometimes that we're doing the SBI stuff.
> >
>
> I am not sure whom you have talked to. I would be happy to set up a
> meeting so that everybody is on
> the same page if you are getting different answers every time.
>
> > I don't really care all that much about what the decision is, but it's
> > impossible to move forward without some semblance of a plan.
> >
> > ___
> > linux-riscv mailing list
> > linux-ri...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
>
> --
> Regards,
> Atish
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [tech-privileged] [RFC PATCH V1] riscv-privileged: Add broadcast mode to sfence.vma

2019-09-19 Thread Guo Ren
Hi,

On Fri, Sep 20, 2019 at 12:10 AM Andrew Waterman  wrote:
>
> This needs to be discussed and debated at length; proposing edits to the spec 
> at this stage is putting the cart before the horse!
Agree :)

>
> We shouldn’t change the definition of the existing SFENCE.VMA instruction to 
> accomplish this. It’s also not abundantly clear to me that this should be an 
> instruction:
If you implement sfence.vma as current define, it also could work with
new mechanism, they are compatible.

> TLB shootdown looks more like MMIO.
Per-CPU MMIO ? I the proposal, every hart only takes care of its own request.




>
> On Thu, Sep 19, 2019 at 5:36 AM Guo Ren  wrote:
>>
>> From: Guo Ren 
>>
>> The patch is for https://github.com/riscv/riscv-isa-manual
>>
>> The proposal has been talked in LPC-2019 RISC-V MC ref [1]. Here is the
>> formal patch.
>>
>> Introduction
>> 
>>
>> Using the Hardware TLB broadcast invalidation instruction to maintain the
>> system TLB is a good choice and it'll simplify the system software design.
>> The proposal hopes to add a broadcast mode to the sfence.vma in the
>> riscv-privilege specification. To support the sfence.vma broadcast mode,
>> there are two modification introduced below:
>>
>>  1) Add PGD.PPN (root page table's PPN) as the unique identifier of the
>> address space in addition to asid/vmid. Compared to the dynamically
>> changed asid/vmid, PGD.PPN is fixed throughout the address space life
>> cycle. This feature enables uniform address space identification
>> between different TLB systems (actually, it's difficult to unify the
>> asid/vmid between the CPU system and the IOMMU system, because their
>> mechanisms are different)
>>
>>  2) Modify the definition of the sfence.vma instruction from synchronous
>> mode to asynchronous mode, which means that the completion of the TLB
>> operation is not guaranteed when the sfence.vma instruction retires.
>> It needs to be completed by checking the flag bit on the hart. The
>> sfence.vma request finish can notify the software by generating an
>> interrupt. This function alleviates the large delay of TLB invalidation
>> in the PCI ATS system.
>>
>> Add S1/S2.PGD.PPN for ASID/VMID
>> ===
>>
>> PGD is global directory (defined in linux) and PPN is page physical number
>> (defined in riscv-spec). PGD.PNN corresponds to the root page table pointer
>> of the address space, i.e. mm->pgd (linux concept).
>>
>> In CPU/IOMMU TLB, we use asid/vmid to distinguish the address space of
>> process or virtual machine. Due to the limitation of id encoding, it can
>> only represent a part(window) of the address space. S1/S2.PGD.PPN are the
>> root page table's PPNs of the address spaces and S1/S2.PGD.PPN are the
>> unique identifier of the address spaces.
>>
>> For the CPU SMP system, you can use context switch to perform the necessary
>> software mechanism to ensure that the asid/vmid on all harts is consistent
>> (please refer to the arm64 asid mechanism). In this way, the TLB broadcast
>> invalidation instruction can determine the address space processed on all
>> harts by asid/vmid.
>>
>> Different from the CPU SMP system, there is no context switch for the
>> DMA-IOMMU system, so the unification with the CPU asid/vmid cannot be
>> guaranteed. So we need a unique identifier for the address space to
>> establish a communication bridge between the TLBs of different systems.
>>
>> That is PGD.PPN (for virtualization scenarios: S1/S2.PGD.PPN)
>>
>> current:
>>  sfence.vma  rs1 = vaddr, rs2 = asid
>>  hfence.vvma rs1 = vaddr, rs2 = asid
>>  hfence.gvma rs1 = gaddr, rs2 = vmid
>>
>> proposed:
>>  sfence.vma  rs1 = vaddr, rs2 = mode:ppn:asid
>>  hfence.vvma rs1 = vaddr, rs2 = mode:ppn:asid
>>  hfence.gvma rs1 = gaddr, rs2 = mode:ppn:vmid
>>
>>  mode  - broadcast | local
>>  ppn   - the PPN of the address space of the root page table
>>  vmid/asid - the window identifier of the address space
>>
>> At the Linux Plumber Conference 2019 RISCV-MC, ref:[1], we've showed two
>> IOMMU examples to explain how it work with hardware.
>>
>> 1) In a lightweight IOMMU system (up to 64 address spaces), the hardware
>>could directly convert PGD.PPN into DID (IOMMU ASID)
>>
>> 2) For the PCI ATS scenario, its IO ASID/VMID encoding space can support
>>a very large number of address spaces. We use two reverse mapping
>>tables to let the hardw

Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-19 Thread Guo Ren
On Thu, Sep 19, 2019 at 11:18 PM Jean-Philippe Brucker
 wrote:

>
> The SMMU does support PCI Virtual Function - an hypervisor can assign a
> VF to a guest, and let that guest partition the VF into smaller contexts
> by using PASID.  What it can't support is assigning partitions of a PCI
> function (VF or PF) to multiple Virtual Machines, since there is a
> single S2 PGD per function (in the Stream Table Entry), rather than one
> S2 PGD per PASID context.
>
In my concept, the two sentences "The SMMU does support PCI Virtual
Functio" v.s. "What it can't support is assigning partitions of a PCI
function (VF or PF) to multiple Virtual Machines" are conflict and I
don't want to play naming game :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-19 Thread Guo Ren
Hi,

On Tue, Sep 17, 2019 at 11:42 AM Anup Patel  wrote:

> >
> > With a reply stating that the patch "absolutely does not work" ;)
>
> This patch was tested on existing HW (which does not have ASID implementation)
> and tested on QEMU (which has very simplistic Implementation of ASID).
>
> When I asked Gary Guo about way to get access to their HW (in same patch
> email thread), I did not get any reply. After so many months passed, I now
> doubt the his comment "absolutely does not work".
> >
> > What exactly do you want people to do with that? It's an awful lot of 
> > effort to
> > review this sort of stuff and given that Guo Ren is talking about sharing 
> > page
> > tables between the CPU and an accelerator, maybe you're better off
> > stabilising Linux for the platforms that you can actually test rather than
> > getting so far ahead of yourselves that you end up with a bunch of wasted
> > work on patches that probably won't get merged any time soon.
>
> The intention of the ASID patch was to encourage RISC-V implementations
> having ASID in HW and also ensure that things don't break on existing HW.
>
> I don't see our efforts being wasted in trying to make Linux RISC-V feature
> complete and encouraging more feature rich RISC-V CPUs.
>
> Delays in merging patches are fine as long as people have something to try
> on their RISC-V CPU implementations.
>
I'm the supporter of that patch:
http://archive.lwn.net:8080/linux-kernel/20190329045111.14040-1-anup.pa...@wdc.com/T/#u

Because it implicit hw broadcast tlb invalidation optimization.

Honestly it's not suitable for remote tlb flush with software IPI, but
it's still much better than current RISC-V's.

I'll try it on our hardware: 910. wait a moment :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-19 Thread Guo Ren
Hi,

On Mon, Sep 16, 2019 at 8:57 PM Jean-Philippe Brucker
 wrote:
> On 13/09/2019 09:13, Guo Ren wrote:
> > Another idea is seperate remote TLB invalidate into two instructions:
> >
> >  - sfence.vma.b.asyc
> >  - sfence.vma.b.barrier // wait all async TLB invalidate operations
> > finished for all harts.
>
> It's not clear to me how this helps, but I probably don't have the whole
> picture. If you have a place where it is safe to wait for the barrier to
> complete, why not do the whole invalidate there?
>
> > (I remember who mentioned me separate them into two instructions after
> > session. Anup? Is the idea right ?)
Forget it, I still use irq signal in my formal proposal [1]. I also
couldn't image the whole picture :P


> > To solve the problem, we could define a async mode in sfence.vma.b to
> > slove the problem and finished with per_cpu_irq/exception.
>
> The solution I had to this problem is pinning the ASID [1] used by the
> IOMMU, to prevent the CPU from recycling the ASID on rollover. This way
> the CPU doesn't have to wait for IOMMU invalidations to complete, when
> scheduling a task that might not even have anything to do with the IOMMU.
>

> In the Arm SMMU, ASID and IOASID (PASID) are separate identifiers. IOASID
> indexes an entry in the context descriptor table, which contains the ASID.
> So with unpinned shared ASID you don't need to invalidate the ATC on
> rollover, since the IOASID doesn't change, but you do need to modify the
> context descriptor and invalidate cached versions of it.
The terminology confused me a lot. I perfer use PASID for IOMMU and
ASID is for CPU.
Arm's entry of the context descriptor table contains a "IOASID"

IOASID != ASID for CPU_TLB and IOMMU_TLB.

When you say "since the IOASID doesn't change",Is it PASID or my IOASID ? -_*!
PASID in PCI-sig was used to determine transfer address space.
For intel, the entry which is indexed by PASID also contain S1/S2.PGD
and DID(VMID).
For arm, the entry which is indexed by PASID only contain S1.PGD and
IOASID. Compare to Intel Vt-d Scalable mode, arm's design can't
support PCI Virtual Function.

>
> Once you have pinned ASIDs, you could also declare that IOASID = ASID. I
> don't remember finding an argument to strictly forbid it, even though ASID
> and IOASID have different sizes on Arm (respectively 8/16 and 20 bits).
ASID and IOASID are hard to keep the same between CPU system and IOMMU
system. So I introduce S1/S2.PGD.PPN as a bridge between CPUs and
IOMMUs.
See my proposal [1]

1: 
https://lore.kernel.org/linux-csky/1568896556-28769-1-git-send-email-guo...@kernel.org/T/#u
-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-14 Thread Guo Ren
Here is the presentation, any comments is welcome.

https://docs.google.com/presentation/d/1sc295JznVAfDIPieAqzjcyUkcHnNFQsK8FFqdoCY854/edit?usp=sharing

On Fri, Sep 13, 2019 at 3:13 PM Guo Ren  wrote:
>
> Another idea is seperate remote TLB invalidate into two instructions:
>
>  - sfence.vma.b.asyc
>  - sfence.vma.b.barrier // wait all async TLB invalidate operations finished 
> for all harts.
>
> (I remember who mentioned me separate them into two instructions after 
> session. Anup? Is the idea right ?)
>
> Actually, I never consider asyc TLB invalidate before, because current our 
> light iommu did not need it.
>
> Thx all people attend the session :) Let's continue the talk.
>
>
> Guo Ren  于 2019年9月12日周四 22:59写道:
>>
>> Thx Will for reply.
>>
>> On Thu, Sep 12, 2019 at 3:03 PM Will Deacon  wrote:
>> >
>> > On Sun, Sep 08, 2019 at 07:52:55AM +0800, Guo Ren wrote:
>> > > On Mon, Jun 24, 2019 at 6:40 PM Will Deacon  wrote:
>> > > > > I'll keep my system use the same ASID for SMP + IOMMU :P
>> > > >
>> > > > You will want a separate allocator for that:
>> > > >
>> > > > https://lkml.kernel.org/r/20190610184714.6786-2-jean-philippe.bruc...@arm.com
>> > >
>> > > Yes, it is hard to maintain ASID between IOMMU and CPUMMU or different
>> > > system, because it's difficult to synchronize the IO_ASID when the CPU
>> > > ASID is rollover.
>> > > But we could still use hardware broadcast TLB invalidation instruction
>> > > to uniformly manage the ASID and IO_ASID, or OTHER_ASID in our IOMMU.
>> >
>> > That's probably a bad idea, because you'll likely stall execution on the
>> > CPU until the IOTLB has completed invalidation. In the case of ATS, I think
>> > an endpoint ATC is permitted to take over a minute to respond. In reality, 
>> > I
>> > suspect the worst you'll ever see would be in the msec range, but that's
>> > still an unacceptable period of time to hold a CPU.
>> Just as I've said in the session that IOTLB invalidate delay is
>> another topic, My main proposal is to introduce stage1.pgd and
>> stage2.pgd as address space identifiers between different TLB systems
>> based on vmid, asid. My last part of sildes will show you how to
>> translate stage1/2.pgd to as/vmid in PCI ATS system and the method
>> could work with SMMU-v3 and intel Vt-d. (It's regret for me there is
>> no time to show you the whole slides.)
>>
>> In our light IOMMU implementation, there's no IOTLB invalidate delay
>> problem. Becasue IOMMU is very close to CPU MMU and interconnect's
>> delay is the same with SMP CPUs MMU (no PCI, VM supported).
>>
>> To solve the problem, we could define a async mode in sfence.vma.b to
>> slove the problem and finished with per_cpu_irq/exception.
>>
>> --
>> Best Regards
>>  Guo Ren
>>
>> ML: https://lore.kernel.org/linux-csky/



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-13 Thread Guo Ren
Another idea is seperate remote TLB invalidate into two instructions:

 - sfence.vma.b.asyc
 - sfence.vma.b.barrier // wait all async TLB invalidate operations
finished for all harts.

(I remember who mentioned me separate them into two instructions after
session. Anup? Is the idea right ?)

Actually, I never consider asyc TLB invalidate before, because current our
light iommu did not need it.

Thx all people attend the session :) Let's continue the talk.


Guo Ren  于 2019年9月12日周四 22:59写道:

> Thx Will for reply.
>
> On Thu, Sep 12, 2019 at 3:03 PM Will Deacon  wrote:
> >
> > On Sun, Sep 08, 2019 at 07:52:55AM +0800, Guo Ren wrote:
> > > On Mon, Jun 24, 2019 at 6:40 PM Will Deacon  wrote:
> > > > > I'll keep my system use the same ASID for SMP + IOMMU :P
> > > >
> > > > You will want a separate allocator for that:
> > > >
> > > >
> https://lkml.kernel.org/r/20190610184714.6786-2-jean-philippe.bruc...@arm.com
> > >
> > > Yes, it is hard to maintain ASID between IOMMU and CPUMMU or different
> > > system, because it's difficult to synchronize the IO_ASID when the CPU
> > > ASID is rollover.
> > > But we could still use hardware broadcast TLB invalidation instruction
> > > to uniformly manage the ASID and IO_ASID, or OTHER_ASID in our IOMMU.
> >
> > That's probably a bad idea, because you'll likely stall execution on the
> > CPU until the IOTLB has completed invalidation. In the case of ATS, I
> think
> > an endpoint ATC is permitted to take over a minute to respond. In
> reality, I
> > suspect the worst you'll ever see would be in the msec range, but that's
> > still an unacceptable period of time to hold a CPU.
> Just as I've said in the session that IOTLB invalidate delay is
> another topic, My main proposal is to introduce stage1.pgd and
> stage2.pgd as address space identifiers between different TLB systems
> based on vmid, asid. My last part of sildes will show you how to
> translate stage1/2.pgd to as/vmid in PCI ATS system and the method
> could work with SMMU-v3 and intel Vt-d. (It's regret for me there is
> no time to show you the whole slides.)
>
> In our light IOMMU implementation, there's no IOTLB invalidate delay
> problem. Becasue IOMMU is very close to CPU MMU and interconnect's
> delay is the same with SMP CPUs MMU (no PCI, VM supported).
>
> To solve the problem, we could define a async mode in sfence.vma.b to
> slove the problem and finished with per_cpu_irq/exception.
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-12 Thread Guo Ren
Thx Will for reply.

On Thu, Sep 12, 2019 at 3:03 PM Will Deacon  wrote:
>
> On Sun, Sep 08, 2019 at 07:52:55AM +0800, Guo Ren wrote:
> > On Mon, Jun 24, 2019 at 6:40 PM Will Deacon  wrote:
> > > > I'll keep my system use the same ASID for SMP + IOMMU :P
> > >
> > > You will want a separate allocator for that:
> > >
> > > https://lkml.kernel.org/r/20190610184714.6786-2-jean-philippe.bruc...@arm.com
> >
> > Yes, it is hard to maintain ASID between IOMMU and CPUMMU or different
> > system, because it's difficult to synchronize the IO_ASID when the CPU
> > ASID is rollover.
> > But we could still use hardware broadcast TLB invalidation instruction
> > to uniformly manage the ASID and IO_ASID, or OTHER_ASID in our IOMMU.
>
> That's probably a bad idea, because you'll likely stall execution on the
> CPU until the IOTLB has completed invalidation. In the case of ATS, I think
> an endpoint ATC is permitted to take over a minute to respond. In reality, I
> suspect the worst you'll ever see would be in the msec range, but that's
> still an unacceptable period of time to hold a CPU.
Just as I've said in the session that IOTLB invalidate delay is
another topic, My main proposal is to introduce stage1.pgd and
stage2.pgd as address space identifiers between different TLB systems
based on vmid, asid. My last part of sildes will show you how to
translate stage1/2.pgd to as/vmid in PCI ATS system and the method
could work with SMMU-v3 and intel Vt-d. (It's regret for me there is
no time to show you the whole slides.)

In our light IOMMU implementation, there's no IOTLB invalidate delay
problem. Becasue IOMMU is very close to CPU MMU and interconnect's
delay is the same with SMP CPUs MMU (no PCI, VM supported).

To solve the problem, we could define a async mode in sfence.vma.b to
slove the problem and finished with per_cpu_irq/exception.

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

2019-09-07 Thread Guo Ren
Thx Will,

On Mon, Jun 24, 2019 at 6:40 PM Will Deacon  wrote:
> > I'll keep my system use the same ASID for SMP + IOMMU :P
>
> You will want a separate allocator for that:
>
> https://lkml.kernel.org/r/20190610184714.6786-2-jean-philippe.bruc...@arm.com

Yes, it is hard to maintain ASID between IOMMU and CPUMMU or different
system, because it's difficult to synchronize the IO_ASID when the CPU
ASID is rollover.
But we could still use hardware broadcast TLB invalidation instruction
to uniformly manage the ASID and IO_ASID, or OTHER_ASID in our IOMMU.

Welcome to join our disscusion:
"Introduce an implementation of IOMMU in linux-riscv"
9 Sep 2019, 10:45 Jade-room-I (Corinthia Hotel Lisbon) RISC-V MC

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 9/9] csky: use the generic remapping dma alloc implementation

2018-11-05 Thread Guo Ren
 -
> - __dma_clear_buffer(page, size);
> -
> - if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
> - return page;
> -
> - vaddr = dma_common_contiguous_remap(page, PAGE_ALIGN(size), VM_USERMAP,
> - pgprot_noncached(PAGE_KERNEL), __builtin_return_address(0));
> - if (!vaddr)
> - BUG();
> -
> - return vaddr;
> -}
> -
> -static void csky_dma_free_nonatomic(
> - struct device *dev,
> - size_t size,
> - void *vaddr,
> - dma_addr_t dma_handle,
> - unsigned long attrs
> - )
> -{
> - struct page *page = phys_to_page(dma_handle);
> - unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> -
> - if ((unsigned int)vaddr >= VMALLOC_START)
> - dma_common_free_remap(vaddr, size, VM_USERMAP);
> -
> - if (IS_ENABLED(CONFIG_DMA_CMA))
> - dma_release_from_contiguous(dev, page, count);
> - else
> - __free_pages(page, get_order(size));
> -}
> -
> -void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> -  gfp_t gfp, unsigned long attrs)
> -{
> - if (gfpflags_allow_blocking(gfp))
> - return csky_dma_alloc_nonatomic(dev, size, dma_handle, gfp,
> - attrs);
> - else
> - return csky_dma_alloc_atomic(dev, size, dma_handle);
> -}
> -
> -void arch_dma_free(struct device *dev, size_t size, void *vaddr,
> -dma_addr_t dma_handle, unsigned long attrs)
> -{
> - if (!addr_in_gen_pool(atomic_pool, (unsigned int) vaddr, size))
> - csky_dma_free_nonatomic(dev, size, vaddr, dma_handle, attrs);
> - else
> - csky_dma_free_atomic(dev, size, vaddr, dma_handle, attrs);
> -}
> -
>  static inline void cache_op(phys_addr_t paddr, size_t size,
>   void (*fn)(unsigned long start, unsigned long end))
>  {
> -- 
> 2.19.1

Reviewed-by: Guo Ren 

Compile is OK, qemu boot OK. Functions are the same and just move to common.

Looks good for me.

 Guo Ren

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 8/9] csky: don't use GFP_DMA in atomic_pool_init

2018-11-05 Thread Guo Ren
On Mon, Nov 05, 2018 at 01:19:30PM +0100, Christoph Hellwig wrote:
> csky does not implement ZONE_DMA, which means passing GFP_DMA is a no-op.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/csky/mm/dma-mapping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 85437b21e045..ad4046939713 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -35,7 +35,7 @@ static int __init atomic_pool_init(void)
>   if (!atomic_pool)
>   BUG();
>  
> - page = alloc_pages(GFP_KERNEL | GFP_DMA, get_order(size));
> + page = alloc_pages(GFP_KERNEL, get_order(size));
>   if (!page)
>   BUG();
>  
> -- 
> 2.19.1

Acked-by: Guo Ren 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 7/9] csky: don't select DMA_NONCOHERENT_OPS

2018-11-05 Thread Guo Ren
On Mon, Nov 05, 2018 at 01:19:29PM +0100, Christoph Hellwig wrote:
> This option is gone past Linux 4.19.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/csky/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
> index 8a30e006a845..c0cf8e948821 100644
> --- a/arch/csky/Kconfig
> +++ b/arch/csky/Kconfig
> @@ -8,7 +8,6 @@ config CSKY
>   select CLKSRC_MMIO
>   select CLKSRC_OF
>   select DMA_DIRECT_OPS
> - select DMA_NONCOHERENT_OPS
>   select DMA_REMAP
>   select IRQ_DOMAIN
>       select HANDLE_DOMAIN_IRQ
> -- 
> 2.19.1

Acked-by: Guo Ren 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu